Question on Task Chaining

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Question on Task Chaining

Isha Arkatkar
Hi all,

   I am new to flink and wanted to understand effects of setting task
chaining and resource group allocation. I tried following configurations,
with task slots set to 2:

1. map(..).disableChaining.filter(..)
     In this case, I can see 2 vertices for this application, both on the
same node. Do  these two operators share JVM? If not, does it go through
serialization flow?

2. If I do not specify disableChaining, does that mean the map and filter
operations run on the same thread or on the same JVM?

Thanks!
Isha
Reply | Threaded
Open this post in threaded view
|

Re: Question on Task Chaining

Stephan Ewen
Hi!

Chained tasks run in the same thread, and there is no serialization
involved. The FilterFunction is directly called on the result of the
MapFunction.

Records between non-chained tasks always go through the serialization stack.

Even in the case of non-chained tasks, the different operators share a slot
in the TaskManager, which means they are executed in the same JVM. If you
want to be sure, check the web dashboard. You can click in the map and on
the filter operator to check where all the subtasks run. If the subtask (n)
from the map and filer function have the same host/port for the
TaskManager, then they are in the same JVM.

To prevent sharing a slot, use the "startNewResourceGroup()" call. In that
case, the FilterFunction will be in a different slot than the MapFunction.
The system will try to co-locate them (locality aware scheduling), so they
may end up in the same JVM if that TaskManager had more free slots. If not,
they run in different JVMs.

Hope that helps...

Greetings,
Stephan

On Sat, Nov 7, 2015 at 2:26 AM, Isha Arkatkar <[hidden email]> wrote:

> Hi all,
>
>    I am new to flink and wanted to understand effects of setting task
> chaining and resource group allocation. I tried following configurations,
> with task slots set to 2:
>
> 1. map(..).disableChaining.filter(..)
>      In this case, I can see 2 vertices for this application, both on the
> same node. Do  these two operators share JVM? If not, does it go through
> serialization flow?
>
> 2. If I do not specify disableChaining, does that mean the map and filter
> operations run on the same thread or on the same JVM?
>
> Thanks!
> Isha
>
Reply | Threaded
Open this post in threaded view
|

Re: Question on Task Chaining

Isha Arkatkar
Thanks Stephan! That helps :)

Regards,
Isha

On Sat, Nov 7, 2015 at 7:08 AM, Stephan Ewen <[hidden email]> wrote:

> Hi!
>
> Chained tasks run in the same thread, and there is no serialization
> involved. The FilterFunction is directly called on the result of the
> MapFunction.
>
> Records between non-chained tasks always go through the serialization
> stack.
>
> Even in the case of non-chained tasks, the different operators share a slot
> in the TaskManager, which means they are executed in the same JVM. If you
> want to be sure, check the web dashboard. You can click in the map and on
> the filter operator to check where all the subtasks run. If the subtask (n)
> from the map and filer function have the same host/port for the
> TaskManager, then they are in the same JVM.
>
> To prevent sharing a slot, use the "startNewResourceGroup()" call. In that
> case, the FilterFunction will be in a different slot than the MapFunction.
> The system will try to co-locate them (locality aware scheduling), so they
> may end up in the same JVM if that TaskManager had more free slots. If not,
> they run in different JVMs.
>
> Hope that helps...
>
> Greetings,
> Stephan
>
> On Sat, Nov 7, 2015 at 2:26 AM, Isha Arkatkar <[hidden email]>
> wrote:
>
> > Hi all,
> >
> >    I am new to flink and wanted to understand effects of setting task
> > chaining and resource group allocation. I tried following configurations,
> > with task slots set to 2:
> >
> > 1. map(..).disableChaining.filter(..)
> >      In this case, I can see 2 vertices for this application, both on the
> > same node. Do  these two operators share JVM? If not, does it go through
> > serialization flow?
> >
> > 2. If I do not specify disableChaining, does that mean the map and filter
> > operations run on the same thread or on the same JVM?
> >
> > Thanks!
> > Isha
> >
>