FLIP-6 and running many "small" jobs


Maciek Próchniak
Hi,

We're looking at FLIP-6, and while it looks really great, we started to
wonder how it fits our use case.

We currently have around 20 processes, but the idea is to have many more
of them. Many of them are pretty "small": they don't have large sources,
are stateless, and mainly filter data.

As I understand it, FLIP-6 makes a job an even more heavyweight thing than
today, e.g. each job will have its own jobmanager process, etc.

Our concern is that each job will now require more resources - e.g. the
number of threads, memory and so on. We are thinking about a way to make
some jobs share these resources - of course that means they won't be
really isolated from each other.

So far the only idea we see is deploying these small jobs together, as
one job - but this leads to some problems, like how to track which
version is really deployed (we are talking about stateless processes, so
the only problem is maintaining the source Kafka offsets).
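
The "deploy small jobs together" idea can be sketched roughly as below. This is a toy model in plain Python, not the Flink API: `SmallPipeline` and `BundledJob` are hypothetical names, and the sketch assumes each small process is a stateless filter that carries its own name and version. It only illustrates how a bundle could report which version of each pipeline it contains.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class SmallPipeline:
    """One stateless filter that would otherwise be its own job (hypothetical)."""
    name: str
    version: str
    predicate: Callable[[dict], bool]


class BundledJob:
    """Runs several small pipelines inside one deployment unit (toy model)."""

    def __init__(self, pipelines: Iterable[SmallPipeline]):
        self._pipelines = {p.name: p for p in pipelines}

    def deployed_versions(self) -> Dict[str, str]:
        # Answers "which version is really deployed?" for every bundled pipeline.
        return {name: p.version for name, p in self._pipelines.items()}

    def process(self, record: dict) -> List[str]:
        # Returns the names of the pipelines whose filter accepts the record.
        return [name for name, p in self._pipelines.items() if p.predicate(record)]


bundle = BundledJob([
    SmallPipeline("errors-only", "1.2.0", lambda r: r.get("level") == "ERROR"),
    SmallPipeline("eu-traffic", "0.9.1", lambda r: r.get("region") == "EU"),
])
```

In this model, changing a single pipeline means rebuilding and redeploying the whole bundle, which is where the version-tracking concern comes from.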

Unfortunately, our jobs can have many different sources and outcomes, so
we don't think doing something similar to King&RBEA would work for us...

Do you have any views/ideas about such a use case? Or is the common view
that we should deploy our stuff to Mesos and let it handle resource
allocation? But still - for some jobs we'd need something like a "1/4" slot :)

thanks,

maciek

Re: FLIP-6 and running many "small" jobs

mxm
Hi Maciek,

Your use case will be covered by the FLIP-6 "Sessions". Sessions are
similar to how the on-premise Flink or the Yarn session operates
today. We will have a long-running dispatcher, resource manager, and
task managers. We will bring up a job manager for each job, but the
overhead for this one node (non-HA) is relatively small if you have a
cluster with many nodes. After all, the resource-intensive computation
is performed by the task managers. The job manager is only responsible
for coordinating the job execution.

Note that the dispatcher hosts the web UI and is responsible for
taking care of the job submission. The role of the resource manager
changes slightly to span across jobs. Task managers have always been
able to serve multiple jobs. Dispatcher, resource manager and task
managers live across jobs within a session.
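
The component lifetimes described above can be illustrated with a toy model. This is plain Python, not Flink code: the class names mirror the roles named in this message, while the slot counts and the `submit`, `finish`, `allocate` and `release` methods are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskManager:
    """Long-running worker; its slots can serve tasks of different jobs."""
    slots: int
    assigned: List[str] = field(default_factory=list)  # job id per used slot

    def free_slots(self) -> int:
        return self.slots - len(self.assigned)


class ResourceManager:
    """Session-wide: hands out slots across all jobs."""

    def __init__(self, task_managers: List[TaskManager]):
        self.task_managers = task_managers

    def allocate(self, job_id: str, slots: int) -> None:
        if sum(tm.free_slots() for tm in self.task_managers) < slots:
            raise RuntimeError(f"not enough free slots for {job_id}")
        for tm in self.task_managers:
            while slots > 0 and tm.free_slots() > 0:
                tm.assigned.append(job_id)
                slots -= 1

    def release(self, job_id: str) -> None:
        for tm in self.task_managers:
            tm.assigned = [j for j in tm.assigned if j != job_id]


class Dispatcher:
    """Session-wide entry point: accepts submissions, starts one job manager per job."""

    def __init__(self, resource_manager: ResourceManager):
        self.resource_manager = resource_manager
        self.job_managers: Dict[str, str] = {}  # job id -> job manager label

    def submit(self, job_id: str, slots: int) -> None:
        self.resource_manager.allocate(job_id, slots)
        self.job_managers[job_id] = f"jobmanager-{job_id}"

    def finish(self, job_id: str) -> None:
        # Only the per-job job manager goes away; shared components stay up.
        self.resource_manager.release(job_id)
        del self.job_managers[job_id]


session = Dispatcher(ResourceManager([TaskManager(slots=2), TaskManager(slots=2)]))
session.submit("filter-a", slots=1)
session.submit("filter-b", slots=1)
```

Here both small jobs land on the same task manager, and finishing one of them removes only its job manager entry; the dispatcher, resource manager and task managers keep running for the next submission.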

In my opinion, you won't have to change your usage pattern once FLIP-6 is
ready, which is targeted for Flink 1.3.0.

-Max


On Thu, Oct 20, 2016 at 10:07 AM, Maciek Próchniak <[hidden email]> wrote:

> [...]

Re: FLIP-6 and running many "small" jobs

Maciek Próchniak
Hi Max,

thanks for the answer.

I still have to wrap my head around it, but I hope we'll manage to work
it out - maybe when 1.3.x arrives I'll have access to some nice Mesos
cluster... or not... we'll see :)

thanks,

maciek


On 25/10/2016 17:49, Maximilian Michels wrote:

> [...]