[QUESTION] thread model in Flink makes me confused

伍翀(云邪)

As far as I know, Flink uses a thread model: one TaskManager process may run many operator threads from different jobs, so tasks from different jobs compete for memory and CPU within the same process. In the worst case, a bad job eats most of the CPU and memory, which may lead to an OOM, and then the healthy jobs die too. There is another problem: tasks from different jobs print their logs into the same file (the TaskManager log file), which makes debugging harder.
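
To make the concern concrete, here is a toy model of the situation described above. It is not Flink code: `run_shared_worker` is a hypothetical stand-in for a TaskManager, and the job names are made up. It only shows that tasks of different jobs, when run as threads, end up in one process and write to one shared log.

```python
import os
import threading


def run_task(job_id, task_id, shared_log, lock):
    # Each task records the process it runs in and appends to the single shared log.
    with lock:
        shared_log.append((os.getpid(), job_id, task_id))


def run_shared_worker(jobs):
    """Run the tasks of several jobs as threads of one process.

    `jobs` maps a (hypothetical) job name to its number of tasks.
    """
    shared_log = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_task, args=(job_id, task_id, shared_log, lock))
        for job_id, task_count in jobs.items()
        for task_id in range(task_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return shared_log


log = run_shared_worker({"job-A": 2, "job-B": 2})
pids = {pid for pid, _, _ in log}
jobs_seen = {job_id for _, job_id, _ in log}

# One process ID, but entries from both jobs mixed in the same log.
print(len(pids), sorted(jobs_seen))
```

Because every thread shares the same process, heap exhaustion caused by one job's tasks affects all of them, which is the failure mode the question is about.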

As far as I know, Storm spawns separate workers for every job, and the tasks in one worker all belong to the same job. So I'm confused about the purpose or advantages of Flink's design. One more question: are there any tips for solving the issues above, or any suggestions for implementing a design similar to Storm's?

Thank you for any answers in advance!

Regards,
Jark Wu



Re: [QUESTION] thread model in Flink makes me confused

Eron Wright
One option is to use a separate cluster (JobManager + TaskManagers) for each job. This is fairly straightforward with the YARN support: "flink run" can launch a cluster for a job and tear it down afterwards.
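
As a rough sketch of what a per-job launch could look like, the helper below only assembles the command line for one job. The `-m yarn-cluster` and `-yn` options are the ones used by the Flink 1.x CLI around the time of this thread and may differ in other versions, so check `flink run --help` for yours. The function name, the jar path, and the default of two TaskManagers are assumptions for illustration.

```python
import shlex


def per_job_yarn_command(job_jar, task_managers=2, flink_bin="./bin/flink"):
    """Build a `flink run` command that starts a YARN cluster for a single job.

    All parameter names and defaults here are illustrative assumptions.
    """
    return [
        flink_bin, "run",
        "-m", "yarn-cluster",        # start a dedicated cluster on YARN for this job
        "-yn", str(task_managers),   # number of YARN containers (TaskManagers)
        job_jar,
    ]


cmd = per_job_yarn_command("./my-job.jar")  # hypothetical jar path
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be passed to `subprocess.run(cmd)` from a wrapper script, so that each job gets its own JobManager and TaskManagers, and therefore its own processes and log files.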

Of course this means you must deploy YARN. That doesn’t necessarily imply HDFS, though a Hadoop-compatible filesystem (HCFS) is needed to support the YARN staging directory.

This approach also facilitates richer scheduling and multi-user scenarios.  

One downside is the loss of a unified web UI to view all jobs.


> On May 11, 2016, at 8:32 AM, Jark Wu <[hidden email]> wrote:
>
>
> As far as I know, Flink uses a thread model: one TaskManager process may run many operator threads from different jobs, so tasks from different jobs compete for memory and CPU within the same process. In the worst case, a bad job eats most of the CPU and memory, which may lead to an OOM, and then the healthy jobs die too. There is another problem: tasks from different jobs print their logs into the same file (the TaskManager log file), which makes debugging harder.
>
> As far as I know, Storm spawns separate workers for every job, and the tasks in one worker all belong to the same job. So I'm confused about the purpose or advantages of Flink's design. One more question: are there any tips for solving the issues above, or any suggestions for implementing a design similar to Storm's?
>
> Thank you for any answers in advance!
>
> Regards,
> Jark Wu
>
>
>

Re: [QUESTION] thread model in Flink makes me confused

Aljoscha Krettek-2
I favor the one-cluster-per-job approach. If this becomes the dominant
approach, we could also think about introducing a separate component for
monitoring the jobs in these per-job clusters, as is now possible when
running multiple jobs in a single cluster.

On Thu, 12 May 2016 at 01:59 Wright, Eron <[hidden email]> wrote:

> One option is to use a separate cluster (JobManager + TaskManagers) for
> each job.   This is fairly straightforward with the YARN support - "flink
> run" can launch a cluster for a job and tear it down afterwards.
>
> Of course this means you must deploy YARN.   That doesn’t necessarily
> imply HDFS though a Hadoop-compatible filesystem (HCFS) is needed to
> support the YARN staging directory.
>
> This approach also facilitates richer scheduling and multi-user scenarios.
>
> One downside is the loss of a unified web UI to view all jobs.

Re: [QUESTION] thread model in Flink makes me confused

Flavio Pompermaier
That would definitely be awesome (and useful for us too)! +1


On Thu, May 12, 2016 at 7:38 AM, Aljoscha Krettek <[hidden email]>
wrote:

> I favor the one-cluster-per-job approach. If this becomes the dominant
> approach to doing things we could also think about introducing a separate
> component that would allow monitoring the jobs in these per-job clusters as
> is now possible when running multiple jobs in a single cluster.

Re: [QUESTION] thread model in Flink makes me confused

Eron Wright
In reply to this post by Aljoscha Krettek-2
Funny you should say that, because in a recent discussion with Stephan and Jamie, we talked about reworking the web UI to talk to numerous job managers. I’ve been looking into it as part of the Mesos work (FLINK-1984). I’ll start a new thread about it soon.

> On May 11, 2016, at 10:38 PM, Aljoscha Krettek <[hidden email]> wrote:
>
> I favor the one-cluster-per-job approach. If this becomes the dominant
> approach to doing things we could also think about introducing a separate
> component that would allow monitoring the jobs in these per-job clusters as
> is now possible when running multiple jobs in a single cluster.