[Discuss] Why different job's tasks can run in the single process.


[Discuss] Why different job's tasks can run in the single process.

Longda Feng
Hi,
Sorry for asking the question here; any answer will be appreciated.
Why can tasks from different jobs run in a single process? (A single TaskManager may hold tasks from several different jobs.) Flink-on-YARN can run different jobs in different processes, but in standalone mode the problem remains.
Why is Flink designed like this? The advantages I can think of are: (1) all tasks can share a bigger memory pool, and (2) communication between tasks in the same process is fast.
But this design hurts stability. Flink provides a user-defined-function interface; if one user-defined function crashes, it may bring down the whole JVM, and if a TaskManager crashes, every other job's tasks in that TaskManager are affected. Even if the JVM does not crash, it may cause other unexpected problems, and it makes the code overly complicated. Frameworks like Spark/Storm/Samza do not run different jobs' tasks in the same process. As an ordinary user, stability has the highest priority.

Thanks,
Longda




Re: [Discuss] Why different job's tasks can run in the single process.

Aljoscha Krettek-2
Hi,
yes, you are definitely right that allowing multiple user code tasks to run
in the same TaskManager JVM is not good for stability. This mode is still
there from the very early days of Flink, when YARN was not yet available.
In a production environment I would now recommend always running one
Flink-on-YARN cluster per job to get good isolation between different jobs.

Cheers,
Aljoscha

On Wed, 29 Jun 2016 at 09:18 Longda Feng <[hidden email]>
wrote:

> Hi,
> Sorry for asking the question here; any answer will be appreciated.
> Why can tasks from different jobs run in a single process? (A single
> TaskManager may hold tasks from several different jobs.) Flink-on-YARN can
> run different jobs in different processes, but in standalone mode the
> problem remains.
> Why is Flink designed like this? The advantages I can think of are:
> (1) all tasks can share a bigger memory pool, and (2) communication
> between tasks in the same process is fast.
> But this design hurts stability. Flink provides a user-defined-function
> interface; if one user-defined function crashes, it may bring down the
> whole JVM, and if a TaskManager crashes, every other job's tasks in that
> TaskManager are affected. Even if the JVM does not crash, it may cause
> other unexpected problems, and it makes the code overly complicated.
> Frameworks like Spark/Storm/Samza do not run different jobs' tasks in the
> same process. As an ordinary user, stability has the highest priority.
>
> Thanks,
> Longda
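For reference, the per-job Flink-on-YARN setup recommended above can be started straight from Flink's CLI. A minimal sketch, assuming a Flink 1.x distribution and an already-running YARN cluster; the container count, memory sizes, and example jar path are placeholders:

```shell
# Submit a job in per-job YARN mode: a dedicated YARN application is
# brought up for this job alone and torn down when it finishes, so its
# TaskManager JVMs never host another job's tasks.
# -yn:  number of TaskManager containers to allocate
# -yjm: JobManager container memory in MB
# -ytm: TaskManager container memory in MB
./bin/flink run -m yarn-cluster \
  -yn 4 \
  -yjm 1024 \
  -ytm 4096 \
  ./examples/streaming/WordCount.jar
```

When the job terminates, the YARN application and all of its containers go away with it, which is what gives the per-job isolation discussed here.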

Re: [Discuss] Why different job's tasks can run in the single process.

Longda Feng
In reply to this post by Longda Feng


This implies that standalone mode is just for prototyping.
But I think we need a lightweight solution for stream processing, and standalone mode is the best option. Sometimes we need to set up a Flink cluster on a small number of machines, and setting up a YARN cluster is not convenient:
(1) In a small company, the number of machines is small.
(2) A data center may be small, but we still need to do some computing in it.
(3) Some machines are on a whitelist: they have been authorized to access special data or machines, but there are only a few of them.
(4) Some machines hold critical data that cannot be shared with others, but there are only a few of them.
(5) When a team starts to learn Flink, they will first set up a small cluster; they may not want to set up a huge system and will prefer a small one.

Regards,
Longda
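For the small-cluster cases above, a standalone cluster really is only a few steps. A rough sketch, assuming a Flink 1.x binary distribution unpacked at the same path on every machine and passwordless SSH from the master; the hostnames are hypothetical:

```shell
# On the master node, inside the Flink distribution directory:

# 1. Tell every node where the JobManager runs (hostname is a placeholder).
echo "jobmanager.rpc.address: flink-master" >> conf/flink-conf.yaml

# 2. List the worker hosts, one per line (placeholders).
printf "worker-1\nworker-2\nworker-3\n" > conf/slaves

# 3. Start the JobManager here and a TaskManager on each worker via SSH.
./bin/start-cluster.sh

# Tear everything down again:
./bin/stop-cluster.sh
```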

------------------------------------------------------------------
From: Aljoscha Krettek <[hidden email]>
Send Time: Wednesday, 29 June 2016, 21:48
To: 封仲淹(纪君祥) <[hidden email]>; dev <[hidden email]>
Subject: Re: [Discuss] Why different job's tasks can run in the single process.
Hi,
yes, you are definitely right that allowing multiple user code tasks to run
in the same TaskManager JVM is not good for stability. This mode is still
there from the very early days of Flink, when YARN was not yet available.
In a production environment I would now recommend always running one
Flink-on-YARN cluster per job to get good isolation between different jobs.

Cheers,
Aljoscha



Re: [Discuss] Why different job's tasks can run in the single process.

Aljoscha Krettek-2
Hi,
no, it is not just for prototyping. Since it is actually the oldest
execution mode, it is also the most stable. You should have no problem using it.

Cheers,
Aljoscha

On Thu, 30 Jun 2016 at 09:54 Longda Feng <[hidden email]>
wrote:

>
>
> This implies that standalone mode is just for prototyping.
> But I think we need a lightweight solution for stream processing, and
> standalone mode is the best option. Sometimes we need to set up a Flink
> cluster on a small number of machines, and setting up a YARN cluster is
> not convenient:
> (1) In a small company, the number of machines is small.
> (2) A data center may be small, but we still need to do some computing in it.
> (3) Some machines are on a whitelist: they have been authorized to access
> special data or machines, but there are only a few of them.
> (4) Some machines hold critical data that cannot be shared with others,
> but there are only a few of them.
> (5) When a team starts to learn Flink, they will first set up a small
> cluster; they may not want to set up a huge system and will prefer a
> small one.
>
> Regards,
> Longda

Re: [Discuss] Why different job's tasks can run in the single process.

Kevin Jacobs
In reply to this post by Longda Feng
In my opinion, the streaming process can be perfectly simulated on a
single node. You can set up a message distribution system like Kafka on a
single node, and you can run Spark on a single node; the only thing you
need to change when moving to a cluster is the environment. So there is no
need to set up a cluster when testing the streaming process.

Regards,
Kevin
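Along the same lines, Flink itself can run on a single node with no cluster manager at all. A sketch, assuming a Flink 1.x binary distribution; the example jar path is a placeholder:

```shell
# Start a local JobManager with an embedded TaskManager in a single JVM.
./bin/start-local.sh

# Run a job against it exactly as you would against a real cluster.
./bin/flink run ./examples/streaming/WordCount.jar

# Stop the local instance again.
./bin/stop-local.sh
```

Only the target environment changes when the same job is later submitted to a standalone or YARN cluster; the job code stays the same.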



Re: [Discuss] Why different job's tasks can run in the single process.

Longda Feng
In reply to this post by Longda Feng

What I mean is that standalone mode is not suitable for a production environment; it is just for testing or prototyping.

Regards,
Longda
------------------------------------------------------------------
From: Kevin Jacobs <[hidden email]>
Send Time: Thursday, 30 June 2016, 16:20
To: dev <[hidden email]>; 封仲淹(纪君祥) <[hidden email]>
Subject: Re: [Discuss] Why different job's tasks can run in the single process.
In my opinion, the streaming process can be perfectly simulated on a
single node. You can set up a message distribution system like Kafka on a
single node, and you can run Spark on a single node; the only thing you
need to change when moving to a cluster is the environment. So there is no
need to set up a cluster when testing the streaming process.

Regards,
Kevin


Re: [Discuss] Why different job's tasks can run in the single process.

伍翀(云邪)
In reply to this post by Kevin Jacobs
Standalone mode has the disadvantage that the TaskManager JVM can't isolate different jobs. Do we have any plans to improve this?

- Jark Wu

> On 30 June 2016, at 16:19, Kevin Jacobs <[hidden email]> wrote:
>
> In my opinion, the streaming process can be perfectly simulated on a single node. You can set up a message distribution system like Kafka on a single node, and you can run Spark on a single node; the only thing you need to change when moving to a cluster is the environment. So there is no need to set up a cluster when testing the streaming process.
>
> Regards,
> Kevin


Re: [Discuss] Why different job's tasks can run in the single process.

Aljoscha Krettek-2
I'm not aware of any plans to change this. It would be possible, though, to
add a mode where a standalone cluster only accepts a single job.

On Fri, 1 Jul 2016 at 04:50 Jark Wu <[hidden email]> wrote:

> Standalone mode has the disadvantage that the TaskManager JVM can't
> isolate different jobs. Do we have any plans to improve this?
>
> - Jark Wu