Hi All,
I have a question on scaling a Flink cluster up and down. According to the documentation, I can scale up by adding a new TaskManager on the fly, and the JobManager can assign work to it. Assuming I run Flink in HA mode, how does the knowledge of this TaskManager get transferred in the event of a master JobManager failure? I believe the new master will just read the contents of the "slaves" config file. Can anyone give more clarity on how this is done? Or is it the union of "slaves" and the TaskManagers that were added on the fly?

--
Thanks,
Deepak
Hi Deepak,
The JobManager doesn't have to know about the TaskManagers in advance. They simply register with the JobManager using the provided configuration. In HA mode, they first look up the currently leading JobManager and then connect to it. The JobManager can then assign work to them.

Cheers,
Max
Hi Deepak!
The "slaves" file is only used by the SSH script to start a standalone cluster. As Max said, TaskManagers register dynamically at the JobManager. Discovery works via: - config in non-HA mode - ZooKeeper in HA mode On Wed, Feb 17, 2016 at 10:11 AM, Maximilian Michels <[hidden email]> wrote: > Hi Deepak, > > The job manager doesn't have to know about task managers. They will > simply register at the job manager using the provided configuration. > In HA mode, they will lookup the currently leading job manager first > and then connect to it. The job manager can then assign work. > > Cheers, > Max > > On Tue, Feb 16, 2016 at 10:41 PM, Deepak Jha <[hidden email]> wrote: > > Hi All, > > I have a question on scaling-up/scaling-down flink cluster. > > As per the documentation, in order to scale-up the cluster, I can add a > new > > taskmanager on the fly and jobmanager can assign work to it. Assuming, I > > have Flink HA , so in the event of master JobManager failure, how is this > > taskmanager detail is going to get transferred ? I believe new master > will > > just read the contents from slaves config file. Can anyone give more > > clarity on how this is done ? Or, Is it union of slaves and the > > taskmanager's that are added on the fly ? > > > > -- > > Thanks, > > Deepak > |
Thanks Max and Steven for the response.
Sorry for the typo, Stephan.
Hi Max and Stephan,
Does this mean that I can start a Flink HA cluster without keeping any entries in the "slaves" file? I'm asking because then I wouldn't have to worry about copying public keys for password-less SSH across the Flink HA cluster.

--
Thanks,
Deepak Jha
No, the "slaves" file is still used to ssh into the machines and start
the task manager processes in the start-cluster.sh script. So you still need password-less ssh into the machines if you want to use that. The task managers discover the job manager via ZooKeeper though (therefore you don't need to configure the jobmanager address in the config). In theory, you can also skip the "slaves" file if you ssh manually into the machines and start the task managers via the taskmanger.sh script, but I don't think that this is what you are looking for. Or are you? – Ufuk On Wed, Feb 17, 2016 at 10:27 PM, Deepak Jha <[hidden email]> wrote: > Hi Max and Stephan, > Does this mean that I can start Flink HA cluster without keeping any entry > in "slaves" file ? I'm asking this because then I should not worry about > copying public key for password-less ssh in Flink HA cluster.... > > On Wed, Feb 17, 2016 at 12:38 PM, Deepak Jha <[hidden email]> wrote: > >> Sorry for the typo Stephan >> >> >> On Wednesday, February 17, 2016, Deepak Jha <[hidden email]> wrote: >> >>> Thanks Max and Steven for the response. >>> >>> On Wednesday, February 17, 2016, Stephan Ewen <[hidden email]> wrote: >>> >>>> Hi Deepak! >>>> >>>> The "slaves" file is only used by the SSH script to start a standalone >>>> cluster. >>>> >>>> As Max said, TaskManagers register dynamically at the JobManager. >>>> >>>> Discovery works via: >>>> - config in non-HA mode >>>> - ZooKeeper in HA mode >>>> >>>> >>>> >>>> On Wed, Feb 17, 2016 at 10:11 AM, Maximilian Michels <[hidden email]> >>>> wrote: >>>> >>>> > Hi Deepak, >>>> > >>>> > The job manager doesn't have to know about task managers. They will >>>> > simply register at the job manager using the provided configuration. >>>> > In HA mode, they will lookup the currently leading job manager first >>>> > and then connect to it. The job manager can then assign work. >>>> > >>>> > Cheers, >>>> > Max >>>> > >>>> > On Tue, Feb 16, 2016 at 10:41 PM, Deepak Jha <[hidden email]> >>>> wrote: >>>> > > Hi All, >>>> > > I have a question on scaling-up/scaling-down flink cluster. >>>> > > As per the documentation, in order to scale-up the cluster, I can >>>> add a >>>> > new >>>> > > taskmanager on the fly and jobmanager can assign work to it. >>>> Assuming, I >>>> > > have Flink HA , so in the event of master JobManager failure, how is >>>> this >>>> > > taskmanager detail is going to get transferred ? I believe new master >>>> > will >>>> > > just read the contents from slaves config file. Can anyone give more >>>> > > clarity on how this is done ? Or, Is it union of slaves and the >>>> > > taskmanager's that are added on the fly ? >>>> > > >>>> > > -- >>>> > > Thanks, >>>> > > Deepak >>>> > >>>> >>> >>> >>> -- >>> Sent from Gmail Mobile >>> >> >> >> -- >> Sent from Gmail Mobile >> > > > > -- > Thanks, > Deepak Jha |
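To make the two options concrete, here is a rough sketch (the host names are placeholders, and the script arguments can differ slightly between Flink versions, e.g. some releases expect an extra mode argument such as "jobmanager.sh start cluster", so check the usage message of the scripts in your distribution):

  # Option 1: start-cluster.sh drives everything via SSH.
  # conf/slaves lists one TaskManager host per line, e.g.:
  #   tm-host-1
  #   tm-host-2
  bin/start-cluster.sh

  # Option 2: skip the "slaves" file and start the daemons by hand on each machine.
  # On the JobManager machine(s):
  bin/jobmanager.sh start
  # On every TaskManager machine (in HA mode it finds the leading JobManager via ZooKeeper):
  bin/taskmanager.sh start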
Hi Ufuk,
I'm planning to build a Flink HA cluster, and I may need to autoscale the TaskManagers based on demand. I feel it makes more sense for me to start each TaskManager and JobManager individually using taskmanager.sh and jobmanager.sh and let the TaskManagers discover the JobManagers via ZooKeeper. The reason is that if I go with updating "slaves", I have to push those changes to all the machines in the cluster to reach a consistent state. I get more flexibility if I can add TaskManagers on the fly without updating the rest of the machines in the cluster. I'm going to use Docker and Amazon ECS, so if there is a change in my artifact I will build a new cluster, but as such each node of the cluster will be immutable.

--
Thanks,
Deepak Jha
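Roughly, the container entrypoint I have in mind is something like the sketch below. Everything in it is hypothetical on my side: FLINK_HOME, ZK_QUORUM and HA_STORAGE_DIR are variables I would inject via ECS, the recovery.* key names depend on the Flink version (newer releases use high-availability.*), and some versions need extra arguments to jobmanager.sh:

  #!/bin/sh
  # Hypothetical entrypoint: start either a JobManager or a TaskManager and
  # point it at the shared ZooKeeper ensemble.
  ROLE="$1"   # "jobmanager" or "taskmanager"

  # Append the HA settings; key names vary by Flink version.
  {
    echo "recovery.mode: zookeeper"
    echo "recovery.zookeeper.quorum: $ZK_QUORUM"
    echo "recovery.zookeeper.storageDir: $HA_STORAGE_DIR"   # shared storage, e.g. an S3/HDFS path
  } >> "$FLINK_HOME/conf/flink-conf.yaml"

  if [ "$ROLE" = "jobmanager" ]; then
    # Some Flink versions need an extra argument here, e.g. "start cluster".
    "$FLINK_HOME/bin/jobmanager.sh" start
  else
    "$FLINK_HOME/bin/taskmanager.sh" start
  fi

  # The scripts daemonize, so keep the container alive by following the logs.
  sleep 5
  exec tail -F "$FLINK_HOME"/log/*.log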
OK, nice! :-) Then you can just skip the "slaves" file and directly
work with the scripts.

I'm curious to know if everything works as expected. If you encounter something that seems wrong, let us know.

– Ufuk
Hi Ufuk,
Sure, I will let you know. I'm planning to use a centralized ZooKeeper ensemble; that way Flink and ZooKeeper stay separate.

--
Thanks,
Deepak Jha