Hi All,
We would like to start a discussion thread about a new feature called Flink Driver. A brief summary follows.

As mentioned in the discussion of Interactive Programming, user applications might consist of multiple jobs and take a long time to finish. Currently, when Flink runs an application with multiple jobs, the application runs in a local process that is responsible for submitting the jobs. That local process will not exit until the whole application has finished. Users have to keep an eye on the local process in case it is killed due to connection loss, session timeout, local operating system problems, etc.

To solve this problem, we would like to introduce the Flink Driver. Users can submit applications in driver mode, in which a Flink driver job is submitted to take care of the job submissions in the user application.

For more details about the Flink Driver, please refer to the doc:
https://docs.google.com/document/d/1dJnDOgRae9FzoBzXcsGoutGB1YjTi1iqG6d1Ht61EnY

Any comments and suggestions will be highly appreciated.

Best Regards,
Shuiqiang
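To make the problem concrete, here is a minimal sketch (plain Java, no Flink dependency) of the situation described above: a multi-job application driven by a single local client process that must stay alive until every job finishes. The class and method names (MockClusterClient, submitJob, and the job names) are hypothetical stand-ins for illustration, not Flink APIs.

```java
import java.util.List;

public class MultiJobApplication {

    // Hypothetical stand-in for a cluster client; a real application
    // would use Flink's execution environment instead.
    static class MockClusterClient {
        String submitJob(String jobName) {
            // Blocks until the job finishes. If this local process dies
            // (connection loss, session timeout, OS problem), the
            // remaining jobs are never submitted.
            return jobName + ":FINISHED";
        }
    }

    public static void main(String[] args) {
        MockClusterClient client = new MockClusterClient();
        // The application consists of several dependent jobs that are
        // submitted sequentially from this one local process.
        for (String job : List.of("preprocess", "train", "evaluate")) {
            System.out.println(client.submitJob(job));
        }
    }
}
```

In driver mode, the submission loop sketched in main would instead run inside a driver job on the cluster, so the local process could detach safely.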
Thanks for proposing this design document Shuiqiang. It is a very interesting idea for solving the problem of running multiple Flink jobs as part of a single application. I like the idea since it does not require many runtime changes apart from a session concept on the Dispatcher, and it would work in all environments.

One thing which is not fully clear to me is the failover behavior in case of driver job faults. Would it be possible to recover the driver job, or would a driver job fault directly transition the application into a globally terminal state? Apart from that, I left some minor comments in the design document.

Cheers,
Till

On Tue, Apr 23, 2019 at 10:04 AM Shuiqiang Chen <[hidden email]> wrote:
Hi Till,
Thanks for your insightful and valuable comments!

The driver dispatcher is a functional extension of the currently existing dispatchers and introduces some changes to the runtime. It is mainly there to manage the lifecycle of an ephemeral session cluster when a user submits, in detached mode, an application that might consist of one or more jobs without providing a cluster id.

In terms of failover, currently we will directly transition the whole application into a globally terminal state if the driver job faults. As @[hidden email] <[hidden email]> mentioned in the doc, there are two possible options for failover recovery: one relies on state, the other on event logs. Unfortunately, neither of them is powerful enough to support driver failover at this point, so we decided to postpone this to future work.

Best,
Shuiqiang

Till Rohrmann <[hidden email]> wrote on Tue, Apr 23, 2019 at 8:03 PM:
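The failover behavior discussed above (no driver recovery yet, so any driver job fault terminates the application globally) can be sketched as follows. This is only an illustration of the proposed policy; the enum and method names are hypothetical and not Flink APIs.

```java
public class DriverFailoverPolicy {

    // Hypothetical application-level states for illustration.
    enum ApplicationStatus { RUNNING, FINISHED, FAILED }

    // Current proposal: no driver recovery, so a driver job fault moves
    // the whole application into a globally terminal FAILED state.
    // State-based or event-log-based recovery is deferred to future work.
    static ApplicationStatus onDriverJobFault(ApplicationStatus current) {
        return ApplicationStatus.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(onDriverJobFault(ApplicationStatus.RUNNING));
    }
}
```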