Hi Folks,
When I read the Flink client API code, the concept of a session is a little vague to me. It looks like the session concept only applies in batch mode (I only see it in ExecutionEnvironment, not in StreamExecutionEnvironment). In local mode (LocalExecutionEnvironment), starting a new session starts a new MiniCluster, but in remote mode (RemoteExecutionEnvironment), starting a new session just starts a new ClusterClient rather than a new cluster. So I am confused about what a Flink session really means. Could anyone help me understand this? Thanks.

--
Best Regards
Jeff Zhang
Hi Jeff,
the session functionality that you find in Flink's client is the remnant of an uncompleted feature that was abandoned. The idea was that one could submit multiple parts of a job to the same cluster, where these parts are added to the same ExecutionGraph. That way we wanted to allow reusing computed results, for example when using a notebook for ad-hoc queries. But as I said, this feature was never completed.

Cheers,
Till

On Sun, Jun 2, 2019 at 3:20 PM Jeff Zhang <[hidden email]> wrote:
> So I am confused what does flink session really mean. Could anyone help me understand this ? Thanks.
Thanks for the reply, @Till Rohrmann <[hidden email]>. Regarding reusing computed results: I think the JM keeps all the metadata of intermediate data, and interactive programming is also trying to reuse computed results. It looks like it may not be necessary to introduce the session concept as long as we can achieve reuse of computed results. Let me know if I understand it correctly.

On Tue, Jun 4, 2019 at 4:03 PM Till Rohrmann <[hidden email]> wrote:
> The idea was that one could submit multiple parts of a job to the same cluster where these parts are added to the same ExecutionGraph. That way we wanted to allow to reuse computed results when using a notebook for ad-hoc queries, for example.

--
Best Regards
Jeff Zhang
Yes, interactive programming solves the problem by storing the meta information on the client, whereas in the past we considered keeping the information on the JM. But keeping it on the JM would not allow sharing results between different clusters. Thus, the interactive programming approach is a bit more general, I think.

Cheers,
Till

On Tue, Jun 4, 2019 at 11:13 AM Jeff Zhang <[hidden email]> wrote:
> It looks like it may not be necessary to introduce the session concept as long as we can achieve reusing computed results.
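
To make the client-side versus JM-side distinction concrete, here is a minimal sketch in plain Python. It does not use any Flink API. ClientResultCatalog, ResultLocation, run_word_count, and the cluster names and paths are all hypothetical names invented for illustration. The only point it shows is that metadata kept on the client can be consulted regardless of which cluster the next job is sent to, which metadata held by a single JM could not offer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ResultLocation:
    """Hypothetical descriptor of where a materialized intermediate result lives."""
    cluster: str
    path: str


class ClientResultCatalog:
    """Hypothetical client-side catalog of already computed results.

    Because the catalog lives on the client and not on one cluster's JM,
    a result recorded while talking to one cluster can still be looked up
    when the next job targets a different cluster.
    """

    def __init__(self) -> None:
        self._locations: Dict[str, ResultLocation] = {}

    def get_or_compute(
        self,
        name: str,
        cluster: str,
        compute: Callable[[str], ResultLocation],
    ) -> ResultLocation:
        # Only run the computation if no earlier job has recorded this result.
        if name not in self._locations:
            self._locations[name] = compute(cluster)
        return self._locations[name]


# Records which clusters actually had to run the (stand-in) job.
calls: List[str] = []


def run_word_count(cluster: str) -> ResultLocation:
    """Stand-in for submitting a job; it only records where it would have run."""
    calls.append(cluster)
    return ResultLocation(cluster=cluster, path=f"/results/{cluster}/word_count")


catalog = ClientResultCatalog()
first = catalog.get_or_compute("word_count", "cluster-a", run_word_count)
# The second request targets another cluster but reuses the recorded result.
second = catalog.get_or_compute("word_count", "cluster-b", run_word_count)
```

In this sketch the second call never invokes run_word_count, because the client already knows where the result lives. How a job on another cluster would actually read that data is outside the scope of the sketch.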