Hi everyone!
What do you think about making the streaming execution mode of the system explicit? That means people would start a Flink cluster explicitly in either batch mode or streaming mode.

The rationale behind this idea is that I am not sure batch and streaming can really share a cluster in a meaningful way, since streaming programs basically run forever. There are also further differences:

- Memory management: streaming jobs currently do not use the managed memory (see [1] and [2]).

- Are streaming jobs inherently single-user? Initially, I would say yes, because you need to know that you have provisioned enough compute power to keep up with your ingestion rate, and that no other job starts eating into the shared resources (network / disk) you depend on.

- High availability will probably look a bit different for a streaming master and a batch master.

Once we better understand how streaming and batch can co-exist in the same cluster, we can remove this separation. This does not affect any user programs, only the "ops" of the cluster.

Greetings,
Stephan

[1] https://issues.apache.org/jira/browse/FLINK-1368
[2] https://issues.apache.org/jira/browse/FLINK-1323
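A minimal Java sketch of what an explicit start-up mode could look like. Every name here (`ClusterModeSketch`, `ClusterMode`, `parseMode`, `managedMemoryBytes`, the 0.7 fraction) is hypothetical and not part of Flink's API; the only point taken from the proposal is that a streaming-mode cluster would not need to reserve managed memory, since streaming jobs currently do not use it.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not Flink's actual API: a cluster is started in an
 * explicit mode, and that mode decides how much TaskManager memory is
 * reserved as managed memory.
 */
public class ClusterModeSketch {

    /** The two explicit cluster modes discussed in this thread. */
    enum ClusterMode { BATCH, STREAMING }

    /** Parses the mode passed at cluster start-up, e.g. "batch" or "streaming". */
    static ClusterMode parseMode(String arg) {
        return ClusterMode.valueOf(arg.trim().toUpperCase(Locale.ROOT));
    }

    /**
     * Returns the number of bytes to reserve as managed memory.
     * Assumption: batch mode reserves a fraction of the heap, while streaming
     * mode reserves nothing because streaming jobs currently do not use
     * managed memory.
     */
    static long managedMemoryBytes(ClusterMode mode, long heapBytes, double batchFraction) {
        if (batchFraction < 0.0 || batchFraction > 1.0) {
            throw new IllegalArgumentException("batchFraction must be in [0, 1]");
        }
        return mode == ClusterMode.BATCH ? (long) (heapBytes * batchFraction) : 0L;
    }

    public static void main(String[] args) {
        long heap = 1024L * 1024 * 1024; // illustrative 1 GiB heap
        for (String arg : new String[] {"batch", "streaming"}) {
            ClusterMode mode = parseMode(arg);
            System.out.println(mode + " -> managed memory: "
                    + managedMemoryBytes(mode, heap, 0.7) + " bytes");
        }
    }
}
```

Making the mode a start-up decision of the cluster, rather than a property of each job, mirrors the point that the separation touches only the "ops" side and leaves user programs unchanged.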
Sounds like a good idea to me.
+1

2015-02-17 11:28 GMT+01:00 Stephan Ewen wrote:
> [...]
I think this separation reflects the way Flink is currently used anyway. I would be in favor of it as well.

- What about the ongoing efforts (I think by Gyula) to combine the batch and stream processing APIs? I assume that this would only affect performance and would not pose a fundamental problem there, would it?
+1
Let's do this soon to avoid performance issues for streaming.

On Tue, Feb 17, 2015 at 11:39 AM, Fabian Hueske wrote:
> [...]
The current setup is to share results between the two APIs via files, so I don't see any reason why this couldn't work with the two-cluster setup. It makes deployment a little trickier, but it is still feasible.

On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi wrote:
> [...]
When it comes to the current use cases, I'm for this separation.
@Ufuk: As Gyula has already pointed out, with the current design of the integration it should not be a problem. Even if we submitted programs to the wrong cluster, it would only cause performance issues.

Eventually it would be nice to have an integrated cluster.
+1
On Tue, Feb 17, 2015 at 12:14 PM, Márton Balassi wrote:
> [...]
+1
On Tue, Feb 17, 2015 at 1:34 PM, Kostas Tzoumas wrote:
> [...]
+1
On Tue, Feb 17, 2015 at 1:34 PM, Till Rohrmann wrote:
> [...]
+1
On Tue, Feb 17, 2015 at 2:40 PM, Aljoscha Krettek wrote:
> [...]
+1
I agree it’s a proper way to go.

On 18 Feb 2015, at 10:41, Max Michels wrote:
> [...]
Today we had a discussion with Robert on this issue. Eventually, I would like to keep the buffers/state for streaming grouping and windowing, and possibly also the crucial state of the user, in the managed memory. If we had this, separating the two modes could become less important, since streaming would also use this space. I do not propose changing the above decision for the current needs; this is just a heads-up.

On Wed, Feb 18, 2015 at 1:11 PM, Paris Carbone wrote:
> [...]