[DISCUSS] Dedicated streaming mode and start scripts

Stephan Ewen
Hi everyone!

What do you think about making the streaming execution mode of the system
explicit? That means that people start a Flink cluster explicitly in Batch
mode or in Streaming mode.

The rationale behind this idea is that I am not sure how batch and streaming
clusters are really shared in a meaningful way, since streaming programs
basically run forever. There are also further differences:

  - Memory Management: Streaming jobs do not use the managed memory
currently (see [1] and [2])

  - Are streaming jobs inherently single user? Initially, I would say yes,
because you need to know that you provisioned enough compute power to keep
up with your ingestion rate and that no other job starts eating into your
shared resources (network / disk)

  - High Availability will probably look a bit different for a streaming
master and a batch master

Once we have figured out the co-existence of streaming and batch in the
same cluster better, we can remove this separation. This does not affect any
user programs, only the "ops" of the cluster.
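
To make the idea concrete, here is a minimal sketch of an explicit mode
switch, assuming a hypothetical start script that takes the mode as its
argument. `ExecutionMode`, `parse_mode`, `cluster_settings` and the
`reserve_managed_memory` key are made-up names for illustration, not Flink
APIs or configuration keys.

```python
from enum import Enum


class ExecutionMode(Enum):
    """The two explicit cluster modes discussed in this thread."""
    BATCH = "batch"
    STREAMING = "streaming"


def parse_mode(arg: str) -> ExecutionMode:
    """Map a start-script argument to an explicit mode; reject anything else."""
    try:
        return ExecutionMode(arg.strip().lower())
    except ValueError:
        raise ValueError(
            f"unknown mode {arg!r}; expected 'batch' or 'streaming'"
        ) from None


def cluster_settings(mode: ExecutionMode) -> dict:
    """Hypothetical per-mode settings (assumed names, not real config keys)."""
    return {
        "mode": mode.value,
        # Streaming jobs do not use managed memory currently, so in this
        # sketch only a batch cluster reserves it up front.
        "reserve_managed_memory": mode is ExecutionMode.BATCH,
    }
```

The point of the sketch is only that the mode is chosen once, when the
cluster is started, and that user programs never see it.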

Greetings,
Stephan


[1] https://issues.apache.org/jira/browse/FLINK-1368
[2] https://issues.apache.org/jira/browse/FLINK-1323

Re: [DISCUSS] Dedicated streaming mode and start scripts

Fabian Hueske-2
Sounds like a good idea to me.
+1

2015-02-17 11:28 GMT+01:00 Stephan Ewen <[hidden email]>:

> Hi everyone!
>
> What do you think about making the streaming execution mode of the system
> explicit? That means that people start a Flink cluster explicitly in Batch
> mode or in Streaming mode.
>
> The rationale behind this idea is that I am not sure how batch and streaming
> clusters are really shared in a meaningful way, since streaming programs
> basically run forever. There are also further differences:
>
>   - Memory Management: Streaming jobs do not use the managed memory
> currently (see [1] and [2])
>
>   - Are streaming jobs inherently single user? Initially, I would say yes,
> because you need to know that you provisioned enough compute power to keep
> up with your ingestion rate and that not some other job starts eating
> shared resources from you (network / disk)
>
>   - High Availability will probably look a bit different for a streaming
> master and a batch master
>
> Once we figured the co-existence between streaming and batch in the same
> cluster out better, we can remove this separation. This does not affect any
> user programs, only the "ops" of the cluster.
>
> Greetings,
> Stephan
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-1368
> [2] https://issues.apache.org/jira/browse/FLINK-1323
>

Re: [DISCUSS] Dedicated streaming mode and start scripts

Ufuk Celebi-2
I think this separation reflects the way that Flink is used currently
anyways. I would be in favor of it as well.

- What about the ongoing efforts (I think by Gyula) to combine both the
batch and stream processing APIs? I assume that this would only affect the
performance and wouldn't pose a fundamental problem there, would it?

Re: [DISCUSS] Dedicated streaming mode and start scripts

Gyula Fóra-2
In reply to this post by Fabian Hueske-2
+1
Let's do this soon to avoid performance issues for streaming.

On Tue, Feb 17, 2015 at 11:39 AM, Fabian Hueske <[hidden email]> wrote:

> sounds like a good idea to me.
> +1

Re: [DISCUSS] Dedicated streaming mode and start scripts

Gyula Fóra-2
In reply to this post by Ufuk Celebi-2
So the current setup is to share results between the two APIs via files, so
I don't see any reason why this couldn't work with the two-cluster setup. It
makes deployment a little trickier, but it is still feasible.

On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi <[hidden email]> wrote:

> I think this separation reflects the way that Flink is used currently
> anyways. I would be in favor of it as well.
>
> - What about the ongoing efforts (I think by Gyula) to combine both the
> batch and stream processing APIs? I assume that this would only affect the
> performance and wouldn't pose a fundamental problem there, would it?
>

Re: [DISCUSS] Dedicated streaming mode and start scripts

Márton Balassi-3
In reply to this post by Ufuk Celebi-2
For the current use cases, I'm in favor of this separation.
@Ufuk: As Gyula has already pointed out, with the current design of the
integration it should not be a problem. Even if we submitted programs to
the wrong cluster, it would only cause performance issues.

Eventually it would be nice to have an integrated cluster.

On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi <[hidden email]> wrote:

> I think this separation reflects the way that Flink is used currently
> anyways. I would be in favor of it as well.
>
> - What about the ongoing efforts (I think by Gyula) to combine both the
> batch and stream processing APIs? I assume that this would only affect the
> performance and wouldn't pose a fundamental problem there, would it?
>

Re: [DISCUSS] Dedicated streaming mode and start scripts

Kostas Tzoumas-2
+1

On Tue, Feb 17, 2015 at 12:14 PM, Márton Balassi <[hidden email]>
wrote:

> When it comes to the current use cases I'm for this separation.
> @Ufuk: As Gyula has already pointed out with the current design of
> integration it should not be a problem. Even if we submitted programs to
> the wrong clusters it would only cause performance issues.
>
> Eventually it would be nice to have an integrated cluster.

Re: [DISCUSS] Dedicated streaming mode and start scripts

Till Rohrmann
+1

On Tue, Feb 17, 2015 at 1:34 PM, Kostas Tzoumas <[hidden email]> wrote:

> +1

Re: [DISCUSS] Dedicated streaming mode and start scripts

Aljoscha Krettek-2
+1

On Tue, Feb 17, 2015 at 1:34 PM, Till Rohrmann <[hidden email]> wrote:

> +1

Re: [DISCUSS] Dedicated streaming mode and start scripts

mxm
+1

On Tue, Feb 17, 2015 at 2:40 PM, Aljoscha Krettek <[hidden email]> wrote:

> +1

Re: [DISCUSS] Dedicated streaming mode and start scripts

Paris Carbone
+1

I agree it’s a proper way to go.

On 18 Feb 2015, at 10:41, Max Michels <[hidden email]> wrote:

> +1

Re: [DISCUSS] Dedicated streaming mode and start scripts

Márton Balassi
Today we had a discussion with Robert on this issue. Eventually, I would
like to have the streaming grouping and windowing buffers/state, maybe
along with the user's crucial state, in the managed memory. If we had this,
separating the two modes could become less important, as streaming would
also use this space.

I do not propose to change the above decision for the current needs; this
is just a heads-up.

On Wed, Feb 18, 2015 at 1:11 PM, Paris Carbone <[hidden email]> wrote:

> +1
>
> I agree it’s a proper way to go.