(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Dedicated streaming mode and start scripts

Stephan Ewen

[DISCUSS] Dedicated streaming mode and start scripts

Hi everyone!

What do you think about making the streaming execution mode of the system
explicit? That means that people start a Flink cluster explicitly in Batch
mode or in Streaming mode.

The rationale behind this idea is that I am not sure how batch and streaming
clusters are really shared in a meaningful way, since streaming programs
basically run forever. There are also further differences:

- Memory Management: Streaming jobs do not use the managed memory
currently (see [1] and [2])

- Are streaming jobs inherently single user? Initially, I would say yes,
because you need to know that you provisioned enough compute power to keep
up with your ingestion rate and that no other job starts eating your
shared resources (network / disk)

- High Availability will probably look a bit different for a streaming
master and a batch master

Once we have a better understanding of how streaming and batch can co-exist
in the same cluster, we can remove this separation. This does not affect any
user programs, only the "ops" of the cluster.

Greetings,
Stephan

[1] https://issues.apache.org/jira/browse/FLINK-1368
[2] https://issues.apache.org/jira/browse/FLINK-1323

Fabian Hueske-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

Sounds like a good idea to me.
+1

2015-02-17 11:28 GMT+01:00 Stephan Ewen <[hidden email]>:

> Hi everyone!
>
> What do you think about making the streaming execution mode of the system
> explicit? That means that people start a Flink cluster explicitly in Batch
> mode or in Streaming mode.
>
> The rationale behind this idea is that I am not sure how batch and streaming
> clusters are really shared in a meaningful way, since streaming programs
> basically run forever. There are also further differences:
>
> - Memory Management: Streaming jobs do not use the managed memory
> currently (see [1] and [2])
>
> - Are streaming jobs inherently single user? Initially, I would say yes,
> because you need to know that you provisioned enough compute power to keep
> up with your ingestion rate and that no other job starts eating your
> shared resources (network / disk)
>
> - High Availability will probably look a bit different for a streaming
> master and a batch master
>
> Once we have a better understanding of how streaming and batch can co-exist
> in the same cluster, we can remove this separation. This does not affect any
> user programs, only the "ops" of the cluster.
>
> Greetings,
> Stephan
>
>
> [1] https://issues.apache.org/jira/browse/FLINK-1368
> [2] https://issues.apache.org/jira/browse/FLINK-1323
>

Ufuk Celebi-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

I think this separation reflects the way that Flink is currently used
anyway. I would be in favor of it as well.

- What about the ongoing efforts (I think by Gyula) to combine the
batch and stream processing APIs? I assume that this would only affect the
performance and wouldn't pose a fundamental problem there, would it?

Gyula Fóra-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

In reply to this post by Fabian Hueske-2

+1
Let's do this soon to avoid performance issues for streaming.

On Tue, Feb 17, 2015 at 11:39 AM, Fabian Hueske <[hidden email]> wrote:

> Sounds like a good idea to me.
> +1

Gyula Fóra-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

In reply to this post by Ufuk Celebi-2

The current setup shares results between the two APIs via files, so
I don't see any reason why this couldn't work with the two-cluster setup. It
makes deployment a little trickier, but it is still feasible.

On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi <[hidden email]> wrote:

> I think this separation reflects the way that Flink is currently used
> anyway. I would be in favor of it as well.
>
> - What about the ongoing efforts (I think by Gyula) to combine the
> batch and stream processing APIs? I assume that this would only affect the
> performance and wouldn't pose a fundamental problem there, would it?
>

Márton Balassi-3

Re: [DISCUSS] Dedicated streaming mode and start scripts

In reply to this post by Ufuk Celebi-2

For the current use cases, I'm in favor of this separation.
@Ufuk: As Gyula has already pointed out, with the current design of the
integration it should not be a problem. Even if we submitted programs to
the wrong cluster, it would only cause performance issues.

Eventually it would be nice to have an integrated cluster.

On Tue, Feb 17, 2015 at 11:55 AM, Ufuk Celebi <[hidden email]> wrote:

> I think this separation reflects the way that Flink is currently used
> anyway. I would be in favor of it as well.
>
> - What about the ongoing efforts (I think by Gyula) to combine the
> batch and stream processing APIs? I assume that this would only affect the
> performance and wouldn't pose a fundamental problem there, would it?
>

Kostas Tzoumas-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

+1

On Tue, Feb 17, 2015 at 12:14 PM, Márton Balassi <[hidden email]>
wrote:

> For the current use cases, I'm in favor of this separation.
> @Ufuk: As Gyula has already pointed out, with the current design of the
> integration it should not be a problem. Even if we submitted programs to
> the wrong cluster, it would only cause performance issues.
>
> Eventually it would be nice to have an integrated cluster.

Till Rohrmann

Re: [DISCUSS] Dedicated streaming mode and start scripts

+1

On Tue, Feb 17, 2015 at 1:34 PM, Kostas Tzoumas <[hidden email]> wrote:

> +1

Aljoscha Krettek-2

Re: [DISCUSS] Dedicated streaming mode and start scripts

+1

On Tue, Feb 17, 2015 at 1:34 PM, Till Rohrmann <[hidden email]> wrote:

> +1

mxm

Re: [DISCUSS] Dedicated streaming mode and start scripts

+1

On Tue, Feb 17, 2015 at 2:40 PM, Aljoscha Krettek <[hidden email]> wrote:

> +1

Paris Carbone

Re: [DISCUSS] Dedicated streaming mode and start scripts

+1

I agree it’s a proper way to go.

On 18 Feb 2015, at 10:41, Max Michels <[hidden email]> wrote:

+1

Márton Balassi

Re: [DISCUSS] Dedicated streaming mode and start scripts

Today we had a discussion with Robert on this issue. I would like to
eventually have the streaming grouping and windowing buffers/state, maybe
along with the user's crucial state, in the managed memory. If we had
this, separating the two modes could become less important, as streaming
would also use this space.

I do not propose changing the above decision for the current needs; this
is just a heads-up.

On Wed, Feb 18, 2015 at 1:11 PM, Paris Carbone <[hidden email]> wrote:

> +1
>
> I agree it’s a proper way to go.