Fixing the ExecutionConfig

Stephan Ewen
Hi all!

The ExecutionConfig is a bit of a strange thing right now. It looks like it
became the place where everyone just put the stuff they want to somehow
push from the client to runtime, plus a random assortment of config flags.

As a result:

  - The ExecutionConfig is available in batch and streaming, but has a
number of fields that are very streaming-specific, like the watermark
interval, etc.

  - Several fields that are relevant only at pre-flight time are in there,
like whether to use the closure cleaner, or whether to force Avro or Kryo
serializers for POJOs.

Any interest in cleaning this up? Because these messy classes simply grow
ever more messy unless we establish a proper definition of what their
concerns and non-concerns are...

Greetings,
Stephan

Re: Fixing the ExecutionConfig

Robert Metzger
I think now (before the 1.0 release) is the right time to clean it up.

Are you suggesting to have two execution configs for batch and streaming?

I'm not sure if we need to distinguish between pre-flight and runtime
options: From a user's perspective, it doesn't matter. For example, the
serializer settings are evaluated during pre-flight but they have an impact
during execution.

On Wed, Nov 11, 2015 at 11:59 AM, Stephan Ewen <[hidden email]> wrote:

> Any interest in cleaning this up? Because these messy classes simply grow
> ever more messy unless we establish a proper definition of what their
> concerns and non-concerns are...

Re: Fixing the ExecutionConfig

mxm
+1 for separating concerns by having a StreamExecutionConfig and a
BatchExecutionConfig with inheritance from ExecutionConfig for general
options. Not sure about the pre-flight and runtime options. I think
they are ok in one config.
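
The inheritance layout proposed here could look roughly like the sketch below. Only the names StreamExecutionConfig and BatchExecutionConfig come from the proposal. The base class is a stand-in rather than the real Flink ExecutionConfig, and the fields, accessors, and the batch-only option are assumptions made up for illustration.

```java
// Demo entry point: sets a general option and a streaming-only option
// on the same object, then reads the general option via the base type.
final class InheritanceSketch {
    public static void main(String[] args) {
        StreamExecutionConfig streamConfig = new StreamExecutionConfig();
        streamConfig.enableForceKryo();               // general option
        streamConfig.setAutoWatermarkInterval(200L);  // streaming-only option

        ExecutionConfig general = streamConfig;       // still usable as the base type
        System.out.println("force Kryo: " + general.isForceKryoEnabled());
        System.out.println("watermark interval (ms): "
                + streamConfig.getAutoWatermarkInterval());
    }
}

// Stand-in base class for options that apply to both modes.
// NOT the real Flink ExecutionConfig; the field is illustrative.
class ExecutionConfig {
    private boolean forceKryo = false;

    void enableForceKryo() { forceKryo = true; }
    boolean isForceKryoEnabled() { return forceKryo; }
}

// Streaming-only options live in a subclass (name taken from the proposal).
class StreamExecutionConfig extends ExecutionConfig {
    private long autoWatermarkIntervalMillis = 0L;

    void setAutoWatermarkInterval(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("interval must be >= 0");
        }
        autoWatermarkIntervalMillis = millis;
    }
    long getAutoWatermarkInterval() { return autoWatermarkIntervalMillis; }
}

// Batch-only options live in a sibling subclass (name taken from the proposal).
class BatchExecutionConfig extends ExecutionConfig {
    // Hypothetical placeholder; the thread does not name a batch-only option.
    private boolean hypotheticalBatchOption = false;

    void setHypotheticalBatchOption(boolean value) { hypotheticalBatchOption = value; }
    boolean isHypotheticalBatchOption() { return hypotheticalBatchOption; }
}
```

In this layout, code that only holds the base type cannot see the streaming-only accessors without a downcast.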

On Wed, Nov 11, 2015 at 1:24 PM, Robert Metzger <[hidden email]> wrote:

> I think now (before the 1.0 release) is the right time to clean it up.
>
> Are you suggesting to have two execution configs for batch and streaming?
>
> I'm not sure if we need to distinguish between pre-flight and runtime
> options: From a user's perspective, it doesn't matter. For example, the
> serializer settings are evaluated during pre-flight but they have an impact
> during execution.

Re: Fixing the ExecutionConfig

Aljoscha Krettek-2
IMHO it’s not possible to have streaming/batch specific ExecutionConfig since the user functions share a common interface, i.e. getRuntimeContext().getExecutionConfig() simply returns the same type for both.

What could be done is to migrate batch/streaming specific stuff to the ExecutionEnvironment and keep the ExecutionConfig strictly for stuff that applies to both execution modes.
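
A minimal sketch of this alternative, assuming stand-in types that only borrow their names from Flink (none of them is the real class or its real API): the user function sees one common ExecutionConfig type through the runtime context, while the streaming-specific watermark interval is moved onto the streaming environment. The placement of the setter and the describe helper are assumptions for illustration.

```java
// Demo entry point: the same user-function code works against both modes.
final class SharedConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment streamEnv = new StreamExecutionEnvironment();
        streamEnv.setAutoWatermarkInterval(200L);   // streaming-only, set on the environment
        streamEnv.getConfig().enableForceKryo();    // shared option, set on the config

        ExecutionEnvironment batchEnv = new ExecutionEnvironment();

        // Both contexts hand out the same ExecutionConfig type.
        RuntimeContext streamCtx = streamEnv::getConfig;
        RuntimeContext batchCtx = batchEnv::getConfig;

        System.out.println("streaming: " + SharedUserFunction.describe(streamCtx));
        System.out.println("batch:     " + SharedUserFunction.describe(batchCtx));
    }
}

// Stand-in config restricted to options that apply to both execution modes.
final class ExecutionConfig {
    private boolean forceKryo = false;

    void enableForceKryo() { forceKryo = true; }
    boolean isForceKryoEnabled() { return forceKryo; }
}

// Stand-in for the interface shared by batch and streaming user functions.
interface RuntimeContext {
    ExecutionConfig getExecutionConfig();   // one return type for both modes
}

// Stand-in batch environment: carries only the shared config here.
class ExecutionEnvironment {
    private final ExecutionConfig config = new ExecutionConfig();

    ExecutionConfig getConfig() { return config; }
}

// Stand-in streaming environment: streaming-specific settings live here
// instead of in the shared config (assumed placement).
class StreamExecutionEnvironment {
    private final ExecutionConfig config = new ExecutionConfig();
    private long autoWatermarkIntervalMillis = 0L;

    ExecutionConfig getConfig() { return config; }
    void setAutoWatermarkInterval(long millis) { autoWatermarkIntervalMillis = millis; }
    long getAutoWatermarkInterval() { return autoWatermarkIntervalMillis; }
}

// Hypothetical user-side helper that depends only on the shared types.
final class SharedUserFunction {
    static String describe(RuntimeContext ctx) {
        return ctx.getExecutionConfig().isForceKryoEnabled()
                ? "kryo forced"
                : "default serializers";
    }
}
```

Because the helper depends only on the common config type, no downcast is needed in either mode.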

> On 12 Nov 2015, at 11:35, Maximilian Michels <[hidden email]> wrote:
>
> +1 for separating concerns by having a StreamExecutionConfig and a
> BatchExecutionConfig with inheritance from ExecutionConfig for general
> options. Not sure about the pre-flight and runtime options. I think
> they are ok in one config.

Re: Fixing the ExecutionConfig

Stephan Ewen
I had pretty much in mind what Aljoscha suggested.

On Thu, Nov 12, 2015 at 11:37 AM, Aljoscha Krettek <[hidden email]>
wrote:

> IMHO it’s not possible to have streaming/batch specific ExecutionConfig
> since the user functions share a common interface, i.e.
> getRuntimeContext().getExecutionConfig() simply returns the same type for
> both.
>
> What could be done is to migrate batch/streaming specific stuff to the
> ExecutionEnvironment and keep the ExecutionConfig strictly for stuff that
> applies to both execution modes.

Re: Fixing the ExecutionConfig

Ufuk Celebi-2
I like this idea. +1

> On 18 Nov 2015, at 15:25, Stephan Ewen <[hidden email]> wrote:
>
> I had pretty much in mind what Aljoscha suggested.