[DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Stephan Ewen
Hi all!

The ExecutionConfig has some very old settings: forceAvro() and
forceKryo(), which are actually misleadingly named. They cause POJOs to use
Avro or Kryo rather than the POJO serializer.

I think we do not have a good case any more to use Avro for POJOs. POJOs
that are also Avro types go through the Avro serializer anyways.

There may be a case to use Kryo for POJOs if you don't like the Flink POJO
serializer.

I would suggest to remove the "forceAvro()" option completely.
For "forceKryo()", I am torn between removing it completely or renaming it
to "setUseKryoForPOJOs()".

What are the opinion on that out there?

Best,
Stephan
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Yun Tang
Hi Stephan

I prefer to remove 'enableForceKryo' since Kryo serializer does not work out-of-the-box well for schema evolution stories due to its mutable properties, and our built-in POJO serializer has already supported schema evolution.

On the other hand, what's the backward compatibility plan for enableForceAvro() and enableForceKryo()?  I think if https://issues.apache.org/jira/browse/FLINK-11917 merged, we could support to migrate state which was POJO but serialized using Kryo.

Best
Yun Tang
________________________________
From: Stephan Ewen <[hidden email]>
Sent: Tuesday, March 26, 2019 2:31
To: dev; user
Subject: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Hi all!

The ExecutionConfig has some very old settings: forceAvro() and forceKryo(), which are actually misleadingly named. They cause POJOs to use Avro or Kryo rather than the POJO serializer.

I think we do not have a good case any more to use Avro for POJOs. POJOs that are also Avro types go through the Avro serializer anyways.

There may be a case to use Kryo for POJOs if you don't like the Flink POJO serializer.

I would suggest to remove the "forceAvro()" option completely.
For "forceKryo()", I am torn between removing it completely or renaming it to "setUseKryoForPOJOs()".

What are the opinion on that out there?

Best,
Stephan

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Stephan Ewen
Compatibility is really important for checkpointed state.
For that, you can always directly specify GenericTypeInfo or AvroTypeInfo
if you want to continue to treat a type via Kryo or Avro.

Alternatively, once https://issues.apache.org/jira/browse/FLINK-11917 is
implemented, this should happen automatically.

On Tue, Mar 26, 2019 at 8:33 AM Yun Tang <[hidden email]> wrote:

> Hi Stephan
>
> I prefer to remove 'enableForceKryo' since Kryo serializer does not work
> out-of-the-box well for schema evolution stories due to its mutable
> properties, and our built-in POJO serializer has already supported schema
> evolution.
>
> On the other hand, what's the backward compatibility plan for
> enableForceAvro() and enableForceKryo()?  I think if
> https://issues.apache.org/jira/browse/FLINK-11917 merged, we could
> support to migrate state which was POJO but serialized using Kryo.
>
> Best
> Yun Tang
> ------------------------------
> *From:* Stephan Ewen <[hidden email]>
> *Sent:* Tuesday, March 26, 2019 2:31
> *To:* dev; user
> *Subject:* [DISCUSS] Remove forceAvro() and forceKryo() from the
> ExecutionConfig
>
> Hi all!
>
> The ExecutionConfig has some very old settings: forceAvro() and
> forceKryo(), which are actually misleadingly named. They cause POJOs to use
> Avro or Kryo rather than the POJO serializer.
>
> I think we do not have a good case any more to use Avro for POJOs. POJOs
> that are also Avro types go through the Avro serializer anyways.
>
> There may be a case to use Kryo for POJOs if you don't like the Flink POJO
> serializer.
>
> I would suggest to remove the "forceAvro()" option completely.
> For "forceKryo()", I am torn between removing it completely or renaming it
> to "setUseKryoForPOJOs()".
>
> What are the opinion on that out there?
>
> Best,
> Stephan
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Konstantin Knauf-3
Hi Stephan,

I am in favor of renaming forceKryo() instead of removing it, because users
might plugin their Protobuf/Thrift serializers via Kryo as advertised in
our documentation [1]. For this, Kryo needs to be used for POJO types as
well, if I am not mistaken.

Cheers,

Konstantin

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/custom_serializers.html


On Tue, Mar 26, 2019 at 10:03 AM Stephan Ewen <[hidden email]> wrote:

> Compatibility is really important for checkpointed state.
> For that, you can always directly specify GenericTypeInfo or AvroTypeInfo
> if you want to continue to treat a type via Kryo or Avro.
>
> Alternatively, once https://issues.apache.org/jira/browse/FLINK-11917 is
> implemented, this should happen automatically.
>
> On Tue, Mar 26, 2019 at 8:33 AM Yun Tang <[hidden email]> wrote:
>
>> Hi Stephan
>>
>> I prefer to remove 'enableForceKryo' since Kryo serializer does not work
>> out-of-the-box well for schema evolution stories due to its mutable
>> properties, and our built-in POJO serializer has already supported schema
>> evolution.
>>
>> On the other hand, what's the backward compatibility plan for
>> enableForceAvro() and enableForceKryo()?  I think if
>> https://issues.apache.org/jira/browse/FLINK-11917 merged, we could
>> support to migrate state which was POJO but serialized using Kryo.
>>
>> Best
>> Yun Tang
>> ------------------------------
>> *From:* Stephan Ewen <[hidden email]>
>> *Sent:* Tuesday, March 26, 2019 2:31
>> *To:* dev; user
>> *Subject:* [DISCUSS] Remove forceAvro() and forceKryo() from the
>> ExecutionConfig
>>
>> Hi all!
>>
>> The ExecutionConfig has some very old settings: forceAvro() and
>> forceKryo(), which are actually misleadingly named. They cause POJOs to use
>> Avro or Kryo rather than the POJO serializer.
>>
>> I think we do not have a good case any more to use Avro for POJOs. POJOs
>> that are also Avro types go through the Avro serializer anyways.
>>
>> There may be a case to use Kryo for POJOs if you don't like the Flink
>> POJO serializer.
>>
>> I would suggest to remove the "forceAvro()" option completely.
>> For "forceKryo()", I am torn between removing it completely or renaming
>> it to "setUseKryoForPOJOs()".
>>
>> What are the opinion on that out there?
>>
>> Best,
>> Stephan
>>
>>

--

Konstantin Knauf | Solutions Architect

+49 160 91394525

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Stephan Ewen
Good point, Konstantin, that makes sense.

On Tue, Mar 26, 2019 at 10:37 AM Konstantin Knauf <[hidden email]>
wrote:

> Hi Stephan,
>
> I am in favor of renaming forceKryo() instead of removing it, because users
> might plugin their Protobuf/Thrift serializers via Kryo as advertised in
> our documentation [1]. For this, Kryo needs to be used for POJO types as
> well, if I am not mistaken.
>
> Cheers,
>
> Konstantin
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/custom_serializers.html
>
>
> On Tue, Mar 26, 2019 at 10:03 AM Stephan Ewen <[hidden email]> wrote:
>
> > Compatibility is really important for checkpointed state.
> > For that, you can always directly specify GenericTypeInfo or AvroTypeInfo
> > if you want to continue to treat a type via Kryo or Avro.
> >
> > Alternatively, once https://issues.apache.org/jira/browse/FLINK-11917 is
> > implemented, this should happen automatically.
> >
> > On Tue, Mar 26, 2019 at 8:33 AM Yun Tang <[hidden email]> wrote:
> >
> >> Hi Stephan
> >>
> >> I prefer to remove 'enableForceKryo' since Kryo serializer does not work
> >> out-of-the-box well for schema evolution stories due to its mutable
> >> properties, and our built-in POJO serializer has already supported
> schema
> >> evolution.
> >>
> >> On the other hand, what's the backward compatibility plan for
> >> enableForceAvro() and enableForceKryo()?  I think if
> >> https://issues.apache.org/jira/browse/FLINK-11917 merged, we could
> >> support to migrate state which was POJO but serialized using Kryo.
> >>
> >> Best
> >> Yun Tang
> >> ------------------------------
> >> *From:* Stephan Ewen <[hidden email]>
> >> *Sent:* Tuesday, March 26, 2019 2:31
> >> *To:* dev; user
> >> *Subject:* [DISCUSS] Remove forceAvro() and forceKryo() from the
> >> ExecutionConfig
> >>
> >> Hi all!
> >>
> >> The ExecutionConfig has some very old settings: forceAvro() and
> >> forceKryo(), which are actually misleadingly named. They cause POJOs to
> use
> >> Avro or Kryo rather than the POJO serializer.
> >>
> >> I think we do not have a good case any more to use Avro for POJOs. POJOs
> >> that are also Avro types go through the Avro serializer anyways.
> >>
> >> There may be a case to use Kryo for POJOs if you don't like the Flink
> >> POJO serializer.
> >>
> >> I would suggest to remove the "forceAvro()" option completely.
> >> For "forceKryo()", I am torn between removing it completely or renaming
> >> it to "setUseKryoForPOJOs()".
> >>
> >> What are the opinion on that out there?
> >>
> >> Best,
> >> Stephan
> >>
> >>
>
> --
>
> Konstantin Knauf | Solutions Architect
>
> +49 160 91394525
>
> <https://www.ververica.com/>
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Data Artisans GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Yun Tang
In reply to this post by Konstantin Knauf-3
Hi Konstantin

I think there is no direct relationship between registering chill-protobuf/chill-thrift for Protobuf/Thrift type with Kryo and enforcing POJO to use Kryo.
For both Protobuf and Thrift types, they will be extracted as GenericTypeInfo within TypeExtractor which would use Kryoserializer by default. If we do not register chill-protobuf and chill-thrift within kryo as [1] suggested, we would meet unexpected exceptions. However, PojoTypeInfo would only use KryoSerializer when we enableForceKryo(). They are actually two different aspects and use different fields within ExecutionConfig.

Best
Yun Tang

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/custom_serializers.html
________________________________
From: Konstantin Knauf <[hidden email]>
Sent: Tuesday, March 26, 2019 17:37
To: Stephan Ewen
Cc: Yun Tang; dev; user
Subject: Re: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Hi Stephan,

I am in favor of renaming forceKryo() instead of removing it, because users might plugin their Protobuf/Thrift serializers via Kryo as advertised in our documentation [1]. For this, Kryo needs to be used for POJO types as well, if I am not mistaken.

Cheers,

Konstantin

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/custom_serializers.html


On Tue, Mar 26, 2019 at 10:03 AM Stephan Ewen <[hidden email]<mailto:[hidden email]>> wrote:
Compatibility is really important for checkpointed state.
For that, you can always directly specify GenericTypeInfo or AvroTypeInfo if you want to continue to treat a type via Kryo or Avro.

Alternatively, once https://issues.apache.org/jira/browse/FLINK-11917 is implemented, this should happen automatically.

On Tue, Mar 26, 2019 at 8:33 AM Yun Tang <[hidden email]<mailto:[hidden email]>> wrote:
Hi Stephan

I prefer to remove 'enableForceKryo' since Kryo serializer does not work out-of-the-box well for schema evolution stories due to its mutable properties, and our built-in POJO serializer has already supported schema evolution.

On the other hand, what's the backward compatibility plan for enableForceAvro() and enableForceKryo()?  I think if https://issues.apache.org/jira/browse/FLINK-11917 merged, we could support to migrate state which was POJO but serialized using Kryo.

Best
Yun Tang
________________________________
From: Stephan Ewen <[hidden email]<mailto:[hidden email]>>
Sent: Tuesday, March 26, 2019 2:31
To: dev; user
Subject: [DISCUSS] Remove forceAvro() and forceKryo() from the ExecutionConfig

Hi all!

The ExecutionConfig has some very old settings: forceAvro() and forceKryo(), which are actually misleadingly named. They cause POJOs to use Avro or Kryo rather than the POJO serializer.

I think we do not have a good case any more to use Avro for POJOs. POJOs that are also Avro types go through the Avro serializer anyways.

There may be a case to use Kryo for POJOs if you don't like the Flink POJO serializer.

I would suggest to remove the "forceAvro()" option completely.
For "forceKryo()", I am torn between removing it completely or renaming it to "setUseKryoForPOJOs()".

What are the opinion on that out there?

Best,
Stephan



--

Konstantin Knauf | Solutions Architect

+49 160 91394525

[https://lh4.googleusercontent.com/1RRzA12SK12Xaowkag-W37QDs5LHrfw4R0tMwVNjKLDKoIu69ld1qtA2hSDn1LSJe9w2THG1A9igK_nXPrNeIqRF87FjbEQoBnZJJgyPXCkKPFYuYc_Vh419P9EOO36ERgdnX5wG]<https://www.ververica.com/>


Follow us @VervericaData

--

Join Flink Forward<https://flink-forward.org/> - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen