Serialization problem in CollectionEnvironment

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Serialization problem in CollectionEnvironment

Martin Junghanns
Hi,

While building IT tests which extend MultipleProgramsTestBase, I
encountered a problem with serialization:

I posted a minimal example here:
https://gist.github.com/s1ck/566796df5f35ee1de6f9

This runs fine with LocalEnvironment. However, when executing this in
CollectionEnvironment, it leads to the following Exception:

Exception in thread "main" com.esotericsoftware.kryo.KryoException:
Class cannot be created (missing no-arg constructor): java.util.UUID
Serialization trace:
uuid (ObjectInTuple$ID)
        at
com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)

I tried to manually register a UUIDSerializer (which should not be
necessary as Flink has a dependency to such "default" serializers), but
this did not fix the problem.

What I don't understand in general is why the LocalEnvironment and the
CollectionEnvironment use different strategies (e.g. in serialization
and also in workflow execution).

Thanks for your help!

Best,
Martin
Reply | Threaded
Open this post in threaded view
|

Re: Serialization problem in CollectionEnvironment

Martin Junghanns
Hi,

I looked further into the problem and discovered, that the serialization
also fails for TestEnvironment and LocalEnvironment if a groupBy
operation is involved.

Please have a look at the updated Gist

https://gist.github.com/s1ck/566796df5f35ee1de6f9

Best,
Martin

On 27.11.2015 10:20, Martin Junghanns wrote:

> Hi,
>
> While building IT tests which extend MultipleProgramsTestBase, I
> encountered a problem with serialization:
>
> I posted a minimal example here:
> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>
> This runs fine with LocalEnvironment. However, when executing this in
> CollectionEnvironment, it leads to the following Exception:
>
> Exception in thread "main" com.esotericsoftware.kryo.KryoException:
> Class cannot be created (missing no-arg constructor): java.util.UUID
> Serialization trace:
> uuid (ObjectInTuple$ID)
>      at
> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
>
>
> I tried to manually register a UUIDSerializer (which should not be
> necessary as Flink has a dependency to such "default" serializers), but
> this did not fix the problem.
>
> What I don't understand in general is why the LocalEnvironment and the
> CollectionEnvironment use different strategies (e.g. in serialization
> and also in workflow execution).
>
> Thanks for your help!
>
> Best,
> Martin
Reply | Threaded
Open this post in threaded view
|

Re: Serialization problem in CollectionEnvironment

Till Rohrmann
Hi Martin,

I think the problem is that the the WritableSerializer, WritableComparator,
ValueComparator, ValueSerializer and the AvroSerializer all use Kryo to
copy objects. However, in some cases, e.g. missing no-arg constructor, Kryo
is not able to copy the object. In these cases, one should try to copy the
objects via serialization as a fallback strategy. That’s a bug in these
components. I’ll file an issue and open a PR for it.

Thanks a lot for finding this problem Martin!

Cheers,
Till


On Fri, Nov 27, 2015 at 11:59 AM, Martin Junghanns <[hidden email]>
wrote:

> Hi,
>
> I looked further into the problem and discovered, that the serialization
> also fails for TestEnvironment and LocalEnvironment if a groupBy operation
> is involved.
>
> Please have a look at the updated Gist
>
> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>
> Best,
> Martin
>
>
> On 27.11.2015 10:20, Martin Junghanns wrote:
>
>> Hi,
>>
>> While building IT tests which extend MultipleProgramsTestBase, I
>> encountered a problem with serialization:
>>
>> I posted a minimal example here:
>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>>
>> This runs fine with LocalEnvironment. However, when executing this in
>> CollectionEnvironment, it leads to the following Exception:
>>
>> Exception in thread "main" com.esotericsoftware.kryo.KryoException:
>> Class cannot be created (missing no-arg constructor): java.util.UUID
>> Serialization trace:
>> uuid (ObjectInTuple$ID)
>>      at
>>
>> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
>>
>>
>> I tried to manually register a UUIDSerializer (which should not be
>> necessary as Flink has a dependency to such "default" serializers), but
>> this did not fix the problem.
>>
>> What I don't understand in general is why the LocalEnvironment and the
>> CollectionEnvironment use different strategies (e.g. in serialization
>> and also in workflow execution).
>>
>> Thanks for your help!
>>
>> Best,
>> Martin
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Serialization problem in CollectionEnvironment

Till Rohrmann
The issue is https://issues.apache.org/jira/browse/FLINK-3088.

On Fri, Nov 27, 2015 at 12:21 PM, Till Rohrmann <[hidden email]>
wrote:

> Hi Martin,
>
> I think the problem is that the the WritableSerializer, WritableComparator,
> ValueComparator, ValueSerializer and the AvroSerializer all use Kryo to
> copy objects. However, in some cases, e.g. missing no-arg constructor,
> Kryo is not able to copy the object. In these cases, one should try to
> copy the objects via serialization as a fallback strategy. That’s a bug in
> these components. I’ll file an issue and open a PR for it.
>
> Thanks a lot for finding this problem Martin!
>
> Cheers,
> Till
> ​
>
> On Fri, Nov 27, 2015 at 11:59 AM, Martin Junghanns <
> [hidden email]> wrote:
>
>> Hi,
>>
>> I looked further into the problem and discovered, that the serialization
>> also fails for TestEnvironment and LocalEnvironment if a groupBy operation
>> is involved.
>>
>> Please have a look at the updated Gist
>>
>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>>
>> Best,
>> Martin
>>
>>
>> On 27.11.2015 10:20, Martin Junghanns wrote:
>>
>>> Hi,
>>>
>>> While building IT tests which extend MultipleProgramsTestBase, I
>>> encountered a problem with serialization:
>>>
>>> I posted a minimal example here:
>>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>>>
>>> This runs fine with LocalEnvironment. However, when executing this in
>>> CollectionEnvironment, it leads to the following Exception:
>>>
>>> Exception in thread "main" com.esotericsoftware.kryo.KryoException:
>>> Class cannot be created (missing no-arg constructor): java.util.UUID
>>> Serialization trace:
>>> uuid (ObjectInTuple$ID)
>>>      at
>>>
>>> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
>>>
>>>
>>> I tried to manually register a UUIDSerializer (which should not be
>>> necessary as Flink has a dependency to such "default" serializers), but
>>> this did not fix the problem.
>>>
>>> What I don't understand in general is why the LocalEnvironment and the
>>> CollectionEnvironment use different strategies (e.g. in serialization
>>> and also in workflow execution).
>>>
>>> Thanks for your help!
>>>
>>> Best,
>>> Martin
>>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Serialization problem in CollectionEnvironment

Martin Junghanns
Hi Till!

Thanks for looking into it. Is it correct to assume that when a user
defined class that implements Writable is used in Flink datasets, it's
always serialized using Kryo even if the members of that class could be
serialized by Flink's own serialization mechanism?

Again, thanks.

Best,
Martin

On 27.11.2015 12:36, Till Rohrmann wrote:

> The issue is https://issues.apache.org/jira/browse/FLINK-3088.
>
> On Fri, Nov 27, 2015 at 12:21 PM, Till Rohrmann <[hidden email]>
> wrote:
>
>> Hi Martin,
>>
>> I think the problem is that the the WritableSerializer, WritableComparator,
>> ValueComparator, ValueSerializer and the AvroSerializer all use Kryo to
>> copy objects. However, in some cases, e.g. missing no-arg constructor,
>> Kryo is not able to copy the object. In these cases, one should try to
>> copy the objects via serialization as a fallback strategy. That’s a bug in
>> these components. I’ll file an issue and open a PR for it.
>>
>> Thanks a lot for finding this problem Martin!
>>
>> Cheers,
>> Till
>> ​
>>
>> On Fri, Nov 27, 2015 at 11:59 AM, Martin Junghanns <
>> [hidden email]> wrote:
>>
>>> Hi,
>>>
>>> I looked further into the problem and discovered, that the serialization
>>> also fails for TestEnvironment and LocalEnvironment if a groupBy operation
>>> is involved.
>>>
>>> Please have a look at the updated Gist
>>>
>>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>>>
>>> Best,
>>> Martin
>>>
>>>
>>> On 27.11.2015 10:20, Martin Junghanns wrote:
>>>
>>>> Hi,
>>>>
>>>> While building IT tests which extend MultipleProgramsTestBase, I
>>>> encountered a problem with serialization:
>>>>
>>>> I posted a minimal example here:
>>>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
>>>>
>>>> This runs fine with LocalEnvironment. However, when executing this in
>>>> CollectionEnvironment, it leads to the following Exception:
>>>>
>>>> Exception in thread "main" com.esotericsoftware.kryo.KryoException:
>>>> Class cannot be created (missing no-arg constructor): java.util.UUID
>>>> Serialization trace:
>>>> uuid (ObjectInTuple$ID)
>>>>      at
>>>>
>>>> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
>>>>
>>>>
>>>> I tried to manually register a UUIDSerializer (which should not be
>>>> necessary as Flink has a dependency to such "default" serializers), but
>>>> this did not fix the problem.
>>>>
>>>> What I don't understand in general is why the LocalEnvironment and the
>>>> CollectionEnvironment use different strategies (e.g. in serialization
>>>> and also in workflow execution).
>>>>
>>>> Thanks for your help!
>>>>
>>>> Best,
>>>> Martin
>>>>
>>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: Serialization problem in CollectionEnvironment

till.rohrmann
Hi Martin,

no, Kryo is actually only used for copying the element. For
serialization/deserialization the write and readFields methods are used.
I’ve changed it now that if Kryo fails, then it will serialize the object
into a byte array and deserialize it from there again in order to copy the
element.

Cheers,
Till


On Fri, Nov 27, 2015 at 3:01 PM, Martin Junghanns <[hidden email]>
wrote:

> Hi Till!
>
> Thanks for looking into it. Is it correct to assume that when a user
> defined class that implements Writable is used in Flink datasets, it's
> always serialized using Kryo even if the members of that class could be
> serialized by Flink's own serialization mechanism?
>
> Again, thanks.
>
> Best,
> Martin
>
> On 27.11.2015 12:36, Till Rohrmann wrote:
> > The issue is https://issues.apache.org/jira/browse/FLINK-3088.
> >
> > On Fri, Nov 27, 2015 at 12:21 PM, Till Rohrmann <[hidden email]>
> > wrote:
> >
> >> Hi Martin,
> >>
> >> I think the problem is that the the WritableSerializer,
> WritableComparator,
> >> ValueComparator, ValueSerializer and the AvroSerializer all use Kryo to
> >> copy objects. However, in some cases, e.g. missing no-arg constructor,
> >> Kryo is not able to copy the object. In these cases, one should try to
> >> copy the objects via serialization as a fallback strategy. That’s a bug
> in
> >> these components. I’ll file an issue and open a PR for it.
> >>
> >> Thanks a lot for finding this problem Martin!
> >>
> >> Cheers,
> >> Till
> >> ​
> >>
> >> On Fri, Nov 27, 2015 at 11:59 AM, Martin Junghanns <
> >> [hidden email]> wrote:
> >>
> >>> Hi,
> >>>
> >>> I looked further into the problem and discovered, that the
> serialization
> >>> also fails for TestEnvironment and LocalEnvironment if a groupBy
> operation
> >>> is involved.
> >>>
> >>> Please have a look at the updated Gist
> >>>
> >>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
> >>>
> >>> Best,
> >>> Martin
> >>>
> >>>
> >>> On 27.11.2015 10:20, Martin Junghanns wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> While building IT tests which extend MultipleProgramsTestBase, I
> >>>> encountered a problem with serialization:
> >>>>
> >>>> I posted a minimal example here:
> >>>> https://gist.github.com/s1ck/566796df5f35ee1de6f9
> >>>>
> >>>> This runs fine with LocalEnvironment. However, when executing this in
> >>>> CollectionEnvironment, it leads to the following Exception:
> >>>>
> >>>> Exception in thread "main" com.esotericsoftware.kryo.KryoException:
> >>>> Class cannot be created (missing no-arg constructor): java.util.UUID
> >>>> Serialization trace:
> >>>> uuid (ObjectInTuple$ID)
> >>>>      at
> >>>>
> >>>>
> com.esotericsoftware.kryo.Kryo$DefaultInstantiatorStrategy.newInstantiatorOf(Kryo.java:1228)
> >>>>
> >>>>
> >>>> I tried to manually register a UUIDSerializer (which should not be
> >>>> necessary as Flink has a dependency to such "default" serializers),
> but
> >>>> this did not fix the problem.
> >>>>
> >>>> What I don't understand in general is why the LocalEnvironment and the
> >>>> CollectionEnvironment use different strategies (e.g. in serialization
> >>>> and also in workflow execution).
> >>>>
> >>>> Thanks for your help!
> >>>>
> >>>> Best,
> >>>> Martin
> >>>>
> >>>
> >>
> >
>