Kryo StackOverflowError

Andrew Palumbo
Hi all,


I am working on a matrix multiplication operation for the Mahout Flink bindings that uses quite a few chained Flink DataSet operations.


When testing, I am getting the following error:


{...}

04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
java.lang.StackOverflowError
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
    at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
    at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
    at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
{...}


I've seen similar issues on the dev@flink list (and other places), but I believe that they were from recursive calls and objects which pointed back to themselves somehow.


This is a relatively straightforward method; it just has several Flink operations before execution is triggered. If I remove some operations, e.g. a reduce, I can get the method to complete on a simple test; however, the result will then, of course, be numerically incorrect.


I am wondering if there is any workaround for this type of problem?


Thank You,


Andy

Re: Kryo StackOverflowError

Stephan Ewen
Hi!

Sorry, I don't fully understand the diagnosis.
You say that this stack overflow is not from a recursive object type?

Long graphs of operations in Flink usually do not cause
StackOverflowErrors, because the whole graph is not processed
recursively.

Can you paste the entire Stack Trace (for example to a gist)?

Greetings,
Stephan


(In reply to Andrew Palumbo's message of Sun, Apr 10, 2016 at 4:42 AM.)

Re: Kryo StackOverflowError

Andrew Palumbo
Hi Stephan,

Thanks for answering.

This is not from a recursive object. (It is used in a recursive method in the test that is throwing this error, but the depth is only 2 and there are no other Flink DataSet operations before execution is triggered, so it is trivial.)

Here is a Gist of the code, and the full output and stack trace:

https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419

The error begins at line 178 of the "Output" file.

Thanks

(In reply to Stephan Ewen's message of Sunday, April 10, 2016 9:39 AM.)

Re: Kryo StackOverflowError

Stephan Ewen
Hi!

Is it possible that some datatype has a recursive structure nonetheless?
Something like a linked list or so, which would create a large object graph?

There seems to be a large object graph that the Kryo serializer traverses,
which causes the StackOverflowError.
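
To illustrate the point about large object graphs: a serializer that descends into each referenced object before returning needs at least one stack frame per nested object, so even a non-cyclic structure such as a long linked list can exhaust the stack. The sketch below is plain Java and does not use Kryo; Node, DeepGraphDemo, and the method names are hypothetical and chosen only for this example.

```java
// Toy model of recursive traversal over a linked structure (not Kryo code).
final class Node {
    final Node next;
    Node(Node next) { this.next = next; }
}

final class DeepGraphDemo {
    // Builds a singly linked list with the given number of nodes.
    static Node buildList(int length) {
        Node head = null;
        for (int i = 0; i < length; i++) {
            head = new Node(head);
        }
        return head;
    }

    // Recursive walk: one stack frame per node, similar to a serializer
    // that descends into each field before returning.
    static int depthRecursive(Node node) {
        return node == null ? 0 : 1 + depthRecursive(node.next);
    }

    // Iterative walk: constant stack usage regardless of list length.
    static int depthIterative(Node node) {
        int depth = 0;
        for (Node n = node; n != null; n = n.next) {
            depth++;
        }
        return depth;
    }

    // Returns true if the recursive walk overflows the stack for this list.
    static boolean recursiveWalkOverflows(Node head) {
        try {
            depthRecursive(head);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

With a short list both walks agree; with a list of around a million nodes the recursive walk would be expected to fail on a default-sized thread stack, while the iterative one completes.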

Greetings,
Stephan


(In reply to Andrew Palumbo's message of Sun, Apr 10, 2016 at 6:24 PM.)

Re: Kryo StackOverflowError

Hilmi Yildirim
Hi,
I also had this problem and solved it.

In my case I had multiple objects that were created via anonymous
classes. When I broadcast these objects, the serializer tried to
serialize them, and for that it tried to serialize the anonymous
classes. This caused the problem.

For example,

class A {

  def createObjects(): Array[Object] = {
    val objects = scala.collection.mutable.ArrayBuffer[Object]()
    for (_ <- 1 to 3) {        // loop bounds are only illustrative
      val obj = new Object {   // instance of an anonymous class
        // ...
      }
      objects += obj
    }
    objects.toArray
  }
}

It tried to serialize the anonymous class instance. For that it tried to
serialize the method createObjects(). And then it tried to serialize
class A. To serialize class A it tried to serialize the method
createObjects(). Or something like that; I do not remember the details.
This caused the recursion.

BR,
Hilmi

(In reply to Stephan Ewen's message of April 10, 2016 at 19:18.)


--
==================================================================
Hilmi Yildirim, M.Sc.
Researcher

DFKI GmbH
Intelligente Analytik für Massendaten
DFKI Projektbüro Berlin
Alt-Moabit 91c
D-10559 Berlin
Phone: +49 30 23895 1814

E-Mail: [hidden email]

-------------------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern

Management:
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
Dr. Walter Olthoff

Chairman of the Supervisory Board:
Prof. Dr. h.c. Hans A. Aukes

District Court of Kaiserslautern, HRB 2313
-------------------------------------------------------------


RE: Kryo StackOverflowError

Lisonbee, Todd
Hi,

I also got this error message when I had private inner classes:

public class A {
    private class B {
    }
}

I was able to fix it by making the inner classes public static:

public class A {
    public static class B {
    }
}

When I was trying to debug, it seemed that this error message can be caused by several different things.
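
One possible explanation for why the static change helps, offered as a sketch rather than a confirmed diagnosis: an instance of a non-static inner class normally carries a compiler-generated reference to its enclosing instance, so a field-by-field serializer ends up walking into the outer object as well, whereas a static nested class has no such reference. The plain-Java example below (no Kryo involved) checks this through reflection. Outer, Inner, Nested, and referencesOuter are hypothetical names. The inner class deliberately reads a field of Outer, because newer javac versions may omit the hidden reference when it is unused.

```java
import java.lang.reflect.Field;

final class Outer {
    private final String name;

    Outer(String name) { this.name = name; }

    // Non-static inner class: it reads a field of Outer, so each instance
    // keeps a reference to the Outer instance that created it.
    final class Inner {
        String describe() { return "inner of " + name; }
    }

    // Static nested class: no enclosing instance.
    static final class Nested {
        String describe() { return "nested"; }
    }

    Inner newInner() { return new Inner(); }
}

final class InnerClassDemo {
    // Returns true if the class declares a field of type Outer, i.e. a field
    // that a field-by-field serializer would follow into the outer object.
    static boolean referencesOuter(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.getType() == Outer.class) {
                return true;
            }
        }
        return false;
    }
}
```

If the outer object in turn holds references to its inner-class instances, the resulting graph is cyclic, which would match the endless ObjectField.write / FieldSerializer.write pattern in the stack trace.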

Thanks,

Todd


(In reply to Hilmi Yildirim's message of Sunday, April 10, 2016 11:36 AM.)


Re: Kryo StackOverflowError

Till Rohrmann
Hey guys,

I have a suspicion about what the culprit could be: could you change the line
KryoSerializer.java:328 to kryo.setReferences(true) and check whether the
error still occurs? We deactivated reference tracking, so Kryo can no
longer resolve cyclic references properly.
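
For context on what reference tracking changes, here is a toy model in plain Java. It is not Kryo's implementation and uses none of its API; Item, ReferenceTrackingDemo, and the method names are hypothetical. The idea is that, without tracking, a recursive walk over a cyclic graph never terminates and eventually overflows the stack, while with tracking each object is remembered by identity the first time it is visited and later encounters are treated as back-references.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Minimal object with a single outgoing reference.
final class Item {
    Item other;
}

final class ReferenceTrackingDemo {
    // Walks the graph without remembering visited objects.
    // On a cyclic graph this recurses until the stack is exhausted.
    static int countWithoutTracking(Item item) {
        return item == null ? 0 : 1 + countWithoutTracking(item.other);
    }

    // Walks the graph while tracking visited objects by identity.
    static int countWithTracking(Item item) {
        Set<Item> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return countWithTracking(item, seen);
    }

    // A previously seen object is treated as a back-reference and not re-entered.
    private static int countWithTracking(Item item, Set<Item> seen) {
        if (item == null || !seen.add(item)) {
            return 0;
        }
        return 1 + countWithTracking(item.other, seen);
    }

    // Builds two objects that point at each other: a -> b -> a.
    static Item buildCycle() {
        Item a = new Item();
        Item b = new Item();
        a.other = b;
        b.other = a;
        return a;
    }

    // Returns true if the untracked walk overflows the stack.
    static boolean untrackedWalkOverflows(Item item) {
        try {
            countWithoutTracking(item);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }
}
```

The trade-off in this model is the extra identity set that has to be maintained for every visited object, which is presumably why such tracking is something one might want to switch off when no cycles are expected.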

Cheers,
Till


(In reply to Todd Lisonbee's message of Mon, Apr 11, 2016 at 11:42 PM.)

Re: Kryo StackOverflowError

Robert Metzger
Good catch Till!

I just checked it with the Mahout source code and the issue is gone with
reference tracking enabled.

I would just re-enable it again in Flink.
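
To illustrate why reference tracking matters for cyclic object graphs, here is a minimal, stdlib-only sketch. It is not Kryo's implementation: the `Node` class and both `write...` methods are hypothetical names made up for this example. The untracked writer recurses forever on a cycle, while the tracked writer remembers objects by identity and emits a back-reference instead of descending again.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class ReferenceTrackingSketch {

    /** Hypothetical node type; `next` may point back to an earlier node. */
    static final class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    /** No memory of visited objects: a cycle recurses until the stack overflows. */
    static void writeWithoutTracking(Node node, StringBuilder out) {
        if (node == null) {
            out.append("null");
            return;
        }
        out.append(node.name).append("->");
        writeWithoutTracking(node.next, out);
    }

    /** Remembers each object by identity and writes a back-reference on a revisit. */
    static void writeWithTracking(Node node, StringBuilder out, Map<Node, Integer> seen) {
        if (node == null) {
            out.append("null");
            return;
        }
        Integer id = seen.get(node);
        if (id != null) {
            out.append("ref#").append(id);
            return;
        }
        seen.put(node, seen.size());
        out.append(node.name).append("->");
        writeWithTracking(node.next, out, seen);
    }

    /** Builds the two-node cycle a -> b -> a. */
    static Node cycle() {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        return a;
    }

    static String serializeTracked(Node root) {
        StringBuilder out = new StringBuilder();
        writeWithTracking(root, out, new IdentityHashMap<>());
        return out.toString();
    }

    static boolean overflowsUntracked(Node root) {
        try {
            writeWithoutTracking(root, new StringBuilder());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeTracked(cycle()));   // prints a->b->ref#0
        System.out.println(overflowsUntracked(cycle())); // prints true
    }
}
```

In Kryo itself the corresponding switch is the `kryo.setReferences(true)` call that Till mentions; the sketch only mirrors the idea of keeping an identity map of objects already written.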

On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <[hidden email]>
wrote:

> Hey guys,
>
> I have a suspicion about what the culprit could be: could you change the line
> KryoSerializer.java:328 to kryo.setReferences(true) and check whether the error
> still occurs? We deactivated reference tracking, so Kryo should no longer
> be able to resolve cyclic references properly.
>
> Cheers,
> Till
>
> On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <[hidden email]>
> wrote:
>
> > Hi,
> >
> > I also got this error message when I had private inner classes:
> >
> > public class A {
> >     private class B {
> >     }
> > }
> >
> > I was able to fix it by making the inner classes public static:
> >
> > public class A {
> >     public static class B {
> >     }
> > }
> >
> > While I was debugging, it seemed that this error message can be caused by
> > several different things.
> >
> > Thanks,
> >
> > Todd
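
As a small illustration of the structural difference between the two declarations above: a non-static inner class that uses its enclosing instance keeps a hidden reference to it, while a static nested class does not. The sketch below makes that visible via reflection. All class and method names are hypothetical, and whether this hidden reference is what the serializer followed in Todd's case is not established in this thread.

```java
import java.lang.reflect.Field;

class InnerClassOuterReferenceSketch {

    static class Outer {
        String label = "outer";

        /** Non-static inner class: it reads Outer.label, so it needs its enclosing Outer. */
        class Inner {
            String describe() {
                return "inner of " + label;
            }
        }

        /** Static nested class: it has no enclosing instance at all. */
        static class Nested {
            String describe() {
                return "nested";
            }
        }
    }

    /** Returns true if the class declares a field (synthetic or not) of type Outer. */
    static boolean holdsOuterReference(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.getType() == Outer.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(holdsOuterReference(Outer.Inner.class));  // prints true
        System.out.println(holdsOuterReference(Outer.Nested.class)); // prints false
    }
}
```

An object graph that starts at an `Inner` instance can therefore reach the whole `Outer` instance, which is one reason static nested classes are often the safer choice for data types that get serialized.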
> >
> >
> > -----Original Message-----
> > From: Hilmi Yildirim [mailto:[hidden email]]
> > Sent: Sunday, April 10, 2016 11:36 AM
> > To: [hidden email]
> > Subject: Re: Kryo StackOverflowError
> >
> > Hi,
> > I also had this problem and solved it.
> >
> > In my case I had multiple objects that were created via anonymous classes.
> > When I broadcast these objects, the serializer tried to serialize the
> > objects, and for that it tried to serialize the anonymous classes. This
> > caused the problem.
> >
> > For example,
> >
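> > class A {
> >
> >   // `count` is a hypothetical parameter; the original loop bounds were elided
> >   def createObjects(count: Int): Array[AnyRef] = {
> >     val objects = scala.collection.mutable.ArrayBuffer.empty[AnyRef]
> >     for (_ <- 0 until count) {
> >       // anonymous class created inside a method of A
> >       val obj = new Object {
> >         // ...
> >       }
> >       objects += obj
> >     }
> >     objects.toArray
> >   }
> > }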
> >
> > It tried to serialize "new Class". For that it tried to serialize the
> > method createObjects(). And then it tried to serialize class A. To
> > serialize class A it tried to serialize the method createObjects. Or
> > something like that, I do not remember the details. This caused the
> > recursion.
> >
> > BR,
> > Hilmi
> >
> > Am 10.04.2016 um 19:18 schrieb Stephan Ewen:
> > > Hi!
> > >
> > > Is it possible that some datatype has a recursive structure nonetheless?
> > > Something like a linked list or so, which would create a large object graph?
> > >
> > > There seems to be a large object graph that the Kryo serializer traverses,
> > > which causes the StackOverflowError.
> > >
> > > Greetings,
> > > Stephan
> > >
> > >
> > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <[hidden email]>
> > > wrote:
> > >
> > >> Hi Stephan,
> > >>
> > >> thanks for answering.
> > >>
> > >> This is not from a recursive object. (It is used in a recursive method in the
> > >> test that is throwing this error, but the depth is only 2 and there are
> > >> no other Flink DataSet operations before execution is triggered, so it is
> > >> trivial.)
> > >>
> > >> Here is a Gist of the code, and the full output and stack trace:
> > >>
> > >> https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> > >>
> > >> The error begins at line 178 of the "Output" file.
> > >>
> > >> Thanks
> > >>
> > >> ________________________________________
> > >> From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
> > >> Sent: Sunday, April 10, 2016 9:39 AM
> > >> To: [hidden email]
> > >> Subject: Re: Kryo StackOverflowError
> > >>
> > >> Hi!
> > >>
> > >> Sorry, I don't fully understand the diagnosis.
> > >> You say that this stack overflow is not from a recursive object type?
> > >>
> > >> Long graphs of operations in Flink usually do not cause
> > >> StackOverflowErrors, because the whole graph is not processed
> > >> recursively.
> > >>
> > >> Can you paste the entire Stack Trace (for example to a gist)?
> > >>
> > >> Greetings,
> > >> Stephan
> > >>
> > >>

Re: Kryo StackOverflowError

Till Rohrmann
+1

On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <[hidden email]> wrote:

> Good catch Till!
>
> I just checked it with the Mahout source code and the issue is gone with
> reference tracking enabled.
>
> I would just re-enable it again in Flink.

Re: Kryo StackOverflowError

Andrew Palumbo

Hi,

Great! Do you think you'll be enabling this in your upcoming 1.0.2 release? We plan on putting out a maintenance Mahout release relatively soon, and this would allow us to speed up matrix multiplication greatly.

Thanks,

Andy  
________________________________________
From: Till Rohrmann <[hidden email]>
Sent: Tuesday, April 12, 2016 11:18 AM
To: [hidden email]
Subject: Re: Kryo StackOverflowError

+1


Re: Kryo StackOverflowError

Stephan Ewen
+1 to add this to 1.0.2


On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <[hidden email]> wrote:

>
> Hi,
>
> Great! Do you think that this is something that you'll be enabling in your
> upcoming 1.0.2 release?  We plan on putting out a maintenance Mahout
> Release relatively soon and this would allow us to speed up Matrix
> Multiplication greatly.
>
> Thanks,
>
> Andy
> ________________________________________
> From: Till Rohrmann <[hidden email]>
> Sent: Tuesday, April 12, 2016 11:18 AM
> To: [hidden email]
> Subject: Re: Kryo StackOverflowError
>
> +1
>
> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <[hidden email]>
> wrote:
>
> > Good catch Till!
> >
> > I just checked it with the Mahout source code and the issues is gone with
> > reference tracking enabled.
> >
> > I would just re-enable it again in Flink.
> >
> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <[hidden email]>
> > wrote:
> >
> > > Hey guys,
> > >
> > > I have a suspicion which could be the culprit: Could change the line
> > > KryoSerializer.java:328 to kryo.setReferences(true) and try if the
> error
> > > still remains? We deactivated the reference tracking and now Kryo
> > shouldn’t
> > > be able to resolve cyclic references properly.
> > >
> > > Cheers,
> > > Till
> > > ​
> > >
> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <
> > [hidden email]>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I also got this error message when I had private inner classes:
> > > >
> > > > public class A {
> > > >     private class B {
> > > >     }
> > > > }
> > > >
> > > > I was able to fix by making the inner classes public static:
> > > >
> > > > public class A {
> > > >     public static class B {
> > > >     }
> > > > }
> > > >
> > > > When I was trying to debug it seemed this error message can be caused
> > by
> > > > several different things.
> > > >
> > > > Thanks,
> > > >
> > > > Todd
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Hilmi Yildirim [mailto:[hidden email]]
> > > > Sent: Sunday, April 10, 2016 11:36 AM
> > > > To: [hidden email]
> > > > Subject: Re: Kryo StackOverflowError
> > > >
> > > > Hi,
> > > > I also had this problem and solved it.
> > > >
> > > > In my case I had multiple objects which are created via anonymous
> > > classes.
> > > > When I broadcasted these objects, the serializer tried to serialize
> the
> > > > objects and for that it tried to serialize the anonymous classes.
> This
> > > > caused the problem.
> > > >
> > > > For example,
> > > >
> > > > class A{
> > > >
> > > >   def createObjects() : Array[Object]{
> > > >             objects
> > > >          for{
> > > >              object = new Class{
> > > >              ...
> > > >              }
> > > >              objects.add(object)
> > > >          }
> > > >          return objects
> > > >      }
> > > > }
> > > >
> > > > It tried to serialize "new Class". For that it tried to serialize the
> > > > method createObjects(). And then it tried to serialize class A. To
> > > > serialize class A it tried to serialize the method createObjects. Or
> > > > something like that, I do not remember the details. This caused the
> > > > recursion.
> > > >
> > > > BR,
> > > > Hilmi
> > > >
> > > > Am 10.04.2016 um 19:18 schrieb Stephan Ewen:
> > > > > Hi!
> > > > >
> > > > > Is it possible that some datatype has a recursive structure
> > > nonetheless?
> > > > > Something like a linked list or so, which would create a large
> object
> > > > graph?
> > > > >
> > > > > There seems to be a large object graph that the Kryo serializer
> > > > traverses,
> > > > > which causes the StackOverflowError.
> > > > >
> > > > > Greetings,
> > > > > Stephan
> > > > >
> > > > >
> > > > > On Sun, Apr 10, 2016 at 6:24 PM, Andrew Palumbo <
> [hidden email]>
> > > > wrote:
> > > > >
> > > > >> Hi Stephan,
> > > > >>
> > > > >> thanks for answering.
> > > > >>
> > > > >> This not from a recursive object. (it is used in a recursive
> method
> > in
> > > > the
> > > > >> test that is throwing this error, but the the depth is only 2 and
> > > there
> > > > are
> > > > >> no other Flink DataSet operations before execution is triggered so
> > it
> > > is
> > > > >> trivial.)
> > > > >>
> > > > >> Gere is a Gist of the code, and the full output and stack trace:
> > > > >>
> > > > >>
> > > https://gist.github.com/andrewpalumbo/40c7422a5187a24cd03d7d81feb2a419
> > > > >>
> > > > >> The Error begins at line 178 of the "Output" file.
> > > > >>
> > > > >> Thanks
> > > > >>
> > > > >> ________________________________________
> > > > > >> From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
> > > > > >> Sent: Sunday, April 10, 2016 9:39 AM
> > > > > >> To: [hidden email]
> > > > > >> Subject: Re: Kryo StackOverflowError
> > > > > >>
> > > > > >> Hi!
> > > > > >>
> > > > > >> Sorry, I don't fully understand the diagnosis.
> > > > > >> You say that this stack overflow is not from a recursive/object type?
> > > > > >>
> > > > > >> Long graphs of operations in Flink usually do not cause
> > > > > >> StackOverflowErrors, because the whole graph is not processed recursively.
> > > > >>
> > > > >> Can you paste the entire Stack Trace (for example to a gist)?
> > > > >>
> > > > >> Greetings,
> > > > >> Stephan
> > > > >>
> > > > >>
> > > > > >> On Sun, Apr 10, 2016 at 4:42 AM, Andrew Palumbo <[hidden email]> wrote:
> > > > > >>
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>>
> > > > > >>> I am working on a matrix multiplication operation for Mahout Flink
> > > > > >>> Bindings that uses quite a few chained Flink Dataset operations,
> > > > >>>
> > > > >>>
> > > > >>> When testing, I am getting the following error:
> > > > >>>
> > > > >>>
> > > > >>> {...}
> > > > >>>
> > > > > >>> 04/09/2016 22:30:35    CHAIN Reduce (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147)) -> FlatMap (FlatMap at org.apache.mahout.flinkbindings.drm.BlockifiedFlinkDrm.asRowWise(FlinkDrm.scala:93))(1/1) switched to CANCELED
> > > > > >>> 04/09/2016 22:30:35    CHAIN Partition -> Map (Map at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.pairwiseApply(FlinkOpABt.scala:240)) -> GroupCombine (GroupCombine at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:129)) -> Combine (Reduce at org.apache.mahout.flinkbindings.blas.FlinkOpABt$.abt_nograph(FlinkOpABt.scala:147))(3/3) switched to FAILED
> > > > > >>> java.lang.StackOverflowError
> > > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:48)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > > >>>      at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
> > > > > >>>      at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
> > > > >>> {...}
> > > > >>>
> > > > >>>
> > > > > >>> I've seen similar issues on the dev@flink list (and other places), but I
> > > > > >>> believe that they were from recursive calls and objects which pointed back
> > > > > >>> to themselves somehow.
> > > > > >>>
> > > > > >>>
> > > > > >>> This is a relatively straightforward method; it just has several Flink
> > > > > >>> operations before execution is triggered. If I remove some operations,
> > > > > >>> e.g. a reduce, I can get the method to complete on a simple test; however,
> > > > > >>> it will then, of course, be numerically incorrect.
> > > > > >>>
> > > > > >>>
> > > > > >>> I am wondering if there is any workaround for this type of problem?
> > > > >>>
> > > > >>>
> > > > >>> Thank You,
> > > > >>>
> > > > >>>
> > > > >>> Andy
> > > > >>>
> > > >
> > > >
> > > > --
> > > > ==================================================================
> > > > Hilmi Yildirim, M.Sc.
> > > > Researcher
> > > >
> > > > DFKI GmbH
> > > > Intelligente Analytik für Massendaten
> > > > DFKI Projektbüro Berlin
> > > > Alt-Moabit 91c
> > > > D-10559 Berlin
> > > > Phone: +49 30 23895 1814
> > > >
> > > > E-Mail: [hidden email]
> > > >
> > > > -------------------------------------------------------------
> > > > Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
> > > > Firmensitz: Trippstadter Strasse 122, D-67663 Kaiserslautern
> > > >
> > > > Geschaeftsfuehrung:
> > > > Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
> > > > Dr. Walter Olthoff
> > > >
> > > > Vorsitzender des Aufsichtsrats:
> > > > Prof. Dr. h.c. Hans A. Aukes
> > > >
> > > > Amtsgericht Kaiserslautern, HRB 2313
> > > > -------------------------------------------------------------
> > > >
> > > >
> > >
> >
>

RE: Kryo StackOverflowError

Andrew Palumbo
Do you want me to open a jira/pr for this?

-------- Original message --------
From: Stephan Ewen <[hidden email]>
Date: 04/13/2016 5:16 AM (GMT-05:00)
To: [hidden email]
Subject: Re: Kryo StackOverflowError

+1 to add this to 1.0.2


On Wed, Apr 13, 2016 at 1:57 AM, Andrew Palumbo <[hidden email]> wrote:

>
> Hi,
>
> Great! Do you think that this is something that you'll be enabling in your
> upcoming 1.0.2 release?  We plan on putting out a maintenance Mahout
> Release relatively soon and this would allow us to speed up Matrix
> Multiplication greatly.
>
> Thanks,
>
> Andy
> ________________________________________
> From: Till Rohrmann <[hidden email]>
> Sent: Tuesday, April 12, 2016 11:18 AM
> To: [hidden email]
> Subject: Re: Kryo StackOverflowError
>
> +1
>
> On Tue, Apr 12, 2016 at 1:13 PM, Robert Metzger <[hidden email]> wrote:
>
> > Good catch Till!
> >
> > I just checked it with the Mahout source code and the issue is gone with
> > reference tracking enabled.
> >
> > I would just re-enable it in Flink.
> >
> > On Tue, Apr 12, 2016 at 10:20 AM, Till Rohrmann <[hidden email]> wrote:
> >
> > > Hey guys,
> > >
> > > I have a suspicion about what could be the culprit: could you change the line
> > > KryoSerializer.java:328 to kryo.setReferences(true) and check whether the error
> > > still remains? We deactivated the reference tracking, and now Kryo shouldn’t
> > > be able to resolve cyclic references properly.
> > >
> > > Cheers,
> > > Till
> > >
> > >
> > > On Mon, Apr 11, 2016 at 11:42 PM, Lisonbee, Todd <[hidden email]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I also got this error message when I had private inner classes:
> > > >
> > > > public class A {
> > > >     private class B {
> > > >     }
> > > > }
> > > >
> > > > I was able to fix it by making the inner classes public static:
> > > >
> > > > public class A {
> > > >     public static class B {
> > > >     }
> > > > }
> > > >
> > > > When I was trying to debug, it seemed this error message can be caused by
> > > > several different things.
> > > >
> > > > Thanks,
> > > >
> > > > Todd
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Hilmi Yildirim [mailto:[hidden email]]
> > > > Sent: Sunday, April 10, 2016 11:36 AM
> > > > To: [hidden email]
> > > > Subject: Re: Kryo StackOverflowError
> > > >
> > > > Hi,
> > > > I also had this problem and solved it.
> > > >
> > > > In my case I had multiple objects which are created via anonymous classes.
> > > > When I broadcast these objects, the serializer tried to serialize the
> > > > objects and for that it tried to serialize the anonymous classes. This
> > > > caused the problem.
> > > >
> > > > For example,
> > > >
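> > > > class A {
> > > >
> > > >   def createObjects(): Array[Object] = {
> > > >     val objects = new java.util.ArrayList[Object]()
> > > >     // loop bounds and class body are elided, as in the original sketch
> > > >     for (...) {
> > > >       // "obj" instead of "object", which is a reserved word in Scala
> > > >       val obj = new Class {
> > > >         ...
> > > >       }
> > > >       objects.add(obj)
> > > >     }
> > > >     objects.toArray
> > > >   }
> > > > }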
> > > >
> > > > It tried to serialize "new Class". For that it tried to serialize the
> > > > method createObjects(). And then it tried to serialize class A. To
> > > > serialize class A it tried to serialize the method createObjects. Or
> > > > something like that, I do not remember the details. This caused the
> > > > recursion.
> > > >
> > > > BR,
> > > > Hilmi
> > > >
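
Editor's note on Todd's inner-class observation quoted above: a non-static inner class in Java can carry a hidden reference to its enclosing instance, while a static nested class does not. The sketch below only makes that hidden field visible through reflection. All class and method names in it (InnerClassReferenceSketch, Inner, Nested, holdsOuterReference) are hypothetical and are not taken from Todd's code, Mahout, Flink, or Kryo. Whether a particular serializer actually follows such a field depends on its configuration, so this is an illustration of the language behaviour, not a diagnosis of the error in this thread.

```java
import java.lang.reflect.Field;

public class InnerClassReferenceSketch {
    // Non-final on purpose, so the inner class really has to read it
    // through the enclosing instance.
    String label = "outer";

    // outer -> inner; together with the hidden inner -> outer reference
    // this forms a cycle in the object graph.
    final Inner inner = new Inner();

    // Non-static inner class: uses state of the enclosing instance.
    class Inner {
        String describe() {
            return "inner of " + label;
        }
    }

    // Static nested class: has no enclosing instance.
    static class Nested {
        String describe() {
            return "nested";
        }
    }

    // Returns true if the given class declares a field whose type is the
    // enclosing class (for Inner this is the compiler-generated outer reference).
    static boolean holdsOuterReference(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.getType() == InnerClassReferenceSketch.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Inner holds outer reference:  " + holdsOuterReference(Inner.class));
        System.out.println("Nested holds outer reference: " + holdsOuterReference(Nested.class));
    }
}
```

Making the class static, as Todd did, removes the back-reference to the outer object, which is one way an unexpected cycle can disappear from the graph being serialized.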

Re: Kryo StackOverflowError

mxm
Thanks for the PR, Andrew! This has been fixed in 1.0.2.

On Thu, Apr 14, 2016 at 7:04 PM, Andrew Palumbo <[hidden email]> wrote:

> Do you want me to open a jira/pr for this?
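
Editor's note: the change discussed in this thread is turning Kryo reference tracking back on (kryo.setReferences(true), as Till suggested). The sketch below is not Kryo or Flink code. It is a minimal, stdlib-only illustration, with hypothetical names (ReferenceTrackingSketch, Node, writeNaive, writeTracked), of why a field-by-field walk over a cyclic object graph overflows the stack without reference tracking and terminates with it.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class ReferenceTrackingSketch {
    // A node that can point back to an earlier node, forming a cycle.
    static final class Node {
        final String name;
        Node next;

        Node(String name) {
            this.name = name;
        }
    }

    // Naive walk: follows every reference, so a cycle recurses until the
    // JVM throws StackOverflowError.
    static void writeNaive(Node node, StringBuilder out) {
        if (node == null) {
            out.append("null");
            return;
        }
        out.append(node.name).append("->");
        writeNaive(node.next, out);
    }

    // Walk with reference tracking: each object is written once; a later
    // encounter of the same object is written as a back-reference id.
    static void writeTracked(Node node, StringBuilder out, Map<Object, Integer> seen) {
        if (node == null) {
            out.append("null");
            return;
        }
        Integer id = seen.get(node);
        if (id != null) {
            out.append("ref#").append(id);
            return;
        }
        seen.put(node, seen.size());
        out.append(node.name).append("->");
        writeTracked(node.next, out, seen);
    }

    static String serializeTracked(Node root) {
        StringBuilder out = new StringBuilder();
        // Identity-based map: objects are matched by reference, not equals().
        writeTracked(root, out, new IdentityHashMap<>());
        return out.toString();
    }

    static boolean naiveOverflows(Node root) {
        try {
            writeNaive(root, new StringBuilder());
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // cycle: a -> b -> a

        System.out.println("naive walk overflows: " + naiveOverflows(a));
        System.out.println("tracked walk output:  " + serializeTracked(a));
    }
}
```

The tracked walk pays for an extra identity map lookup per object, which is the usual trade-off when reference tracking is enabled in a serializer.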