NullPointerException in DeltaIteration when no ForwardedFields annotation

Vasiliki Kalavri
Hello to my squirrels,

I've been getting a NullPointerException for a DeltaIteration program I'm
trying to implement and I could really use your help :-)
It seems that some of the input Tuples of the Join operator that I'm using
to create the next workset / solution set delta are null.
It also seems that adding ForwardedFields annotations solves the issue.
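
To make the symptom concrete, here is a minimal plain-Java sketch of what a null solution-set input does to a join of this shape. It is not the Flink code: ComponentIdFilterSketch and VertexWithComponent are hypothetical stand-ins (the latter for a Tuple2<Long, Long> of vertex id and component id), and the comparison is only my assumption of what the example's join body looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the join; no Flink classes are used (assumption: names are illustrative).
final class ComponentIdFilterSketch {

    /** Minimal stand-in for a (vertex id, component id) tuple. */
    static final class VertexWithComponent {
        final long vertexId;
        final long componentId;

        VertexWithComponent(long vertexId, long componentId) {
            this.vertexId = vertexId;
            this.componentId = componentId;
        }
    }

    /** Emits the candidate only if it lowers the component id stored in the solution set. */
    static void join(VertexWithComponent candidate, VertexWithComponent old,
                     List<VertexWithComponent> out) {
        // Dereferencing 'old' throws a NullPointerException when the solution-set side is null.
        if (candidate.componentId < old.componentId) {
            out.add(candidate);
        }
    }

    public static void main(String[] args) {
        List<VertexWithComponent> out = new ArrayList<>();

        // Normal case: the candidate has a smaller component id, so it is emitted.
        join(new VertexWithComponent(1L, 1L), new VertexWithComponent(1L, 5L), out);
        System.out.println("emitted: " + out.size());

        // Failing case: the solution-set input is null.
        try {
            join(new VertexWithComponent(2L, 1L), null, out);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException inside join");
        }
    }
}
```

The point is only that the exception surfaces inside the user function although the null originates upstream, in whatever hands the solution-set record to the join.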

I managed to reproduce the behavior using the ConnectedComponents example,
by removing the "@ForwardedFieldsFirst("*")" annotation from
the ComponentIdFilter join.
The exception message is the following:

Caused by: java.lang.NullPointerException
    at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
    at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
    at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
    at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
    at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
    at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
    at java.lang.Thread.run(Thread.java:745)

I get this error locally with any sufficiently big dataset (~10000 nodes).
When the annotation is in place, it works without problem.
I also generated the optimizer plans for the two cases:
- with annotation (working):
https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
- without annotation (failing):
https://gist.github.com/vasia/086faa45b980bf7f4c09

After visualizing the plans, the main difference I see is that in the
working case, the next workset node and the solution set delta nodes are
merged, while in the failing case they are separate.

Shouldn't this work with and without annotation (but be more efficient with
the annotation in place)? Or am I missing something here?

Thanks in advance for any help :))

Cheers,
- Vasia.

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Fabian Hueske-2
That looks pretty much like a bug.

As you said, fwd fields annotations are optional and may improve the
performance of a program, but never change its semantics (if set correctly).

I'll have a look at it later.
Would be great if you could provide some data to reproduce the bug.
On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <[hidden email]>
wrote:

> Hello to my squirrels,
>
> I've been getting a NullPointerException for a DeltaIteration program I'm
> trying to implement and I could really use your help :-)
> It seems that some of the input Tuples of the Join operator that I'm using
> to create the next workset / solution set delta are null.
> It also seems that adding ForwardedFields annotations solves the issue.
>
> I managed to reproduce the behavior using the ConnectedComponents example,
> by removing the "@ForwardedFieldsFirst("*")" annotation from
> the ComponentIdFilter join.
> The exception message is the following:
>
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
>     at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
>     at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
>     at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>     at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
>     at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>     at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
>     at java.lang.Thread.run(Thread.java:745)
>
> I get this error locally with any sufficiently big dataset (~10000 nodes).
> When the annotation is in place, it works without problem.
> I also generated the optimizer plans for the two cases:
> - with annotation (working):
> https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
> - without annotation (failing):
> https://gist.github.com/vasia/086faa45b980bf7f4c09
>
> After visualizing the plans, the main difference I see is that in the
> working case, the next workset node and the solution set delta nodes are
> merged, while in the failing case they are separate.
>
> Shouldn't this work with and without annotation (but be more efficient with
> the annotation in place)? Or am I missing something here?
>
> Thanks in advance for any help :))
>
> Cheers,
> - Vasia.
>

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Vasiliki Kalavri
Hi Fabian,

I am using the dblp co-authorship dataset from SNAP:
http://snap.stanford.edu/data/com-DBLP.html
I also pushed my slightly modified version of ConnectedComponents, here:
https://github.com/vasia/flink/tree/cc-test. It basically generates the
vertex dataset from the edges, so that you don't need to create it
separately.
The annotation whose removal triggers the error is at line #172.

Thanks a lot :))

-Vasia.


On 3 April 2015 at 13:09, Fabian Hueske <[hidden email]> wrote:

> That looks pretty much like a bug.
>
> As you said, fwd fields annotations are optional and may improve the
> performance of a program, but never change its semantics (if set
> correctly).
>
> I'll have a look at it later.
> Would be great if you could provide some data to reproduce the bug.

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Fabian Hueske-2
Thanks for the nice setup!
I could easily reproduce the exception you are facing.
But that's the only good news so far :-(

I checked the plans and both are valid and should compute the correct
result for the program.
The split-off solution set delta is required because it needs to be
repartitioned (without the annotation, the optimizer does not know that it
is in fact already correctly partitioned). One thing that made me a bit
suspicious is that the solution set delta partitioning is marked with a
Pipeline-Breaker. The pipeline breaker shouldn't make a semantic
difference, but I am not sure if it is really required and also that part
of the codebase was recently worked on.
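
As a rough illustration of the repartitioning argument above, here is a toy decision rule in plain Java. It is a sketch under simplifying assumptions, not the optimizer's actual logic: ForwardedFieldsSketch and needsRepartition are made-up names, and fields are modelled as plain tuple positions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only (assumption): the real optimizer tracks far richer properties.
final class ForwardedFieldsSketch {

    /**
     * Partitioning on keyField is known to survive an operator only if that
     * operator declares the field as forwarded unchanged. Without such a
     * declaration the data has to be repartitioned to be safe.
     */
    static boolean needsRepartition(int keyField, Set<Integer> forwardedFields) {
        return !forwardedFields.contains(keyField);
    }

    public static void main(String[] args) {
        // With a "*"-style annotation on a two-field tuple, both positions are forwarded.
        Set<Integer> annotated = new HashSet<>(Arrays.asList(0, 1));
        // Without any annotation, nothing is known to be forwarded.
        Set<Integer> unannotated = new HashSet<>();

        System.out.println(needsRepartition(0, annotated));   // prints false
        System.out.println(needsRepartition(0, unannotated)); // prints true
    }
}
```

In this model the unannotated case always repartitions, which costs time but should still give the same result; that is why the two plans differ in shape but not in meaning.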

So, a closer look and more debugging are necessary to figure out what is not
working correctly here...


2015-04-03 14:14 GMT+02:00 Vasiliki Kalavri <[hidden email]>:

> Hi Fabian,
>
> I am using the dblp co-authorship dataset from SNAP:
> http://snap.stanford.edu/data/com-DBLP.html
> I also pushed my slightly modified version of ConnectedComponents, here:
> https://github.com/vasia/flink/tree/cc-test. It basically generates the
> vertex dataset from the edges, so that you don't need to create it
> separately.
> The annotation that creates the error is in line #172.
>
> Thanks a lot :))
>
> -Vasia.

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Vasiliki Kalavri
Hi Fabian,

thanks for looking into this.
Let me know if there's anything I can do to help!

Cheers,
V.

On 3 April 2015 at 22:31, Fabian Hueske <[hidden email]> wrote:

> Thanks for the nice setup!
> I could easily reproduce the exception you are facing.
> But that's the only good news so far :-(
>
> I checked the plans and both are valid and should compute the correct
> result for the program.
> The split-of solution set delta is required because the it needs to be
> repartitioned (without the annotation, the optimizer does not know that it
> is in fact already correctly partitioned). One thing that made me a bit
> suspicious is that the solution set delta partitioning is marked with a
> Pipeline-Breaker. The pipeline breaker shouldn't make a semantic
> difference, but I am not sure if it is really required and also that part
> of the codebase was recently worked on.
>
> So, a closer look and more debugging is necessary to figure out what not
> working correctly here...

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Vasiliki Kalavri
Hi,

I actually ran into this problem again with a different algorithm :/
Same exception and it looks like getMatchFor() in CompactingHashTable
returns a null record.
I'm not sure why, or why the annotation prevents this from happening. Any
insight is highly welcome :-)
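
While debugging, a guard like the following sketch can turn the bare NullPointerException into a message that names the probed key. It is plain Java with a HashMap standing in for the solution-set index; SolutionSetLookupSketch and componentOf are hypothetical names, not Flink APIs, and this says nothing about how CompactingHashTable works internally.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only (assumption): a HashMap plays the role of the solution-set index.
final class SolutionSetLookupSketch {

    /** Returns the component id stored for vertexId, failing with a descriptive message on a miss. */
    static long componentOf(Map<Long, Long> solutionSet, long vertexId) {
        Long match = solutionSet.get(vertexId); // Map.get returns null when the key is absent
        if (match == null) {
            throw new IllegalStateException("No solution-set entry for vertex " + vertexId);
        }
        return match;
    }

    public static void main(String[] args) {
        Map<Long, Long> solutionSet = new HashMap<>();
        solutionSet.put(1L, 7L);

        System.out.println(componentOf(solutionSet, 1L)); // prints 7

        try {
            componentOf(solutionSet, 2L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints No solution-set entry for vertex 2
        }
    }
}
```

Knowing which keys miss would at least show whether the failing probes share a pattern.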

Shall I open an issue so that we don't forget about this?

-Vasia.


On 4 April 2015 at 14:44, Vasiliki Kalavri <[hidden email]>
wrote:

> Hi Fabian,
>
> thanks for looking into this.
> Let me know if there's anything I can do to help!
>
> Cheers,
> V.

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Stephan Ewen
I think Fabian looked into this a while back...

@Fabian, do you have any insight into what causes this?


On Sat, Apr 25, 2015 at 7:46 PM, Vasiliki Kalavri <[hidden email]> wrote:

> Hi,
>
> I actually ran into this problem again with a different algorithm :/
> Same exception and it looks like getMatchFor() in CompactingHashTable
> returns a null record.
> Not sure why or why the annotation prevents this from happening. Any
> insight is highly welcome :-)
>
> Shall I open an issue so that we don't forget about this?
>
> -Vasia.

Re: NullPointerException in DeltaIteration when no ForwardedFields annotation

Fabian Hueske-2
No, haven't looked at it since my last mail :-(
Both plans (with and without the forwarded fields annotation) look good except
for the suspicious pipeline breaker.

@Vasia Could you open a JIRA and assign it to me?
I'll have a closer look and try to figure out what's going on.



Re: NullPointerException in DeltaIteration when no ForwardedFileds annotation

Vasiliki Kalavri
Will do, thanks!
