Hello to my squirrels,
I've been getting a NullPointerException for a DeltaIteration program I'm trying to implement and I could really use your help :-)
It seems that some of the input Tuples of the Join operator that I'm using to create the next workset / solution set delta are null.
It also seems that adding ForwardedFields annotations solves the issue.

I managed to reproduce the behavior using the ConnectedComponents example, by removing the "@ForwardedFieldsFirst("*")" annotation from the ComponentIdFilter join.
The exception message is the following:

Caused by: java.lang.NullPointerException
    at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:186)
    at org.apache.flink.examples.java.graph.ConnectedComponents$ComponentIdFilter.join(ConnectedComponents.java:1)
    at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:198)
    at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
    at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
    at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
    at java.lang.Thread.run(Thread.java:745)

I get this error locally with any sufficiently big dataset (~10000 nodes).
When the annotation is in place, it works without problems.
I also generated the optimizer plans for the two cases:
- with annotation (working): https://gist.github.com/vasia/4f4dc6b0cc6c72b5b64b
- without annotation (failing): https://gist.github.com/vasia/086faa45b980bf7f4c09

After visualizing the plans, the main difference I see is that in the working case, the next workset node and the solution set delta node are merged, while in the failing case they are separate.
Shouldn't this work with and without the annotation (but be more efficient with the annotation in place)? Or am I missing something here?

Thanks in advance for any help :))

Cheers,
- Vasia.
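For readers less familiar with the example, here is a minimal, Flink-free sketch of what the ConnectedComponents delta iteration computes, assuming the usual structure of that example: the workset is joined with the edges, the minimum candidate component id per vertex is kept, and a ComponentIdFilter-style join against the solution set produces both the solution set delta and the next workset. All class and method names are hypothetical, and a HashMap stands in for the solution set. The point is only that the result is fully determined by the program, so an optional annotation should not be able to change it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DeltaIterationSketch {

    // Same decision as ComponentIdFilter: keep a candidate only if it
    // lowers the component id currently stored in the solution set.
    static boolean improves(long candidateId, long oldId) {
        return candidateId < oldId;
    }

    static Map<Long, Long> connectedComponents(List<long[]> edges, int maxIterations) {
        // Undirected adjacency lists built from the edge list.
        Map<Long, List<Long>> adj = new HashMap<>();
        for (long[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        // Initial solution set and workset: every vertex is its own component.
        Map<Long, Long> solutionSet = new HashMap<>();
        for (Long v : adj.keySet()) {
            solutionSet.put(v, v);
        }
        Map<Long, Long> workset = new HashMap<>(solutionSet);

        for (int i = 0; i < maxIterations && !workset.isEmpty(); i++) {
            // Workset joined with edges, then the minimum candidate per target vertex.
            Map<Long, Long> candidates = new HashMap<>();
            for (Map.Entry<Long, Long> w : workset.entrySet()) {
                for (Long neighbor : adj.get(w.getKey())) {
                    candidates.merge(neighbor, w.getValue(), Math::min);
                }
            }
            // Candidates joined with the solution set; the surviving records are
            // both the solution set delta and the next workset.
            Map<Long, Long> delta = new HashMap<>();
            for (Map.Entry<Long, Long> c : candidates.entrySet()) {
                Long old = solutionSet.get(c.getKey());
                if (improves(c.getValue(), old)) {
                    delta.put(c.getKey(), c.getValue());
                }
            }
            solutionSet.putAll(delta);
            workset = delta;
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        List<long[]> edges = List.of(new long[]{1, 2}, new long[]{2, 3}, new long[]{4, 5});
        // prints {1=1, 2=1, 3=1, 4=4, 5=4}
        System.out.println(new TreeMap<>(connectedComponents(edges, 10)));
    }
}
```

In this sketch every candidate key is guaranteed to exist in the solution set, so the lookup never returns null; the NPE in the report means that guarantee was violated somewhere in the runtime.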
That looks pretty much like a bug.
As you said, fwd fields annotations are optional and may improve the performance of a program, but never change its semantics (if set correctly).

I'll have a look at it later.
Would be great if you could provide some data to reproduce the bug.

On Apr 3, 2015 12:48 PM, "Vasiliki Kalavri" <[hidden email]> wrote:
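What "set correctly" means for @ForwardedFieldsFirst("*") can be sketched without Flink: the annotation claims that every record the function emits is identical to its first input. The plain-Java check below tests that claim on sample inputs, once for a function with ComponentIdFilter's logic and once for a counter-example that emits the second input instead. The Vertex record stands in for Tuple2<Long, Long>; all names are hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.BiFunction;

public class ForwardedFieldsCheck {

    // Hypothetical stand-in for Tuple2<Long, Long>: (vertexId, componentId).
    record Vertex(long id, long component) {}

    // Same decision as ComponentIdFilter: emit the candidate unchanged if it
    // lowers the component id, otherwise emit nothing.
    static Optional<Vertex> componentIdFilter(Vertex candidate, Vertex old) {
        return candidate.component() < old.component() ? Optional.of(candidate) : Optional.empty();
    }

    // Counter-example: emits the second input, so "*" on the first input would be wrong.
    static Optional<Vertex> emitsSecondInput(Vertex candidate, Vertex old) {
        return Optional.of(old);
    }

    // Checks the claim behind @ForwardedFieldsFirst("*") on sample inputs:
    // every emitted record must equal the first input it was computed from.
    static boolean forwardsFirstInput(BiFunction<Vertex, Vertex, Optional<Vertex>> fn,
                                      List<Vertex[]> samples) {
        for (Vertex[] s : samples) {
            Optional<Vertex> out = fn.apply(s[0], s[1]);
            if (out.isPresent() && !out.get().equals(s[0])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Vertex[]> samples = List.of(
            new Vertex[]{new Vertex(1, 0), new Vertex(1, 1)},
            new Vertex[]{new Vertex(2, 3), new Vertex(2, 2)});
        System.out.println(forwardsFirstInput(ForwardedFieldsCheck::componentIdFilter, samples)); // true
        System.out.println(forwardsFirstInput(ForwardedFieldsCheck::emitsSecondInput, samples));  // false
    }
}
```

A sample-based check like this can only catch a wrong annotation, not prove a correct one. For ComponentIdFilter the "*" claim holds, which is consistent with the annotation being a pure optimization hint here.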
Hi Fabian,
I am using the dblp co-authorship dataset from SNAP: http://snap.stanford.edu/data/com-DBLP.html
I also pushed my slightly modified version of ConnectedComponents here: https://github.com/vasia/flink/tree/cc-test. It basically generates the vertex dataset from the edges, so that you don't need to create it separately.
The annotation whose removal triggers the error is on line #172.

Thanks a lot :))

-Vasia.

On 3 April 2015 at 13:09, Fabian Hueske <[hidden email]> wrote:
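Deriving the vertex dataset from the edge list can be sketched in plain Java as follows. This is not the code in the cc-test branch: it assumes the usual SNAP edge-list layout (comment lines starting with #, two whitespace-separated vertex ids per data line) and gives each vertex an initial component id equal to its own id, as ConnectedComponents does. Class and method names are hypothetical; in a Flink job the same step would presumably be a flatMap over the edges emitting both endpoints, followed by distinct().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class VerticesFromEdges {

    // Parses edge lines, assuming SNAP-style input: '#' starts a comment line,
    // and each data line holds two whitespace-separated vertex ids.
    static List<long[]> parseEdges(List<String> lines) {
        List<long[]> edges = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+");
            edges.add(new long[]{Long.parseLong(parts[0]), Long.parseLong(parts[1])});
        }
        return edges;
    }

    // Collects the distinct endpoints of all edges and gives each vertex an
    // initial component id equal to its own id.
    static SortedMap<Long, Long> initialVertices(List<long[]> edges) {
        SortedMap<Long, Long> vertices = new TreeMap<>();
        for (long[] edge : edges) {
            vertices.put(edge[0], edge[0]);
            vertices.put(edge[1], edge[1]);
        }
        return vertices;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("# Undirected graph", "0\t1", "1\t2", "7 3");
        System.out.println(initialVertices(parseEdges(lines))); // prints {0=0, 1=1, 2=2, 3=3, 7=7}
    }
}
```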
Thanks for the nice setup!
I could easily reproduce the exception you are facing.
But that's the only good news so far :-(

I checked the plans and both are valid and should compute the correct result for the program.
The split-off solution set delta is required because it needs to be repartitioned (without the annotation, the optimizer does not know that it is in fact already correctly partitioned). One thing that made me a bit suspicious is that the solution set delta partitioning is marked with a pipeline breaker. The pipeline breaker shouldn't make a semantic difference, but I am not sure it is really required, and that part of the codebase was recently worked on.

So, a closer look and more debugging are necessary to figure out what is not working correctly here...

2015-04-03 14:14 GMT+02:00 Vasiliki Kalavri <[hidden email]>:
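The repartitioning argument can be illustrated with a small simulation: if the candidates reaching the filter are already hash-partitioned on the vertex id, and the filter forwards each surviving record unchanged, every record is still in the partition its key hashes to, so a repartition on field 0 would move nothing. Without the annotation the optimizer cannot know this and inserts the repartition, which would correspond to the separate solution set delta node in the failing plan. The sketch is plain Java; partitionOf is a hypothetical hash partitioner (not Flink's actual one), and the filter predicate is a simplified stand-in for ComponentIdFilter's comparison against the solution set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PartitioningSketch {

    // Hypothetical hash partitioner on the key (field 0). Flink uses its own
    // hash function; the argument only needs a deterministic key -> partition map.
    static int partitionOf(long key, int parallelism) {
        return Math.floorMod(Long.hashCode(key), parallelism);
    }

    // Hash-partitions (vertexId, componentId) records on the vertex id.
    static List<List<long[]>> hashPartition(List<long[]> records, int parallelism) {
        List<List<long[]>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new ArrayList<>());
        }
        for (long[] r : records) {
            partitions.get(partitionOf(r[0], parallelism)).add(r);
        }
        return partitions;
    }

    // Filters each partition locally; surviving records are emitted unchanged,
    // which is what @ForwardedFieldsFirst("*") tells the optimizer.
    static List<List<long[]>> filterLocally(List<List<long[]>> partitions, Predicate<long[]> keep) {
        List<List<long[]>> result = new ArrayList<>();
        for (List<long[]> partition : partitions) {
            List<long[]> out = new ArrayList<>();
            for (long[] r : partition) {
                if (keep.test(r)) {
                    out.add(r);
                }
            }
            result.add(out);
        }
        return result;
    }

    // True if every record already sits in the partition its key hashes to,
    // i.e. repartitioning on field 0 would not move any record.
    static boolean isPartitionedByKey(List<List<long[]>> partitions) {
        int parallelism = partitions.size();
        for (int i = 0; i < parallelism; i++) {
            for (long[] r : partitions.get(i)) {
                if (partitionOf(r[0], parallelism) != i) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<long[]> candidates = List.of(
            new long[]{1, 0}, new long[]{2, 5}, new long[]{3, 1}, new long[]{4, 2});
        List<List<long[]>> partitioned = hashPartition(candidates, 3);
        // Simplified stand-in for ComponentIdFilter: keep records whose
        // component id is smaller than their vertex id.
        List<List<long[]>> delta = filterLocally(partitioned, r -> r[1] < r[0]);
        System.out.println(isPartitionedByKey(delta)); // true
    }
}
```

Both plans should therefore produce the same delta; the extra repartition only costs a shuffle, which matches Fabian's reading that both plans are valid.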
Hi Fabian,
Thanks for looking into this.
Let me know if there's anything I can do to help!

Cheers,
V.

On 3 April 2015 at 22:31, Fabian Hueske <[hidden email]> wrote:
Hi,
I actually ran into this problem again with a different algorithm :/
Same exception, and it looks like getMatchFor() in CompactingHashTable returns a null record.
I'm not sure why this happens, or why the annotation prevents it. Any insight is highly welcome :-)

Shall I open an issue so that we don't forget about this?

-Vasia.

On 4 April 2015 at 14:44, Vasiliki Kalavri <[hidden email]> wrote:
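If the solution-set lookup returns null, the join function receives a null record, and the first comparison against it fails with an anonymous NPE, as in the stack trace from the first report. The sketch below shows that failure mode and a fail-fast variant that reports the missing key, which may make the lost records easier to trace while debugging. A HashMap stands in for CompactingHashTable, all names are hypothetical, and this only improves the error message; it does not fix the underlying bug.

```java
import java.util.Map;

public class SolutionSetProbeSketch {

    // Naive version: if the lookup returns null, unboxing `old` throws a
    // NullPointerException with no hint about which key was missing.
    static boolean naiveImproves(long vertexId, long candidateComponent, Map<Long, Long> solutionSet) {
        Long old = solutionSet.get(vertexId);
        return candidateComponent < old;
    }

    // Fail-fast version for debugging: reports the vertex id whose
    // solution-set entry could not be found.
    static boolean checkedImproves(long vertexId, long candidateComponent, Map<Long, Long> solutionSet) {
        Long old = solutionSet.get(vertexId);
        if (old == null) {
            throw new IllegalStateException("no solution set entry for vertex " + vertexId);
        }
        return candidateComponent < old;
    }

    public static void main(String[] args) {
        Map<Long, Long> solutionSet = Map.of(1L, 1L, 2L, 1L);
        System.out.println(checkedImproves(2L, 0L, solutionSet)); // true
        try {
            checkedImproves(42L, 0L, solutionSet);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // no solution set entry for vertex 42
        }
    }
}
```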
I think Fabian looked into this a while back...
@Fabian, do you have any insights into what causes this?

On Sat, Apr 25, 2015 at 7:46 PM, Vasiliki Kalavri <[hidden email]> wrote:
No, haven't looked at it since my last mail :-(
Both plans (with and without the forwarded fields annotation) look good, except for the suspicious pipeline breaker on the solution set delta partitioning.
@Vasia Could you open a JIRA and assign it to me? I'll have a closer look and try to figure out what's going on.
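For readers following the plan discussion: without `@ForwardedFieldsFirst("*")` the optimizer cannot see that ComponentIdFilter emits its first input unchanged. It therefore conservatively repartitions the solution set delta, which is the extra split-off partitioning step in the failing plan. The plain-Java sketch below has no Flink dependency. The `long[] {key, value}` records and the key-modulo partitioner are simplifying assumptions, not Flink's data types or hash partitioner. It shows why a function that forwards every field, including the key, leaves each record in the partition its key already maps to, while a function that rewrites the key does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ForwardedFieldsSketch {

    // Simplified partitioner: key modulo parallelism (an assumption, not Flink's partitioner).
    static int partitionOf(long key, int parallelism) {
        return (int) Math.floorMod(key, (long) parallelism);
    }

    // Routes each {key, value} record to the partition its key maps to.
    static List<List<long[]>> hashPartition(List<long[]> records, int parallelism) {
        List<List<long[]>> parts = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            parts.add(new ArrayList<>());
        }
        for (long[] r : records) {
            parts.get(partitionOf(r[0], parallelism)).add(r);
        }
        return parts;
    }

    // Applies fn inside each partition (as a parallel task would, without shuffling)
    // and checks whether every output record still sits in the partition its key maps to.
    static boolean stillPartitioned(List<List<long[]>> partitions, UnaryOperator<long[]> fn) {
        int p = partitions.size();
        for (int i = 0; i < p; i++) {
            for (long[] rec : partitions.get(i)) {
                long[] out = fn.apply(rec);
                if (partitionOf(out[0], p) != i) {
                    return false; // a repartitioning step would be required
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<long[]> vertices = List.of(
                new long[] {1, 1}, new long[] {2, 1}, new long[] {3, 2}, new long[] {4, 3});
        List<List<long[]>> parts = hashPartition(vertices, 2);

        // Emits the record unchanged, like ComponentIdFilter emitting its first input ("*").
        UnaryOperator<long[]> forwardAll = r -> r;
        // Rewrites the key field, so field 0 is not forwarded.
        UnaryOperator<long[]> changeKey = r -> new long[] {r[0] + 1, r[1]};

        System.out.println(stillPartitioned(parts, forwardAll)); // prints true
        System.out.println(stillPartitioned(parts, changeKey));  // prints false
    }
}
```

A filter that drops some records but never modifies the ones it emits, as ComponentIdFilter does, behaves like `forwardAll` here. The optimizer cannot run such a check on a user function, which is why it relies on the annotation. As noted in the thread, the annotation should only affect efficiency, never the result.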
Will do, thanks!
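The issue to be filed concerns the NullPointerException itself. According to the stack trace, it is raised inside `ComponentIdFilter.join`, and the thread reports that the solution-set lookup (`getMatchFor()` in `CompactingHashTable`) hands the join a null record. The plain-Java sketch below models the solution set as a `HashMap` from vertex ID to current component ID. This is an assumption for illustration, not Flink's hash table. The sketch mirrors the filter logic: emit a candidate only if its component ID is smaller than the stored one. The explicit null check and the names `Vertex`, `improves`, and `filterCandidates` are hypothetical. In ConnectedComponents every vertex is in the solution set, so a null match points to a runtime problem rather than to missing input data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComponentIdFilterSketch {

    // Stand-in for Tuple2<Long, Long>: (vertexId, componentId).
    record Vertex(long id, long component) {}

    // Mirrors the join logic: keep the candidate only if it lowers the stored component ID.
    // "old" is whatever the solution-set lookup returned; the reported NPE corresponds
    // to this value being null.
    static boolean improves(Vertex candidate, Vertex old) {
        if (old == null) {
            // Hypothetical defensive check: fail with a descriptive message instead of an NPE.
            throw new IllegalStateException("no solution-set entry for vertex " + candidate.id());
        }
        return candidate.component() < old.component();
    }

    // Joins candidates with the solution set on the vertex ID (field 0) and
    // collects the records that form the solution set delta / next workset.
    static List<Vertex> filterCandidates(List<Vertex> candidates, Map<Long, Vertex> solutionSet) {
        List<Vertex> delta = new ArrayList<>();
        for (Vertex c : candidates) {
            Vertex old = solutionSet.get(c.id()); // null if the key is absent
            if (improves(c, old)) {
                delta.add(c); // emitted unchanged, i.e. all fields forwarded
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        Map<Long, Vertex> solutionSet = new HashMap<>();
        solutionSet.put(1L, new Vertex(1L, 1L));
        solutionSet.put(2L, new Vertex(2L, 2L));
        solutionSet.put(3L, new Vertex(3L, 3L));

        List<Vertex> candidates = List.of(new Vertex(2L, 1L), new Vertex(3L, 3L));
        System.out.println(filterCandidates(candidates, solutionSet));

        try {
            filterCandidates(List.of(new Vertex(4L, 1L)), solutionSet);
        } catch (IllegalStateException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
    }
}
```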