Delta Iteration Obstuse Error Message

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Delta Iteration Obstuse Error Message

Jack David Galilee
Hi Everyone,


I had an epiphany while reading through the Flink 0.6 API documentation and decided to try a new method for my iterative algorithm, but it just results in a weirder error. I've also included the error I was getting for the suggestion that was posted earlier.


I'm sorry for not being able to provide full source code. If it is any help all of my functions now produce Tuple2<String, String>(); Where the initial dataset is also Tuple2<String,String>. The goal is to write out the union of the results from all iterations where intersection of the set of keys for iteration i and iteration i - 1 is the empty set.


        DeltaIteration<Tuple2<String, String>, Tuple2<String, String>> iteration = transactions.
                iterateDelta(initial, maxIterations, 0);


        DataSet<Tuple2<String, String>> ... = ....
                flatMap(new ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <-- Referencing the working set
                groupBy(0).
                reduceGroup(new ...()).
                withParameters(intValue).
                join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <-- Referencing the solution set
//             projectFirst(1).projectSecond(1).types(String.class, String.class);


Raises Exception. If I change it to get input2(), I get the same error, but for the working set which is referenced through the broadcast.


Exception in thread "main" org.apache.flink.compiler.CompilerException: Error: The step function does not reference the solution set.
    at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
    at org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
    at org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
    at org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)


If I remove the getInput1() call and uncomment that last line it yields. I was concerned that I was accidentally writing out a null value

somewhere but I can't find out.


Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException
    at org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
    at org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
    at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
    at org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
    at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
    at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
    at java.lang.Thread.run(Thread.java:744)


After more investigation it appears that the null pointer exists somewhere between the the reduceGroup operator and next mapOperator as the next mapOperator does not run after the reduceGroup.



Thanks,

Jack
Reply | Threaded
Open this post in threaded view
|

Re: Delta Iteration Obstuse Error Message

Stephan Ewen
Hi Jack!

You are doing something very interesting there. I am not sure I am getting
everything, but are some issues I can see...

 - In the line where you join with the solution set:
"join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
Referencing the solution set"
   What happens is that after constructing the join, you grab that join's
first input data set. This means, that you actually ignore the join and
simply use its "input1", in that case the result of the reduceGroup
operation. That explains why the program does not depend on the solution
set. If you change the code to "getInput2()" then you construct teh join
and take only its second input (the solution set). The other inputs are the
"dangling", meaning they are not really consumed and thus regarded unused.

  - The second error you get is a bug in Flink. This projection in the join
cannot handle null inputs, which may occur when joining with the solution
set. Let us fix that!

Greetings,
Stephan



On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
[hidden email]> wrote:

> Hi Everyone,
>
>
> I had an epiphany while reading through the Flink 0.6 API documentation
> and decided to try a new method for my iterative algorithm, but it just
> results in a weirder error. I've also included the error I was getting for
> the suggestion that was posted earlier.
>
>
> I'm sorry for not being able to provide full source code. If it is any
> help all of my functions now produce Tuple2<String, String>(); Where the
> initial dataset is also Tuple2<String,String>. The goal is to write out the
> union of the results from all iterations where intersection of the set of
> keys for iteration i and iteration i - 1 is the empty set.
>
>
>         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> iteration = transactions.
>                 iterateDelta(initial, maxIterations, 0);
>
>
>         DataSet<Tuple2<String, String>> ... = ....
>                 flatMap(new
> ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <-- Referencing
> the working set
>                 groupBy(0).
>                 reduceGroup(new ...()).
>                 withParameters(intValue).
>
> join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> Referencing the solution set
> //             projectFirst(1).projectSecond(1).types(String.class,
> String.class);
>
>
> Raises Exception. If I change it to get input2(), I get the same error,
> but for the working set which is referenced through the broadcast.
>
>
> Exception in thread "main" org.apache.flink.compiler.CompilerException:
> Error: The step function does not reference the solution set.
>     at
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
>     at
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
>     at
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
>     at
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
>
>
> If I remove the getInput1() call and uncomment that last line it yields. I
> was concerned that I was accidentally writing out a null value
>
> somewhere but I can't find out.
>
>
> Exception in thread "main"
> org.apache.flink.runtime.client.JobExecutionException:
> java.lang.NullPointerException
>     at
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
>     at
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
>     at
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
>     at
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
>     at
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>     at
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
>     at
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
>     at java.lang.Thread.run(Thread.java:744)
>
>
> After more investigation it appears that the null pointer exists somewhere
> between the the reduceGroup operator and next mapOperator as the next
> mapOperator does not run after the reduceGroup.
>
>
>
> Thanks,
>
> Jack
>
Reply | Threaded
Open this post in threaded view
|

RE: Delta Iteration Obstuse Error Message

Jack David Galilee
Awesome, thanks Stephan for getting back to me so fast.

I had assumed that was what I was what was going on with regards to getInput1() or getInput2().
Good to know what I thought was happening was happening.

The interesting thing that might help is that the null value is not being written out in my ReduceGroup
function so it is coming from somewhere else in between. I did a trace of my program and it didn't write
out a null value to the output collector, but if I remember correctly it was called one extra time than I'd
expected.
________________________________________
From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
Sent: 28 August 2014 09:52
To: [hidden email]
Subject: Re: Delta Iteration Obstuse Error Message

Hi Jack!

You are doing something very interesting there. I am not sure I am getting
everything, but are some issues I can see...

 - In the line where you join with the solution set:
"join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
Referencing the solution set"
   What happens is that after constructing the join, you grab that join's
first input data set. This means, that you actually ignore the join and
simply use its "input1", in that case the result of the reduceGroup
operation. That explains why the program does not depend on the solution
set. If you change the code to "getInput2()" then you construct teh join
and take only its second input (the solution set). The other inputs are the
"dangling", meaning they are not really consumed and thus regarded unused.

  - The second error you get is a bug in Flink. This projection in the join
cannot handle null inputs, which may occur when joining with the solution
set. Let us fix that!

Greetings,
Stephan



On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
[hidden email]> wrote:

> Hi Everyone,
>
>
> I had an epiphany while reading through the Flink 0.6 API documentation
> and decided to try a new method for my iterative algorithm, but it just
> results in a weirder error. I've also included the error I was getting for
> the suggestion that was posted earlier.
>
>
> I'm sorry for not being able to provide full source code. If it is any
> help all of my functions now produce Tuple2<String, String>(); Where the
> initial dataset is also Tuple2<String,String>. The goal is to write out the
> union of the results from all iterations where intersection of the set of
> keys for iteration i and iteration i - 1 is the empty set.
>
>
>         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> iteration = transactions.
>                 iterateDelta(initial, maxIterations, 0);
>
>
>         DataSet<Tuple2<String, String>> ... = ....
>                 flatMap(new
> ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <-- Referencing
> the working set
>                 groupBy(0).
>                 reduceGroup(new ...()).
>                 withParameters(intValue).
>
> join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> Referencing the solution set
> //             projectFirst(1).projectSecond(1).types(String.class,
> String.class);
>
>
> Raises Exception. If I change it to get input2(), I get the same error,
> but for the working set which is referenced through the broadcast.
>
>
> Exception in thread "main" org.apache.flink.compiler.CompilerException:
> Error: The step function does not reference the solution set.
>     at
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
>     at
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
>     at
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
>     at
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
>
>
> If I remove the getInput1() call and uncomment that last line it yields. I
> was concerned that I was accidentally writing out a null value
>
> somewhere but I can't find out.
>
>
> Exception in thread "main"
> org.apache.flink.runtime.client.JobExecutionException:
> java.lang.NullPointerException
>     at
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
>     at
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
>     at
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
>     at
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
>     at
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
>     at
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
>     at
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
>     at java.lang.Thread.run(Thread.java:744)
>
>
> After more investigation it appears that the null pointer exists somewhere
> between the the reduceGroup operator and next mapOperator as the next
> mapOperator does not run after the reduceGroup.
>
>
>
> Thanks,
>
> Jack
>
Reply | Threaded
Open this post in threaded view
|

Re: Delta Iteration Obstuse Error Message

Fabian Hueske
I pushed a fix to allow null tuples as input for ProjectJoin (FLINK-1074).

You can update to the latest master or port the fix (just two lines:
https://github.com/apache/incubator-flink/commit/00840599a7a498cbd19d524ab5ad698365cbab4f
)

Cheers, Fabian


2014-08-28 2:09 GMT+02:00 Jack David Galilee <[hidden email]>:

> Awesome, thanks Stephan for getting back to me so fast.
>
> I had assumed that was what I was what was going on with regards to
> getInput1() or getInput2().
> Good to know what I thought was happening was happening.
>
> The interesting thing that might help is that the null value is not being
> written out in my ReduceGroup
> function so it is coming from somewhere else in between. I did a trace of
> my program and it didn't write
> out a null value to the output collector, but if I remember correctly it
> was called one extra time than I'd
> expected.
> ________________________________________
> From: [hidden email] <[hidden email]> on behalf of Stephan
> Ewen <[hidden email]>
> Sent: 28 August 2014 09:52
> To: [hidden email]
> Subject: Re: Delta Iteration Obstuse Error Message
>
> Hi Jack!
>
> You are doing something very interesting there. I am not sure I am getting
> everything, but are some issues I can see...
>
>  - In the line where you join with the solution set:
> "join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> Referencing the solution set"
>    What happens is that after constructing the join, you grab that join's
> first input data set. This means, that you actually ignore the join and
> simply use its "input1", in that case the result of the reduceGroup
> operation. That explains why the program does not depend on the solution
> set. If you change the code to "getInput2()" then you construct teh join
> and take only its second input (the solution set). The other inputs are the
> "dangling", meaning they are not really consumed and thus regarded unused.
>
>   - The second error you get is a bug in Flink. This projection in the join
> cannot handle null inputs, which may occur when joining with the solution
> set. Let us fix that!
>
> Greetings,
> Stephan
>
>
>
> On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
> [hidden email]> wrote:
>
> > Hi Everyone,
> >
> >
> > I had an epiphany while reading through the Flink 0.6 API documentation
> > and decided to try a new method for my iterative algorithm, but it just
> > results in a weirder error. I've also included the error I was getting
> for
> > the suggestion that was posted earlier.
> >
> >
> > I'm sorry for not being able to provide full source code. If it is any
> > help all of my functions now produce Tuple2<String, String>(); Where the
> > initial dataset is also Tuple2<String,String>. The goal is to write out
> the
> > union of the results from all iterations where intersection of the set of
> > keys for iteration i and iteration i - 1 is the empty set.
> >
> >
> >         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> > iteration = transactions.
> >                 iterateDelta(initial, maxIterations, 0);
> >
> >
> >         DataSet<Tuple2<String, String>> ... = ....
> >                 flatMap(new
> > ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <--
> Referencing
> > the working set
> >                 groupBy(0).
> >                 reduceGroup(new ...()).
> >                 withParameters(intValue).
> >
> > join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> > Referencing the solution set
> > //             projectFirst(1).projectSecond(1).types(String.class,
> > String.class);
> >
> >
> > Raises Exception. If I change it to get input2(), I get the same error,
> > but for the working set which is referenced through the broadcast.
> >
> >
> > Exception in thread "main" org.apache.flink.compiler.CompilerException:
> > Error: The step function does not reference the solution set.
> >     at
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
> >     at
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
> >     at
> >
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
> >     at
> >
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
> >
> >
> > If I remove the getInput1() call and uncomment that last line it yields.
> I
> > was concerned that I was accidentally writing out a null value
> >
> > somewhere but I can't find out.
> >
> >
> > Exception in thread "main"
> > org.apache.flink.runtime.client.JobExecutionException:
> > java.lang.NullPointerException
> >     at
> >
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
> >     at
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
> >     at
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
> >     at
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
> >     at
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> >     at
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
> >     at
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> > After more investigation it appears that the null pointer exists
> somewhere
> > between the the reduceGroup operator and next mapOperator as the next
> > mapOperator does not run after the reduceGroup.
> >
> >
> >
> > Thanks,
> >
> > Jack
> >
>
Reply | Threaded
Open this post in threaded view
|

RE: Delta Iteration Obstuse Error Message

Jack David Galilee
Cheers Fabian,

Just need to get past a new null pointer exception in the build process and I'll let you know.

Failed to execute goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project flink-runtime: Execution default of goal pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed. NullPointerException -> [Help 1]


Cheers,
Jack

________________________________________
From: Fabian Hueske <[hidden email]>
Sent: 28 August 2014 19:15
To: [hidden email]
Subject: Re: Delta Iteration Obstuse Error Message

I pushed a fix to allow null tuples as input for ProjectJoin (FLINK-1074).

You can update to the latest master or port the fix (just two lines:
https://github.com/apache/incubator-flink/commit/00840599a7a498cbd19d524ab5ad698365cbab4f
)

Cheers, Fabian


2014-08-28 2:09 GMT+02:00 Jack David Galilee <[hidden email]>:

> Awesome, thanks Stephan for getting back to me so fast.
>
> I had assumed that was what I was what was going on with regards to
> getInput1() or getInput2().
> Good to know what I thought was happening was happening.
>
> The interesting thing that might help is that the null value is not being
> written out in my ReduceGroup
> function so it is coming from somewhere else in between. I did a trace of
> my program and it didn't write
> out a null value to the output collector, but if I remember correctly it
> was called one extra time than I'd
> expected.
> ________________________________________
> From: [hidden email] <[hidden email]> on behalf of Stephan
> Ewen <[hidden email]>
> Sent: 28 August 2014 09:52
> To: [hidden email]
> Subject: Re: Delta Iteration Obstuse Error Message
>
> Hi Jack!
>
> You are doing something very interesting there. I am not sure I am getting
> everything, but are some issues I can see...
>
>  - In the line where you join with the solution set:
> "join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> Referencing the solution set"
>    What happens is that after constructing the join, you grab that join's
> first input data set. This means, that you actually ignore the join and
> simply use its "input1", in that case the result of the reduceGroup
> operation. That explains why the program does not depend on the solution
> set. If you change the code to "getInput2()" then you construct teh join
> and take only its second input (the solution set). The other inputs are the
> "dangling", meaning they are not really consumed and thus regarded unused.
>
>   - The second error you get is a bug in Flink. This projection in the join
> cannot handle null inputs, which may occur when joining with the solution
> set. Let us fix that!
>
> Greetings,
> Stephan
>
>
>
> On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
> [hidden email]> wrote:
>
> > Hi Everyone,
> >
> >
> > I had an epiphany while reading through the Flink 0.6 API documentation
> > and decided to try a new method for my iterative algorithm, but it just
> > results in a weirder error. I've also included the error I was getting
> for
> > the suggestion that was posted earlier.
> >
> >
> > I'm sorry for not being able to provide full source code. If it is any
> > help all of my functions now produce Tuple2<String, String>(); Where the
> > initial dataset is also Tuple2<String,String>. The goal is to write out
> the
> > union of the results from all iterations where intersection of the set of
> > keys for iteration i and iteration i - 1 is the empty set.
> >
> >
> >         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> > iteration = transactions.
> >                 iterateDelta(initial, maxIterations, 0);
> >
> >
> >         DataSet<Tuple2<String, String>> ... = ....
> >                 flatMap(new
> > ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <--
> Referencing
> > the working set
> >                 groupBy(0).
> >                 reduceGroup(new ...()).
> >                 withParameters(intValue).
> >
> > join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> > Referencing the solution set
> > //             projectFirst(1).projectSecond(1).types(String.class,
> > String.class);
> >
> >
> > Raises Exception. If I change it to get input2(), I get the same error,
> > but for the working set which is referenced through the broadcast.
> >
> >
> > Exception in thread "main" org.apache.flink.compiler.CompilerException:
> > Error: The step function does not reference the solution set.
> >     at
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
> >     at
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
> >     at
> >
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
> >     at
> >
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
> >
> >
> > If I remove the getInput1() call and uncomment that last line it yields.
> I
> > was concerned that I was accidentally writing out a null value
> >
> > somewhere but I can't find out.
> >
> >
> > Exception in thread "main"
> > org.apache.flink.runtime.client.JobExecutionException:
> > java.lang.NullPointerException
> >     at
> >
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
> >     at
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
> >     at
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
> >     at
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
> >     at
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> >     at
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
> >     at
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> > After more investigation it appears that the null pointer exists
> somewhere
> > between the the reduceGroup operator and next mapOperator as the next
> > mapOperator does not run after the reduceGroup.
> >
> >
> >
> > Thanks,
> >
> > Jack
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Delta Iteration Obstuse Error Message

Stephan Ewen
Jack,

The null tuple may come when you join with the solution set, but there is
no entry for that key in the solution set yet. When joining the solution
set, the semantics are those of an outer join.

The rational is that this allows you to not loose potential new entries and
grow the solution set if you want.

Greetings,
Stephan



On Thu, Aug 28, 2014 at 12:17 PM, Jack David Galilee <
[hidden email]> wrote:

> Cheers Fabian,
>
> Just need to get past a new null pointer exception in the build process
> and I'll let you know.
>
> Failed to execute goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project
> flink-runtime: Execution default of goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed.
> NullPointerException -> [Help 1]
>
>
> Cheers,
> Jack
>
> ________________________________________
> From: Fabian Hueske <[hidden email]>
> Sent: 28 August 2014 19:15
> To: [hidden email]
> Subject: Re: Delta Iteration Obstuse Error Message
>
> I pushed a fix to allow null tuples as input for ProjectJoin (FLINK-1074).
>
> You can update to the latest master or port the fix (just two lines:
>
> https://github.com/apache/incubator-flink/commit/00840599a7a498cbd19d524ab5ad698365cbab4f
> )
>
> Cheers, Fabian
>
>
> 2014-08-28 2:09 GMT+02:00 Jack David Galilee <[hidden email]>:
>
> > Awesome, thanks Stephan for getting back to me so fast.
> >
> > I had assumed that was what I was what was going on with regards to
> > getInput1() or getInput2().
> > Good to know what I thought was happening was happening.
> >
> > The interesting thing that might help is that the null value is not being
> > written out in my ReduceGroup
> > function so it is coming from somewhere else in between. I did a trace of
> > my program and it didn't write
> > out a null value to the output collector, but if I remember correctly it
> > was called one extra time than I'd
> > expected.
> > ________________________________________
> > From: [hidden email] <[hidden email]> on behalf of Stephan
> > Ewen <[hidden email]>
> > Sent: 28 August 2014 09:52
> > To: [hidden email]
> > Subject: Re: Delta Iteration Obstuse Error Message
> >
> > Hi Jack!
> >
> > You are doing something very interesting there. I am not sure I am
> getting
> > everything, but are some issues I can see...
> >
> >  - In the line where you join with the solution set:
> > "join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> > Referencing the solution set"
> >    What happens is that after constructing the join, you grab that join's
> > first input data set. This means, that you actually ignore the join and
> > simply use its "input1", in that case the result of the reduceGroup
> > operation. That explains why the program does not depend on the solution
> > set. If you change the code to "getInput2()" then you construct teh join
> > and take only its second input (the solution set). The other inputs are
> the
> > "dangling", meaning they are not really consumed and thus regarded
> unused.
> >
> >   - The second error you get is a bug in Flink. This projection in the
> join
> > cannot handle null inputs, which may occur when joining with the solution
> > set. Let us fix that!
> >
> > Greetings,
> > Stephan
> >
> >
> >
> > On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
> > [hidden email]> wrote:
> >
> > > Hi Everyone,
> > >
> > >
> > > I had an epiphany while reading through the Flink 0.6 API documentation
> > > and decided to try a new method for my iterative algorithm, but it just
> > > results in a weirder error. I've also included the error I was getting
> > for
> > > the suggestion that was posted earlier.
> > >
> > >
> > > I'm sorry for not being able to provide full source code. If it is any
> > > help all of my functions now produce Tuple2<String, String>(); Where
> the
> > > initial dataset is also Tuple2<String,String>. The goal is to write out
> > the
> > > union of the results from all iterations where intersection of the set
> of
> > > keys for iteration i and iteration i - 1 is the empty set.
> > >
> > >
> > >         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> > > iteration = transactions.
> > >                 iterateDelta(initial, maxIterations, 0);
> > >
> > >
> > >         DataSet<Tuple2<String, String>> ... = ....
> > >                 flatMap(new
> > > ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <--
> > Referencing
> > > the working set
> > >                 groupBy(0).
> > >                 reduceGroup(new ...()).
> > >                 withParameters(intValue).
> > >
> > > join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); //
> <--
> > > Referencing the solution set
> > > //             projectFirst(1).projectSecond(1).types(String.class,
> > > String.class);
> > >
> > >
> > > Raises Exception. If I change it to get input2(), I get the same error,
> > > but for the working set which is referenced through the broadcast.
> > >
> > >
> > > Exception in thread "main" org.apache.flink.compiler.CompilerException:
> > > Error: The step function does not reference the solution set.
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
> > >
> > >
> > > If I remove the getInput1() call and uncomment that last line it
> yields.
> > I
> > > was concerned that I was accidentally writing out a null value
> > >
> > > somewhere but I can't find out.
> > >
> > >
> > > Exception in thread "main"
> > > org.apache.flink.runtime.client.JobExecutionException:
> > > java.lang.NullPointerException
> > >     at
> > >
> >
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
> > >     at
> > >
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > After more investigation it appears that the null pointer exists
> > somewhere
> > > between the the reduceGroup operator and next mapOperator as the next
> > > mapOperator does not run after the reduceGroup.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Jack
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Delta Iteration Obstuse Error Message

Robert Metzger
In reply to this post by Jack David Galilee
Hi Jack,

are you building Flink in a git repository our just from the sources? This
is most likely a bug in the "git-commit-id-plugin" maven plugin. Can you
try executing maven with the -X argument. This is going to show you the
full stack trace. This might help to understand the issue.

Robert


On Thu, Aug 28, 2014 at 12:17 PM, Jack David Galilee <
[hidden email]> wrote:

> Cheers Fabian,
>
> Just need to get past a new null pointer exception in the build process
> and I'll let you know.
>
> Failed to execute goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project
> flink-runtime: Execution default of goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed.
> NullPointerException -> [Help 1]
>
>
> Cheers,
> Jack
>
> ________________________________________
> From: Fabian Hueske <[hidden email]>
> Sent: 28 August 2014 19:15
> To: [hidden email]
> Subject: Re: Delta Iteration Obstuse Error Message
>
> I pushed a fix to allow null tuples as input for ProjectJoin (FLINK-1074).
>
> You can update to the latest master or port the fix (just two lines:
>
> https://github.com/apache/incubator-flink/commit/00840599a7a498cbd19d524ab5ad698365cbab4f
> )
>
> Cheers, Fabian
>
>
> 2014-08-28 2:09 GMT+02:00 Jack David Galilee <[hidden email]>:
>
> > Awesome, thanks Stephan for getting back to me so fast.
> >
> > I had assumed that was what I was what was going on with regards to
> > getInput1() or getInput2().
> > Good to know what I thought was happening was happening.
> >
> > The interesting thing that might help is that the null value is not being
> > written out in my ReduceGroup
> > function so it is coming from somewhere else in between. I did a trace of
> > my program and it didn't write
> > out a null value to the output collector, but if I remember correctly it
> > was called one extra time than I'd
> > expected.
> > ________________________________________
> > From: [hidden email] <[hidden email]> on behalf of Stephan
> > Ewen <[hidden email]>
> > Sent: 28 August 2014 09:52
> > To: [hidden email]
> > Subject: Re: Delta Iteration Obstuse Error Message
> >
> > Hi Jack!
> >
> > You are doing something very interesting there. I am not sure I am
> getting
> > everything, but are some issues I can see...
> >
> >  - In the line where you join with the solution set:
> > "join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> > Referencing the solution set"
> >    What happens is that after constructing the join, you grab that join's
> > first input data set. This means, that you actually ignore the join and
> > simply use its "input1", in that case the result of the reduceGroup
> > operation. That explains why the program does not depend on the solution
> > set. If you change the code to "getInput2()" then you construct teh join
> > and take only its second input (the solution set). The other inputs are
> the
> > "dangling", meaning they are not really consumed and thus regarded
> unused.
> >
> >   - The second error you get is a bug in Flink. This projection in the
> join
> > cannot handle null inputs, which may occur when joining with the solution
> > set. Let us fix that!
> >
> > Greetings,
> > Stephan
> >
> >
> >
> > On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
> > [hidden email]> wrote:
> >
> > > Hi Everyone,
> > >
> > >
> > > I had an epiphany while reading through the Flink 0.6 API documentation
> > > and decided to try a new method for my iterative algorithm, but it just
> > > results in a weirder error. I've also included the error I was getting
> > for
> > > the suggestion that was posted earlier.
> > >
> > >
> > > I'm sorry for not being able to provide full source code. If it is any
> > > help all of my functions now produce Tuple2<String, String>(); Where
> the
> > > initial dataset is also Tuple2<String,String>. The goal is to write out
> > the
> > > union of the results from all iterations where intersection of the set
> of
> > > keys for iteration i and iteration i - 1 is the empty set.
> > >
> > >
> > >         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> > > iteration = transactions.
> > >                 iterateDelta(initial, maxIterations, 0);
> > >
> > >
> > >         DataSet<Tuple2<String, String>> ... = ....
> > >                 flatMap(new
> > > ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <--
> > Referencing
> > > the working set
> > >                 groupBy(0).
> > >                 reduceGroup(new ...()).
> > >                 withParameters(intValue).
> > >
> > > join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); //
> <--
> > > Referencing the solution set
> > > //             projectFirst(1).projectSecond(1).types(String.class,
> > > String.class);
> > >
> > >
> > > Raises Exception. If I change it to get input2(), I get the same error,
> > > but for the working set which is referenced through the broadcast.
> > >
> > >
> > > Exception in thread "main" org.apache.flink.compiler.CompilerException:
> > > Error: The step function does not reference the solution set.
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
> > >
> > >
> > > If I remove the getInput1() call and uncomment that last line it
> yields.
> > I
> > > was concerned that I was accidentally writing out a null value
> > >
> > > somewhere but I can't find out.
> > >
> > >
> > > Exception in thread "main"
> > > org.apache.flink.runtime.client.JobExecutionException:
> > > java.lang.NullPointerException
> > >     at
> > >
> >
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
> > >     at
> > >
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > After more investigation it appears that the null pointer exists
> > somewhere
> > > between the the reduceGroup operator and next mapOperator as the next
> > > mapOperator does not run after the reduceGroup.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Jack
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

RE: Delta Iteration Obstuse Error Message

Jack David Galilee
In reply to this post by Stephan Ewen
Stepha, ahh so that'd make it a full outer join yeah?
________________________________________
From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
Sent: 28 August 2014 20:21
To: [hidden email]
Subject: Re: Delta Iteration Obstuse Error Message

Jack,

The null tuple may come when you join with the solution set, but there is
no entry for that key in the solution set yet. When joining the solution
set, the semantics are those of an outer join.

The rational is that this allows you to not loose potential new entries and
grow the solution set if you want.

Greetings,
Stephan



On Thu, Aug 28, 2014 at 12:17 PM, Jack David Galilee <
[hidden email]> wrote:

> Cheers Fabian,
>
> Just need to get past a new null pointer exception in the build process
> and I'll let you know.
>
> Failed to execute goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision (default) on project
> flink-runtime: Execution default of goal
> pl.project13.maven:git-commit-id-plugin:2.1.5:revision failed.
> NullPointerException -> [Help 1]
>
>
> Cheers,
> Jack
>
> ________________________________________
> From: Fabian Hueske <[hidden email]>
> Sent: 28 August 2014 19:15
> To: [hidden email]
> Subject: Re: Delta Iteration Obstuse Error Message
>
> I pushed a fix to allow null tuples as input for ProjectJoin (FLINK-1074).
>
> You can update to the latest master or port the fix (just two lines:
>
> https://github.com/apache/incubator-flink/commit/00840599a7a498cbd19d524ab5ad698365cbab4f
> )
>
> Cheers, Fabian
>
>
> 2014-08-28 2:09 GMT+02:00 Jack David Galilee <[hidden email]>:
>
> > Awesome, thanks Stephan for getting back to me so fast.
> >
> > I had assumed that was what I was what was going on with regards to
> > getInput1() or getInput2().
> > Good to know what I thought was happening was happening.
> >
> > The interesting thing that might help is that the null value is not being
> > written out in my ReduceGroup
> > function so it is coming from somewhere else in between. I did a trace of
> > my program and it didn't write
> > out a null value to the output collector, but if I remember correctly it
> > was called one extra time than I'd
> > expected.
> > ________________________________________
> > From: [hidden email] <[hidden email]> on behalf of Stephan
> > Ewen <[hidden email]>
> > Sent: 28 August 2014 09:52
> > To: [hidden email]
> > Subject: Re: Delta Iteration Obstuse Error Message
> >
> > Hi Jack!
> >
> > You are doing something very interesting there. I am not sure I am
> getting
> > everything, but are some issues I can see...
> >
> >  - In the line where you join with the solution set:
> > "join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); // <--
> > Referencing the solution set"
> >    What happens is that after constructing the join, you grab that join's
> > first input data set. This means, that you actually ignore the join and
> > simply use its "input1", in that case the result of the reduceGroup
> > operation. That explains why the program does not depend on the solution
> > set. If you change the code to "getInput2()" then you construct teh join
> > and take only its second input (the solution set). The other inputs are
> the
> > "dangling", meaning they are not really consumed and thus regarded
> unused.
> >
> >   - The second error you get is a bug in Flink. This projection in the
> join
> > cannot handle null inputs, which may occur when joining with the solution
> > set. Let us fix that!
> >
> > Greetings,
> > Stephan
> >
> >
> >
> > On Wed, Aug 27, 2014 at 11:10 AM, Jack David Galilee <
> > [hidden email]> wrote:
> >
> > > Hi Everyone,
> > >
> > >
> > > I had an epiphany while reading through the Flink 0.6 API documentation
> > > and decided to try a new method for my iterative algorithm, but it just
> > > results in a weirder error. I've also included the error I was getting
> > for
> > > the suggestion that was posted earlier.
> > >
> > >
> > > I'm sorry for not being able to provide full source code. If it is any
> > > help all of my functions now produce Tuple2<String, String>(); Where
> the
> > > initial dataset is also Tuple2<String,String>. The goal is to write out
> > the
> > > union of the results from all iterations where intersection of the set
> of
> > > keys for iteration i and iteration i - 1 is the empty set.
> > >
> > >
> > >         DeltaIteration<Tuple2<String, String>, Tuple2<String, String>>
> > > iteration = transactions.
> > >                 iterateDelta(initial, maxIterations, 0);
> > >
> > >
> > >         DataSet<Tuple2<String, String>> ... = ....
> > >                 flatMap(new
> > > ...()).withBroadcastSet(iteration.getWorkset(), "..."). // <--
> > Referencing
> > > the working set
> > >                 groupBy(0).
> > >                 reduceGroup(new ...()).
> > >                 withParameters(intValue).
> > >
> > > join(iteration.getSolutionSet()).where(0).equalTo(0).getInput1(); //
> <--
> > > Referencing the solution set
> > > //             projectFirst(1).projectSecond(1).types(String.class,
> > > String.class);
> > >
> > >
> > > Raises Exception. If I change it to get input2(), I get the same error,
> > > but for the working set which is referenced through the broadcast.
> > >
> > >
> > > Exception in thread "main" org.apache.flink.compiler.CompilerException:
> > > Error: The step function does not reference the solution set.
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:856)
> > >     at
> > >
> >
> org.apache.flink.compiler.PactCompiler$GraphCreatingVisitor.postVisit(PactCompiler.java:616)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.DualInputOperator.accept(DualInputOperator.java:283)
> > >     at
> > >
> >
> org.apache.flink.api.common.operators.base.GenericDataSinkBase.accept(GenericDataSinkBase.java:289)
> > >
> > >
> > > If I remove the getInput1() call and uncomment that last line it
> yields.
> > I
> > > was concerned that I was accidentally writing out a null value
> > >
> > > somewhere but I can't find out.
> > >
> > >
> > > Exception in thread "main"
> > > org.apache.flink.runtime.client.JobExecutionException:
> > > java.lang.NullPointerException
> > >     at
> > >
> >
> org.apache.flink.api.java.operators.JoinOperator$ProjectFlatJoinFunction.join(JoinOperator.java:935)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.JoinWithSolutionSetSecondDriver.run(JoinWithSolutionSetSecondDriver.java:143)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:510)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:137)
> > >     at
> > >
> >
> org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
> > >     at
> > >
> >
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
> > >     at
> > >
> >
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > After more investigation it appears that the null pointer exists
> > somewhere
> > > between the the reduceGroup operator and next mapOperator as the next
> > > mapOperator does not run after the reduceGroup.
> > >
> > >
> > >
> > > Thanks,
> > >
> > > Jack
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: Delta Iteration Obstuse Error Message

Stephan Ewen
The non-solution set side is preserved in an outer-join fashion.

The solution set is carried forward into the next iteration, but it is not
part of the "outer join" result.
Reply | Threaded
Open this post in threaded view
|

RE: Delta Iteration Obstuse Error Message

Jack David Galilee
Stephan. OK, that sounds very interesting. I'll have to read the code to fully understand.

Robert, in case some newbie like me pops in again. The issue (although the exception's stack trace is not at all informative) is that I had cloned Flink as a git sub-module and JGit was failing  to walk a higher git ref tree for some reason. Common solutions posted on line are to add a random commit, although this didn't work in my case.

Fabian, thanks that's solved the issue. Unfortunately not the algorithm :(

Also. You guys are amazingly hard working.

________________________________________
From: [hidden email] <[hidden email]> on behalf of Stephan Ewen <[hidden email]>
Sent: 28 August 2014 20:30
To: [hidden email]
Subject: Re: Delta Iteration Obstuse Error Message

The non-solution set side is preserved in an outer-join fashion.

The solution set is carried forward into the next iteration, but it is not
part of the "outer join" result.