Unit testing Flink programs / DataSet operations

Unit testing Flink programs / DataSet operations

Viktor Rosenfeld
Hi everybody,

I have the following test case prototype and I want to verify that sum() actually computes the sum.

    @Test
    public void shouldComputeSum() throws Exception {
        // given
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple1<Long>> input = env.fromElements(
                new Tuple1<Long>(1L),
                new Tuple1<Long>(2L),
                new Tuple1<Long>(3L));

        // when
        DataSet<Tuple1<Long>> result = input.sum(0);

        // then
        // verify that result is 6
    }

I found AggregateTranslationTest, where a program plan is created and the sink is then accessed to verify the structure of the output operator. Using this as a starting point, I wrote the following code:

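        // verify that the result is 6
        // (the mock must be serializable because Flink serializes the output format)
        OutputFormat<Tuple1<Long>> outputFormat = mock(OutputFormat.class, withSettings().serializable());
        // attach the mocked sink to the aggregated DataSet ("result" from the test above)
        result.output(outputFormat);
        env.execute("ComputeCountTest");
        verify(outputFormat).writeRecord(new Tuple1<Long>(6L));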

I encountered a few problems:

- I can't run this test code from the flink-java module because env.execute() requires flink-clients, which leads to a circular dependency.

- The outputFormat needs to be serializable. Luckily, Mockito supports this, even though its authors consider serializable mocks a code smell (which is debatable).

- It doesn't actually work. Mockito prints:

    Wanted but not invoked:
    outputFormat.writeRecord((6));
    -> at org.apache.flink.api.java.operator.MyAggregateOperatorTest.shouldComputeSum(MyAggregateOperatorTest.java:31)
    Actually, there were zero interactions with this mock.

I suspect env.execute() is non-blocking and that there's a race condition.
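Another possible explanation, offered here as an assumption rather than a confirmed diagnosis: because the output format has to be serializable, the runtime may be writing to a deserialized copy of the mock, in which case the original instance held by the test would never see a call. The sketch below illustrates that effect with plain JDK serialization only. `CountingSink` and `shipToWorker` are hypothetical names and are not part of Flink or Mockito.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedCopyDemo {

    /** Hypothetical stand-in for a sink that counts the records it receives. */
    public static class CountingSink implements Serializable {
        private static final long serialVersionUID = 1L;
        public int recordsWritten = 0;

        public void writeRecord(long value) {
            recordsWritten++;
        }
    }

    /** Round-trips the sink through Java serialization, as a framework might do when shipping it to a worker. */
    public static CountingSink shipToWorker(CountingSink sink) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(sink);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CountingSink) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CountingSink original = new CountingSink();
        CountingSink workerCopy = shipToWorker(original);

        // Only the deserialized copy receives the record.
        workerCopy.writeRecord(6L);

        System.out.println("original saw " + original.recordsWritten + " record(s)");
        System.out.println("copy saw " + workerCopy.recordsWritten + " record(s)");
    }
}
```

If this is what happens, verifying interactions on the original mock cannot succeed no matter how long the test waits, which would point towards collecting the output instead of mocking the sink.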

Executing a whole Flink program is probably too heavyweight for a unit test, but I wanted to use it as a starting point. I also found two other ways to test operator code, but I'm not sure which one is preferred:

- MapTest: invokes a Map operator on a collection using MockInvokable.createAndExecute()

- MapOperatorTest: invokes a Map operator op on a collection using op.executeOnCollection()

So my question is basically whether there is a best practice in the Flink code base for writing a unit test like the one above.

Best,
Viktor

Re: Unit testing Flink programs / DataSet operations

Stephan Ewen
Hey!

Why don't you simply run this program and verify that the result is 6?

You can use the "LocalCollectionOutputFormat" to collect the results (in
your case the one value) and compare it.
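A minimal sketch of what this could look like, assuming the DataSet API of that era and a test classpath that contains both flink-java and flink-clients (for example a test placed in the flink-tests module). The class name `SumCollectSketch` and the method `runSum` are hypothetical. In a real test the final check would be a JUnit assertion on the returned list.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.LocalCollectionOutputFormat;
import org.apache.flink.api.java.tuple.Tuple1;

public class SumCollectSketch {

    /** Runs the sum job in a local environment and returns whatever the sink collected. */
    public static List<Tuple1<Long>> runSum() throws Exception {
        // A local environment keeps the job in this JVM, which
        // LocalCollectionOutputFormat relies on to hand results back.
        ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();

        DataSet<Tuple1<Long>> input = env.fromElements(
                new Tuple1<Long>(1L),
                new Tuple1<Long>(2L),
                new Tuple1<Long>(3L));

        DataSet<Tuple1<Long>> result = input.sum(0);

        // The sink copies the job output into this list when it is closed.
        List<Tuple1<Long>> collected = new ArrayList<Tuple1<Long>>();
        result.output(new LocalCollectionOutputFormat<Tuple1<Long>>(collected));

        env.execute("SumCollectSketch");
        return collected;
    }

    public static void main(String[] args) throws Exception {
        List<Tuple1<Long>> collected = runSum();
        System.out.println("collected: " + collected);
    }
}
```

Because the results come back as an ordinary collection, no mock needs to be serialized and the comparison happens after the job has run.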

Stephan



On Wed, Nov 5, 2014 at 1:44 PM, Viktor Rosenfeld <[hidden email]> wrote:

> [...]

Re: Unit testing Flink programs / DataSet operations

Viktor Rosenfeld
Hi Stephan,

Stephan Ewen wrote
> Why don't you simply run this program and verify that the result is 6?

You mean verify by hand? I want to automate that.

> You can use the "LocalCollectionOutputFormat" to collect the results (in
> your case the one value) and compare it.

Thanks, that's what I was looking for!

Best,
Viktor

Re: Unit testing Flink programs / DataSet operations

Stephan Ewen
You can have a look at this example, which grabs the output for verification:

https://github.com/apache/incubator-flink/blob/master/flink-tests/src/test/java/org/apache/flink/test/broadcastvars/BroadcastVarInitializationITCase.java

On Wed, Nov 5, 2014 at 2:13 PM, Viktor Rosenfeld <[hidden email]> wrote:

> [...]