Performing consecutive Action operators

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Performing consecutive Action operators

Le Quoc Do
Hi all,

Right now, in Flink, if I call to 2 action operators (print, count,
collect, ) consecutively, Flink will create 2 independent execution plans.
A simple example:

             DataSet<String> text = env.fromElements(

                               "Some text ….",

                               );

              DataSet<Tuple2<String, Integer>> result =

                               text.flatMap(new Tokenizer())

                               .groupBy(0)

                               .sum(1);

  result.count();

  result.print();

Could you please show me the way to force Flink create only one execution
plan, so that the later operator just reuse the result of previous one, and
doesn’t need to re-compute again?

Thanks,
Do
Reply | Threaded
Open this post in threaded view
|

Re: Performing consecutive Action operators

Till Rohrmann
Hi Do,

the easiest way is to avoid using methods which trigger an eager execution
(collect, count, print) but to define sinks instead. Alternatively, you can
persist intermediate results by writing them to disk and continue
processing from there. That way, you won't re-calculate all parts of your
dataflow.

Cheers,
Till

On Thu, Mar 31, 2016 at 3:09 PM, Le Quoc Do <[hidden email]> wrote:

> Hi all,
>
> Right now, in Flink, if I call to 2 action operators (print, count,
> collect, ) consecutively, Flink will create 2 independent execution plans.
> A simple example:
>
>              DataSet<String> text = env.fromElements(
>
>                                "Some text ….",
>
>                                );
>
>               DataSet<Tuple2<String, Integer>> result =
>
>                                text.flatMap(new Tokenizer())
>
>                                .groupBy(0)
>
>                                .sum(1);
>
>   result.count();
>
>   result.print();
>
> Could you please show me the way to force Flink create only one execution
> plan, so that the later operator just reuse the result of previous one, and
> doesn’t need to re-compute again?
>
> Thanks,
> Do
>