Removing reduce/aggregations from non-grouped data streams

Removing reduce/aggregations from non-grouped data streams

Gyula Fóra-2
Hey all,
Currently we have reduce and aggregation methods for non-grouped
DataStreams as well, which will produce local aggregates depending on the
parallelism of the operator.

This behaviour is neither intuitive nor useful: it only produces sensible
results if the user explicitly sets the parallelism to 1, which should not
be encouraged.
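
To make the problem concrete, here is a minimal plain-Java simulation of a non-grouped rolling sum. It is not Flink API code: the class name, the method name, and the round-robin distribution of elements to subtasks are assumptions made purely for illustration. It only shows how the emitted values change with the operator's parallelism.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation (hypothetical names, not the Flink API).
class LocalAggregateDemo {

    // Simulates a non-grouped rolling sum: elements are dealt round-robin
    // (an assumption) to `parallelism` subtasks, and each subtask keeps
    // only its own local running sum. Returns the values in emission order.
    static List<Integer> rollingSum(List<Integer> input, int parallelism) {
        int[] localSums = new int[parallelism];
        List<Integer> emitted = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            int subtask = i % parallelism;
            localSums[subtask] += input.get(i);
            emitted.add(localSums[subtask]);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(rollingSum(input, 1)); // [1, 3, 6, 10]
        System.out.println(rollingSum(input, 2)); // [1, 2, 4, 6]
    }
}
```

With parallelism 1 the output is the global rolling sum; with parallelism 2 the same input yields two interleaved partial sums, and no emitted value is the total of the stream.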

I would like to remove these methods from the DataStream API and keep them
only for GroupedDataStream and WindowedDataStream, where the aggregation is
executed either per key or per window.
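
For contrast, a similar plain-Java sketch of a per-key rolling sum. Again this is a simulation and not Flink API code: the names are hypothetical, and routing each key to a subtask by hash is an assumption. Because all elements of a key reach the same subtask, the per-key results do not depend on the parallelism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation (hypothetical names, not the Flink API).
class KeyedAggregateDemo {

    // Simulates a grouped rolling sum: each element is routed to a subtask
    // by the hash of its key (an assumption), and every subtask keeps one
    // running sum per key. Elements are processed sequentially here, so the
    // emission order is deterministic, unlike in a real parallel job.
    static List<String> keyedRollingSum(String[] keys, int[] values, int parallelism) {
        List<Map<String, Integer>> state = new ArrayList<>();
        for (int p = 0; p < parallelism; p++) {
            state.add(new HashMap<>());
        }
        List<String> emitted = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            int subtask = Math.floorMod(keys[i].hashCode(), parallelism);
            // merge() stores the updated sum for the key and returns it
            int sum = state.get(subtask).merge(keys[i], values[i], Integer::sum);
            emitted.add(keys[i] + "=" + sum);
        }
        return emitted;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a", "b"};
        int[] values = {1, 2, 3, 4};
        System.out.println(keyedRollingSum(keys, values, 1)); // [a=1, b=2, a=4, b=6]
        System.out.println(keyedRollingSum(keys, values, 3)); // [a=1, b=2, a=4, b=6]
    }
}
```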

Cheers,
Gyula

Re: Removing reduce/aggregations from non-grouped data streams

Stephan Ewen
+1 totally agreed

On Mon, Jun 22, 2015 at 5:32 PM, Gyula Fóra <[hidden email]> wrote:

> Hey all,
> Currently we have reduce and aggregation methods for non-grouped
> DataStreams as well, which will produce local aggregates depending on the
> parallelism of the operator.
>
> This behaviour is neither intuitive nor useful: it only produces sensible
> results if the user explicitly sets the parallelism to 1, which should not
> be encouraged.
>
> I would like to remove these methods from the DataStream API and keep them
> only for GroupedDataStream and WindowedDataStream, where the aggregation is
> executed either per key or per window.
>
> Cheers,
> Gyula
>

Re: Removing reduce/aggregations from non-grouped data streams

Gyula Fóra
I opened a PR <https://github.com/apache/flink/pull/860> for this.

Stephan Ewen <[hidden email]> wrote (on Mon, Jun 22, 2015, 19:25):

> +1 totally agreed