Fabian Hueske created FLINK-1267:
------------------------------------
Summary: Add crossGroup operator
Key: FLINK-1267
URL:
https://issues.apache.org/jira/browse/FLINK-1267 Project: Flink
Issue Type: New Feature
Components: Java API, Local Runtime, Optimizer, Scala API
Affects Versions: 0.7.0-incubating
Reporter: Fabian Hueske
Priority: Minor
A common operator is to pair-wise compare or combine all elements of a group (there were two questions about this on the user mailing list, recently). Right now, this can be done in two ways:
1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs
2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.
Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group fits into memory and is more cumbersome to implement for the user, but pairs can be arbitrarily built. The self-{{Join}} approach pushes most of the work into the system, but the execution strategy does not treat the self-join different from a regular join (both identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product.
I propose to add a dedicated {{crossGroup()}} operator, that offers this functionality in a proper way.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)