[jira] [Created] (FLINK-19896) Support first-n-rows deduplication in the Deduplicate operator

Shang Yuanchun (Jira)
Jun Zhang created FLINK-19896:
---------------------------------

             Summary: Support first-n-rows deduplication in the Deduplicate operator
                 Key: FLINK-19896
                 URL: https://issues.apache.org/jira/browse/FLINK-19896
             Project: Flink
          Issue Type: Improvement
          Components: Table SQL / Runtime
    Affects Versions: 1.12.0, 1.11.3
            Reporter: Jun Zhang
             Fix For: 1.11.2


Currently, the Deduplicate operator only supports first-row deduplication (ordered by proc-time). For first-n-rows deduplication, the planner has to fall back to the Rank operator, which is less efficient than Deduplicate in terms of state consumption.

This issue proposes extending DeduplicateKeepFirstRowFunction so that it also supports first-n-rows deduplication.
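
To illustrate the intended semantics, here is a minimal sketch in plain Python. It is not Flink code and not the proposed implementation. The function name `keep_first_n`, the `(key, value)` row shape, and the idea of holding only a per-key counter as state are assumptions made for this example. Rows are taken to arrive in proc-time order.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

Row = Tuple[Hashable, object]


def keep_first_n(rows: Iterable[Row], n: int) -> List[Row]:
    """Return at most the first n rows per key, in arrival order.

    Hypothetical helper: it models first-n-rows deduplication with a
    per-key counter as the only state (an assumption of this sketch).
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    seen: Dict[Hashable, int] = defaultdict(int)  # key -> rows emitted so far
    kept: List[Row] = []
    for key, value in rows:
        if seen[key] < n:
            seen[key] += 1
            kept.append((key, value))
        # Rows after the first n for a key are dropped; the counter is unchanged.
    return kept


# Rows listed in arrival (proc-time) order.
events = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5), ("c", 6)]
first_two = keep_first_n(events, 2)
first_one = keep_first_n(events, 1)
```

With `n = 1` the sketch reduces to first-row deduplication. With a larger `n` it still tracks only one integer per key, which is the kind of state saving the description above has in mind when comparing against the Rank operator.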



--
This message was sent by Atlassian Jira
(v8.3.4#803005)