Jingsong Lee created FLINK-12936:
------------------------------------
Summary: Support intersect all and minus all to blink planner
Key: FLINK-12936
URL:
https://issues.apache.org/jira/browse/FLINK-12936 Project: Flink
Issue Type: New Feature
Components: Table SQL / Planner
Reporter: Jingsong Lee
Assignee: Jingsong Lee
Now, we just support intersect and minus, See ReplaceIntersectWithSemiJoinRule and ReplaceMinusWithAntiJoinRule, replace intersect with null aware semi-join and distinct aggregate.
We need support intersect all and minus all too.
Presto and Spark already support them:
[
https://github.com/prestodb/presto/issues/4918]
https://issues.apache.org/jira/browse/SPARK-21274I think them have a good rewrite design and we can follow them:
1.For intersect all
Input Query
{code:java}
SELECT c1 FROM ut1 INTERSECT ALL SELECT c1 FROM ut2
{code}
Rewritten Query
{code:java}
SELECT c1
FROM (
SELECT replicate_row(min_count, c1)
FROM (
SELECT c1,
IF (vcol1_cnt > vcol2_cnt, vcol2_cnt, vcol1_cnt) AS min_count
FROM (
SELECT c1, count(vcol1) as vcol1_cnt, count(vcol2) as vcol2_cnt
FROM (
SELECT c1, true as vcol1, null as vcol2 FROM ut1
UNION ALL
SELECT c1, null as vcol1, true as vcol2 FROM ut2
) AS union_all
GROUP BY c1
HAVING vcol1_cnt >= 1 AND vcol2_cnt >= 1
)
)
)
{code}
2.For minus all:
Input Query
{code:java}
SELECT c1 FROM ut1 EXCEPT ALL SELECT c1 FROM ut2
{code}
Rewritten Query
{code:java}
SELECT c1
FROM (
SELECT replicate_rows(sum_val, c1)
FROM (
SELECT c1, sum_val
FROM (
SELECT c1, sum(vcol) AS sum_val
FROM (
SELECT 1L as vcol, c1 FROM ut1
UNION ALL
SELECT -1L as vcol, c1 FROM ut2
) AS union_all
GROUP BY union_all.c1
)
WHERE sum_val > 0
)
)
{code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)