Xingxing Di created FLINK-17228:
-----------------------------------
Summary: Streaming SQL with nested GROUP BY produces wrong results
Key: FLINK-17228
URL: https://issues.apache.org/jira/browse/FLINK-17228
Project: Flink
Issue Type: Bug
Components: Table SQL / API, Table SQL / Runtime
Affects Versions: 1.7.2
Environment: Flink 1.7.2
Parallelism is 1
Reporter: Xingxing Di
We are facing a special scenario, and *we want to know whether this is supported*:
first count distinct device IDs (cuid) per combination of dimensions A and B, then sum those counts per dimension A only (here A is dt and B is pvareaid).
Here is the SQL:
{code:java}
SELECT dt, SUM(a.uv) AS uv
FROM (
SELECT dt, pvareaid, COUNT(DISTINCT cuid) AS uv
FROM streaming_log_event
WHERE action IN ('action1')
AND pvareaid NOT IN ('pv1', 'pv2')
AND pvareaid IS NOT NULL
GROUP BY dt, pvareaid
) a
GROUP BY dt;{code}
The problem is that the data emitted to the sink is wrong: the sink periodically receives a smaller, incorrect result ({color:#FF0000}86{color}). Here is the log:
{code:java}
2020-04-17 22:28:38,727 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:38,727 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,130,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,130,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,86,20200417)
2020-04-17 22:28:39,327 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(false,0,86,20200417)
2020-04-17 22:28:39,328 INFO groupBy xx -> to: Tuple2 -> Sink: Unnamed (1/1) (GeneralRedisSinkFunction.invoke:169) - receive data(true,0,131,20200417)
{code}
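One way the sequence in the log could arise, sketched under assumptions: when the inner query (GROUP BY dt, pvareaid) updates a COUNT(DISTINCT cuid), it sends a retraction of the old count followed by the new count, and the outer SUM(a.uv) emits its own retraction/accumulation pair for each of those two messages. The class RetractSumDemo and its numbers are hypothetical. It assumes the other pvareaid groups sum to 86 and that one group's uv changes from 44 to 45, which is what the differences 130 - 86 and 131 - 86 in the log would imply. It is a plain-Java simulation, not Flink runtime code, and it omits the middle 0 field of the logged tuples.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Flink code): an outer SUM over a retract
 * stream emits a retraction plus an updated value for every input message.
 */
public class RetractSumDemo {

    // Current result of the outer SUM(uv) for one dt (assumed already emitted).
    private long sum;
    // Messages sent downstream, formatted like the sink log: (flag,sum).
    private final List<String> emitted = new ArrayList<>();

    RetractSumDemo(long initialSum) {
        this.sum = initialSum;
    }

    /** Processes one inner-query message; isAdd == false means "retract uv". */
    void onInner(boolean isAdd, long uv) {
        long old = sum;
        sum += isAdd ? uv : -uv;
        emitted.add("(false," + old + ")"); // retract the previous outer result
        emitted.add("(true," + sum + ")");  // emit the updated outer result
    }

    static List<String> simulate() {
        // Assumed starting point: SUM = 130, of which one pvareaid contributes 44.
        RetractSumDemo outer = new RetractSumDemo(130);
        // That pvareaid's distinct count changes 44 -> 45; the inner aggregation
        // sends this as a retraction followed by an accumulation.
        outer.onInner(false, 44);
        outer.onInner(true, 45);
        return outer.emitted;
    }

    public static void main(String[] args) {
        simulate().forEach(System.out::println);
    }
}
{code}

Running main prints (false,130), (true,86), (false,86), (true,131), the same flag/value order as the last four log lines. If GeneralRedisSinkFunction writes every message without looking at the flag (an assumption, since its code is not shown), Redis would briefly hold 86 between the two updates.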
--
This message was sent by Atlassian Jira
(v8.3.4#803005)