Zhengchao Shi created FLINK-19452:
-------------------------------------
Summary: statistics of group by CDC data is always 1
Key: FLINK-19452
URL:
https://issues.apache.org/jira/browse/FLINK-19452 Project: Flink
Issue Type: Bug
Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.11.1
Reporter: Zhengchao Shi
Fix For: 1.12.0
When using CDC to do count statistics, if only updates are made to the source table(mysql table), then the value of count is always 1.
{code:sql}
CREATE TABLE orders (
order_number int,
product_id int
) with (
'connector' = 'kafka-0.11',
'topic' = 'Topic',
'properties.bootstrap.servers' = 'localhost:9092',
'properties.group.id' = 'GroupId',
'scan.startup.mode' = 'latest-offset',
'format' = 'canal-json'
);
CREATE TABLE order_test (
order_number int,
order_cnt bigint
) WITH (
'connector' = 'print'
);
INSERT INTO order_test
SELECT order_number, count(1) FROM orders GROUP BY order_number;
{code}
3 records in “orders” :
||order_number||product_id||
|10001|1|
|10001|2|
|10001|3|
now update orders table:
{code:sql}
update orders set product_id = 5 where order_number = 10001;
{code}
the output of is :
-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)
-D(10001,1)
+I(10001,1)
i think, the final result is +I(10001, 3)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)