[jira] [Created] (FLINK-18871) Non-deterministic function could break retract mechanism

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-18871) Non-deterministic function could break retract mechanism

Shang Yuanchun (Jira)
Benchao Li created FLINK-18871:
----------------------------------

             Summary: Non-deterministic function could break retract mechanism
                 Key: FLINK-18871
                 URL: https://issues.apache.org/jira/browse/FLINK-18871
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / API, Table SQL / Runtime
            Reporter: Benchao Li


For example, we have a following SQL:
{code:sql}
create view view1 as
select
  max(a) as m1,
  max(b) as m2 -- b is a timestmap
from T
group by c, d;

create view view2 as
select * from view1
where m2 > CURRENT_TIMESTAMP;

insert into MySink
select sum(m1) as m1
from view2
group by c;
{code}
view1 will produce retract messages, and the same message in view2 maybe produce different results. and the second agg will produce wrong result.
 For example,
{noformat}
view1:
+ (1, 2020-8-10 16:13:00)
- (1, 2020-8-10 16:13:00)
+ (2, 2020-8-10 16:13:10)

view2:
+ (1, 2020-8-10 16:13:00)
- (1, 2020-8-10 16:13:00) // this record may be filtered out
+ (2, 2020-8-10 16:13:10)

MySink:
+ (1, 2020-8-10 16:13:00)
+ (2, 2020-8-10 16:13:10) // will produce wrong result.
{noformat}
In general, the non-deterministic function may break the retract mechanism. All operators downstream which will rely on the retraction mechanism will produce wrong results, or throw exception, such as Agg / some Sink which need retract message / TopN / Window.

(The example above is a simplified version of some production jobs in our scenario, just to explain the core problem)

CC [~ykt836] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)