[jira] [Created] (FLINK-9673) Improve State efficiency of bounded OVER window operators

Shang Yuanchun (Jira)
Fabian Hueske created FLINK-9673:
------------------------------------

             Summary: Improve State efficiency of bounded OVER window operators
                 Key: FLINK-9673
                 URL: https://issues.apache.org/jira/browse/FLINK-9673
             Project: Flink
          Issue Type: Improvement
          Components: Table API & SQL
            Reporter: Fabian Hueske


Currently, the implementations of bounded OVER window aggregations store the complete input rows for the bounded interval. For example, for the query:

{code}
SELECT user_id, count(action) OVER (PARTITION BY user_id ORDER BY rowtime RANGE INTERVAL '14' DAY PRECEDING) action_count, rowtime
FROM (
    SELECT rowtime, user_id, action, val1, val2, val3, val4 FROM user
)
{code}

The complete records with schema {{(rowtime, user_id, action, val1, val2, val3, val4)}} are kept in state for 14 days so that they can be retracted from the accumulators once they fall out of the window.

However, it would be sufficient to store only those fields that are required for the aggregations, i.e., {{action}} in the example above. All other fields could be set to {{null}}, which would significantly reduce the amount of data that needs to be stored in state.
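
The idea can be illustrated with a minimal, framework-independent sketch. It is not Flink code: the helper {{project_for_state}}, the constants {{FIELDS}} and {{REQUIRED}}, and the sample row are hypothetical names and values chosen for illustration. The sketch assumes that the timestamp and the partition key are tracked separately from the stored row, so only {{action}} has to survive in the stored copy.

```python
# Illustrative sketch only; names and sample values are assumptions, not Flink APIs.

# Schema of the input rows from the example query.
FIELDS = ("rowtime", "user_id", "action", "val1", "val2", "val3", "val4")

# Positions of the fields the aggregation needs for retraction.
# For count(action) this is assumed to be just `action`.
REQUIRED = frozenset({FIELDS.index("action")})


def project_for_state(row, required=REQUIRED):
    """Return a copy of `row` in which every field not in `required` is None.

    The row keeps its arity, so field positions used by the accumulators
    stay valid; only the payload of unused fields is dropped.
    """
    return tuple(value if i in required else None for i, value in enumerate(row))


# Hypothetical input row.
row = (1530000000000, "u1", "click", "a", "b", "c", "d")

# This projected copy is what would be kept in state for the 14-day interval.
stored = project_for_state(row)
print(stored)
```

Keeping the arity of the row unchanged means that code which addresses fields by position would not need to change; whether a real implementation would prefer nulled fields or a narrower row type is left open here.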





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)