Elias Levy created FLINK-11517:
----------------------------------
Summary: Inefficient window state access when using RocksDB state backend
Key: FLINK-11517
URL:
https://issues.apache.org/jira/browse/FLINK-11517 Project: Flink
Issue Type: Bug
Reporter: Elias Levy
When using an aggregate function on a window with a process function and the RocksDB state backend, state access is inefficient.
The WindowOperator calls windowState.add to merge the new element using the aggregate function. The add method of RocksDBAggregatingState will read the state, deserialize the state, call the aggregate function, deserialize the state, and write it out.
If the trigger decides the window must be fired, as the the windowState.add does not return the state, the WindowOperator must call windowState.get to get it and pass it to the window process function, resulting in another read and deserialization.
Finally, while the state is not passed in to the trigger, in some cases the trigger may have a need to access the state. That is our case. As the state is not passed to the trigger, we must read and deserialize the state one more from within the trigger.
Thus, state must be read and deserialized three times to process a single element. If the state is large, this can be quite costly.
Ideally windowState.add would return the state, so that the WindowOperator can pass it to the process function without having to read it again. Additionally, the state would be made available to the trigger to enable more use cases without having to go through the state descriptor again.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)