Xiang Gao created FLINK-19300:
---------------------------------
Summary: Timer loss after restoring from savepoint
Key: FLINK-19300
URL: https://issues.apache.org/jira/browse/FLINK-19300
Project: Flink
Issue Type: Bug
Components: Runtime / State Backends
Reporter: Xiang Gao
While using heap-based timers, we occasionally see timers lost after restoring a job from a savepoint, especially when the savepoint is kept in remote storage (S3).
After some investigation, the issue seems to be related to [this line in deserialization|https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/io/PostVersionedIOReadableWritable.java#L65]. When checking the VERSIONED_IDENTIFIER, the read from the input stream is not guaranteed to fill the byte array, which causes timers to be dropped for the affected key group.
The read should continue until the expected number of bytes has been read or the end of the stream has been reached.
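To illustrate the suggestion, here is a minimal, self-contained sketch of a read loop that keeps going until the buffer is full or the stream ends. It is not the code in PostVersionedIOReadableWritable and not necessarily how the fix should look in Flink; the names ReadFullySketch, readFully and TricklingStream are hypothetical and exist only for this example. TricklingStream simulates a stream (such as a remote one) that hands back fewer bytes than requested per call.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadFullySketch {

    /**
     * Hypothetical helper: reads until buf is full or the end of the
     * stream is reached, and returns the number of bytes actually read.
     */
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            // A single read may return fewer bytes than requested.
            int n = in.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Hypothetical test stream that returns at most one byte per bulk read. */
    static class TricklingStream extends InputStream {
        private final InputStream delegate;

        TricklingStream(byte[] data) {
            this.delegate = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return delegate.read(b, off, Math.min(len, 1));
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5, 6};

        // Single read call: only part of the 4-byte buffer is filled.
        byte[] partial = new byte[4];
        int single = new TricklingStream(data).read(partial);

        // Looping read: the whole 4-byte buffer is filled.
        byte[] full = new byte[4];
        int looped = readFully(new TricklingStream(data), full);

        System.out.println("single read: " + single + ", looped read: " + looped);
    }
}
```

The JDK already offers comparable behavior: DataInputStream.readFully(byte[]) loops until the array is filled and throws EOFException if the stream ends first, and InputStream.readNBytes(byte[], int, int) (Java 9 and later) returns the number of bytes read. Which of these fits depends on whether a short stream should be treated as an error or as "no version identifier present".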
--
This message was sent by Atlassian Jira
(v8.3.4#803005)