Piotr Nowojski created FLINK-19683:
--------------------------------------
Summary: Actively timeout aligned checkpoints on the output
Key: FLINK-19683
URL: https://issues.apache.org/jira/browse/FLINK-19683
Project: Flink
Issue Type: Sub-task
Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.12.0
Reporter: Piotr Nowojski
After enqueuing an aligned checkpoint barrier on the output, we could register a timeout to check whether it was sent downstream within some threshold. If not, we can convert it to an unaligned checkpoint.
Note that this will significantly complicate how the actual checkpoint is executed. Currently, the logic inside `AsyncCheckpointRunnable` is executed as soon as the checkpoint is triggered. With a timeout on the outputs, we cannot complete the `AsyncCheckpointRunnable` until we know whether the timeout happened. We would need to register some listener/CompletableFuture tracking whether all of the checkpoint barriers were sent downstream; the aligned checkpoint can only complete if those futures complete before the timeout. Otherwise, if the timeout happens, we would need to convert the aligned checkpoint on the outputs to an unaligned one.
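
A rough sketch of the future-based tracking described above, not the actual Flink implementation. It assumes one `CompletableFuture<Void>` per output that completes once the barrier has left that output. The names `BarrierTimeoutSketch`, `Outcome` and `awaitBarriersSent` are hypothetical, and `completeOnTimeout` requires Java 9 or newer.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; none of these names exist in Flink.
final class BarrierTimeoutSketch {

    /** Hypothetical result of waiting for the barriers on the outputs. */
    enum Outcome { ALIGNED, CONVERT_TO_UNALIGNED }

    /**
     * Returns a future that yields ALIGNED if every per-output future completes
     * within the timeout, and CONVERT_TO_UNALIGNED if the timeout fires first.
     * If any per-output future fails, the returned future fails as well.
     */
    static CompletableFuture<Outcome> awaitBarriersSent(
            List<CompletableFuture<Void>> barrierSentFutures, long timeoutMillis) {
        // Completes once all outputs report that the barrier was sent downstream.
        CompletableFuture<Void> allSent =
                CompletableFuture.allOf(barrierSentFutures.toArray(new CompletableFuture[0]));
        return allSent
                .thenApply(ignored -> Outcome.ALIGNED)
                // Java 9+: supplies the fallback value if still incomplete at the timeout.
                .completeOnTimeout(
                        Outcome.CONVERT_TO_UNALIGNED, timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

In this sketch the asynchronous part of the checkpoint would wait on the returned future and only finish as aligned on `ALIGNED`; how the conversion to unaligned would actually be carried out on the outputs is left open here.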
--
This message was sent by Atlassian Jira
(v8.3.4#803005)