[jira] [Created] (FLINK-19683) Actively timeout aligned checkpoints on the output


Shang Yuanchun (Jira)
Piotr Nowojski created FLINK-19683:
--------------------------------------

             Summary: Actively timeout aligned checkpoints on the output
                 Key: FLINK-19683
                 URL: https://issues.apache.org/jira/browse/FLINK-19683
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Checkpointing, Runtime / Task
    Affects Versions: 1.12.0
            Reporter: Piotr Nowojski


After enqueuing an aligned checkpoint barrier on the output, we could register a timeout to check whether it was sent downstream within some threshold. If it was not, we can convert it to an unaligned checkpoint.
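
A minimal Java sketch of the per-output timeout, assuming a race between two events: the barrier is sent downstream, or the timer fires first. All names (`OutputBarrierTimeout`, `registerTimeout`, `onBarrierSent`, the `convertToUnaligned` callback) are hypothetical and are not Flink API. A single compare-and-set on a shared state ensures exactly one of the two outcomes wins.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical tracker for one aligned barrier enqueued on one output. */
public class OutputBarrierTimeout {

    public enum State { PENDING, SENT, CONVERTED_TO_UNALIGNED }

    private final AtomicReference<State> state = new AtomicReference<>(State.PENDING);

    /** Called right after enqueuing the aligned barrier: schedule the timeout check. */
    public void registerTimeout(ScheduledExecutorService timer, long timeoutMs,
                                Runnable convertToUnaligned) {
        timer.schedule(() -> {
            // Converts only if the barrier has not been sent yet.
            if (state.compareAndSet(State.PENDING, State.CONVERTED_TO_UNALIGNED)) {
                // Hypothetical hook: let the barrier overtake queued buffers
                // and persist those buffers as in-flight checkpoint state.
                convertToUnaligned.run();
            }
        }, timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Called when the aligned barrier actually leaves the output queue. */
    public boolean onBarrierSent() {
        // Returns false if the timeout already converted the barrier.
        return state.compareAndSet(State.PENDING, State.SENT);
    }

    public State state() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

        // Barrier sent well before the threshold: stays aligned.
        OutputBarrierTimeout fast = new OutputBarrierTimeout();
        fast.registerTimeout(timer, 200, () -> System.out.println("fast: converted"));
        fast.onBarrierSent();

        // Back-pressured output: the timeout fires before the barrier is sent.
        OutputBarrierTimeout slow = new OutputBarrierTimeout();
        slow.registerTimeout(timer, 50, () -> System.out.println("slow: converted to unaligned"));
        Thread.sleep(150);
        boolean lateSend = slow.onBarrierSent();

        System.out.println(fast.state() + " " + slow.state() + " " + lateSend);
        timer.shutdownNow();
    }
}
```

Using compare-and-set rather than a lock keeps the hot path (`onBarrierSent` on the output) cheap; the trade-off is that conversion work must be safe to run on the timer thread.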

Note that this will significantly complicate how the actual checkpoint is executed. Currently, the logic inside `AsyncCheckpointRunnable` is executed as soon as the checkpoint is triggered. With a timeout on the outputs, we cannot complete the `AsyncCheckpointRunnable` until we know whether the timeout happened. We would need to register some listener/CompletableFuture tracking whether all of the checkpoint barriers were sent downstream, and the aligned checkpoint could only complete if those futures complete before the timeout. Otherwise, if the timeout happens, we would need to convert the aligned checkpoint on the outputs to an unaligned one.
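
One possible shape for the CompletableFuture tracking, sketched in Java under the assumption that each output exposes a future completed when its barrier was sent downstream. `AlignedOrUnalignedCompletion`, `awaitBarriersOrTimeout` and `Outcome` are hypothetical names, not Flink API; the sketch uses the standard `CompletableFuture.allOf` and `completeOnTimeout` (Java 9+).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical: decide between finishing aligned or converting to unaligned. */
public class AlignedOrUnalignedCompletion {

    public enum Outcome { ALIGNED, CONVERT_TO_UNALIGNED }

    /**
     * @param barrierSentFutures one future per output, completed when that
     *                           output's aligned barrier was sent downstream
     */
    public static CompletableFuture<Outcome> awaitBarriersOrTimeout(
            List<CompletableFuture<Void>> barrierSentFutures, long timeout, TimeUnit unit) {
        return CompletableFuture
                .allOf(barrierSentFutures.toArray(new CompletableFuture[0]))
                // All barriers left in time: the aligned checkpoint may complete.
                .thenApply(ignored -> Outcome.ALIGNED)
                // Otherwise the timeout completes the future first; a later
                // completion of allOf is then ignored.
                .completeOnTimeout(Outcome.CONVERT_TO_UNALIGNED, timeout, unit);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> output1 = new CompletableFuture<>();
        CompletableFuture<Void> output2 = new CompletableFuture<>();
        CompletableFuture<Outcome> decision =
                awaitBarriersOrTimeout(List.of(output1, output2), 100, TimeUnit.MILLISECONDS);

        output1.complete(null); // output2 stays back-pressured and never sends

        // Only once the decision is known could the async checkpoint part be
        // allowed to finish (the hand-off to AsyncCheckpointRunnable is not shown).
        System.out.println(decision.join());
    }
}
```

Deferring completion this way is exactly the complication the issue describes: the asynchronous part of the checkpoint now depends on a future that may resolve either way, so the snapshot it uploads can only be finalized after that decision.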



--
This message was sent by Atlassian Jira
(v8.3.4#803005)