[jira] [Created] (FLINK-5960) Make CheckpointCoordinator less blocking

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-5960:
------------------------------------

             Summary: Make CheckpointCoordinator less blocking
                 Key: FLINK-5960
                 URL: https://issues.apache.org/jira/browse/FLINK-5960
             Project: Flink
          Issue Type: Improvement
          Components: State Backends, Checkpointing
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Till Rohrmann


Currently the {{CheckpointCoordinator}} performs its operations under a global lock. This includes writing checkpoint data out to state storage. If such a write blocks, the whole checkpoint coordinator stands still. I think we should rework the {{CheckpointCoordinator}} to make fewer assumptions about external systems, so that it tolerates write failures and timeouts. Furthermore, we should try to limit both the scope of locking and the execution of potentially blocking operations under the lock. This will improve the runtime behaviour of the {{CheckpointCoordinator}}.
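
To illustrate the direction, here is a minimal sketch of the idea, not Flink's actual code. The class {{NarrowLockCoordinator}}, the {{StateStore}} interface and the {{complete}} method are hypothetical names made up for this example. The potentially blocking write runs on a separate I/O executor with a timeout, and the lock is taken only for the short in-memory bookkeeping step. It assumes Java 9+ for {{CompletableFuture.orTimeout}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; NOT Flink's CheckpointCoordinator. */
class NarrowLockCoordinator {

    /** Hypothetical stand-in for a state storage whose writes may block or fail. */
    interface StateStore {
        void write(long checkpointId, byte[] data) throws Exception;
    }

    private final Object lock = new Object();
    private final List<Long> completed = new ArrayList<>(); // guarded by lock
    private final StateStore store;
    private final Executor ioExecutor;
    private final long timeoutMillis;

    NarrowLockCoordinator(StateStore store, Executor ioExecutor, long timeoutMillis) {
        this.store = store;
        this.ioExecutor = ioExecutor;
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * Writes the checkpoint data outside the lock. The returned future
     * yields true if the write succeeded in time, false on failure or timeout.
     */
    CompletableFuture<Boolean> complete(long checkpointId, byte[] data) {
        return CompletableFuture
            .runAsync(() -> {
                try {
                    // Potentially blocking call; no lock is held here.
                    store.write(checkpointId, data);
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            }, ioExecutor)
            // Fail the future if the storage does not answer in time (Java 9+).
            .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS)
            .handle((ignored, error) -> {
                if (error != null) {
                    return false; // write failure or timeout is tolerated
                }
                synchronized (lock) {
                    // Only the cheap in-memory update happens under the lock.
                    completed.add(checkpointId);
                }
                return true;
            });
    }

    /** Returns a snapshot of the checkpoint ids recorded as completed. */
    List<Long> completedCheckpoints() {
        synchronized (lock) {
            return new ArrayList<>(completed);
        }
    }
}
```

Note that in this sketch a timed-out write is not cancelled; it may still finish later on the I/O executor, so a real implementation would also have to clean up such late results.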



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)