Till Rohrmann created FLINK-5960:
------------------------------------
Summary: Make CheckpointCoordinator less blocking
Key: FLINK-5960
URL: https://issues.apache.org/jira/browse/FLINK-5960
Project: Flink
Issue Type: Improvement
Components: State Backends, Checkpointing
Affects Versions: 1.2.0, 1.3.0
Reporter: Till Rohrmann
Currently the {{CheckpointCoordinator}} performs its operations under a global lock. This also includes writing checkpoint data out to a state storage. If this write blocks, every other operation of the coordinator has to wait for the lock, so the whole checkpointing process stands still. I think we should rework the {{CheckpointCoordinator}} so that it makes fewer assumptions about external systems and tolerates write failures and timeouts. Furthermore, we should try to limit the scope of locking and avoid executing potentially blocking operations under the lock. This will improve the runtime behaviour of the {{CheckpointCoordinator}}.
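A minimal Java sketch of the locking discipline proposed above, under stated assumptions: the class name {{NonBlockingCoordinatorSketch}} and the {{CheckpointStorage}} interface are hypothetical and are not Flink's actual {{CheckpointCoordinator}} API. Only cheap in-memory bookkeeping happens under the lock. The potentially blocking write runs on a separate executor outside the lock, is bounded by a timeout (Java 9+ {{CompletableFuture.orTimeout}}), and its outcome is recorded by briefly re-acquiring the lock.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not Flink code: lock scope limited to bookkeeping,
// blocking storage writes moved off the lock and bounded by a timeout.
final class NonBlockingCoordinatorSketch {

    // Hypothetical abstraction over an external, possibly slow state storage.
    interface CheckpointStorage {
        void write(long checkpointId, byte[] data) throws IOException;
    }

    private final Object lock = new Object();
    private final Set<Long> pending = new HashSet<>();
    private final Set<Long> completed = new HashSet<>();
    private final Set<Long> failed = new HashSet<>();

    private final CheckpointStorage storage;
    private final ExecutorService ioExecutor;
    private final long writeTimeoutMillis;

    NonBlockingCoordinatorSketch(CheckpointStorage storage, ExecutorService ioExecutor, long writeTimeoutMillis) {
        this.storage = storage;
        this.ioExecutor = ioExecutor;
        this.writeTimeoutMillis = writeTimeoutMillis;
    }

    void triggerCheckpoint(long checkpointId) {
        synchronized (lock) {
            pending.add(checkpointId); // in-memory only, never blocks on I/O
        }
    }

    // Returns a future that completes once the write has succeeded, failed or timed out.
    CompletableFuture<Void> acknowledge(long checkpointId, byte[] data) {
        synchronized (lock) {
            if (!pending.remove(checkpointId)) {
                return CompletableFuture.completedFuture(null); // unknown or already handled
            }
        }
        // The write runs outside the lock, so a hanging storage cannot stall the coordinator.
        return CompletableFuture
                .runAsync(() -> {
                    try {
                        storage.write(checkpointId, data);
                    } catch (IOException e) {
                        throw new CompletionException(e); // surface write failures to the future
                    }
                }, ioExecutor)
                .orTimeout(writeTimeoutMillis, TimeUnit.MILLISECONDS) // tolerate hanging writes
                .handle((ignored, error) -> {
                    synchronized (lock) { // re-acquire briefly to record the outcome
                        if (error == null) {
                            completed.add(checkpointId);
                        } else {
                            failed.add(checkpointId);
                        }
                    }
                    return null;
                });
    }

    boolean isCompleted(long checkpointId) {
        synchronized (lock) {
            return completed.contains(checkpointId);
        }
    }

    boolean isFailed(long checkpointId) {
        synchronized (lock) {
            return failed.contains(checkpointId);
        }
    }
}
```

Note that {{orTimeout}} only completes the future exceptionally; a hanging write still occupies an executor thread, so a real implementation would also need a way to bound or cancel the underlying I/O.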
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)