Till Rohrmann created FLINK-5940:
------------------------------------
Summary: ZooKeeperCompletedCheckpointStore cannot handle broken state handles
Key: FLINK-5940
URL: https://issues.apache.org/jira/browse/FLINK-5940
Project: Flink
Issue Type: Bug
Components: State Backends, Checkpointing
Affects Versions: 1.1.4, 1.2.0, 1.3.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
The {{ZooKeeperCompletedCheckpointStore}} reads a set of {{RetrievableStateHandle}} instances from ZooKeeper upon recovery. It then tries to retrieve the {{CompletedCheckpoint}} from the latest state handle. If that retrieval fails, the whole recovery of completed checkpoints fails, even though the store might have read older state handles from ZooKeeper that it could fall back to.
I propose to harden the behaviour by removing broken state handles and returning the first successfully retrieved {{CompletedCheckpoint}}.
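A rough sketch of the proposed fallback, not the actual Flink code. {{Handle}}, {{recoverLatest}} and the {{broken}} list are hypothetical names made up for illustration; the real store works with {{RetrievableStateHandle}} and {{CompletedCheckpoint}} and would also have to remove the broken entries from ZooKeeper, which is left out here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CheckpointRecoverySketch {

    /** Hypothetical stand-in for a state handle that was read from ZooKeeper. */
    public interface Handle<T> {
        T retrieve() throws Exception;
    }

    /**
     * Walks the handles from newest to oldest and returns the first value that
     * can be retrieved. Handles whose retrieval fails (or yields null) are
     * collected in {@code broken} so that the caller can discard them.
     */
    public static <T> Optional<T> recoverLatest(
            List<Handle<T>> newestFirst, List<Handle<T>> broken) {
        for (Handle<T> handle : newestFirst) {
            try {
                T value = handle.retrieve();
                if (value != null) {
                    return Optional.of(value);
                }
            } catch (Exception e) {
                // Fall through: treat this handle as broken and try an older one.
            }
            broken.add(handle);
        }
        // No handle could be retrieved at all.
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Handle<String>> handles = new ArrayList<>();
        // Newest handle is broken, the older one is intact.
        handles.add(() -> { throw new java.io.IOException("state file missing"); });
        handles.add(() -> "checkpoint-41");

        List<Handle<String>> broken = new ArrayList<>();
        Optional<String> recovered = recoverLatest(handles, broken);

        System.out.println("recovered: " + recovered.orElse("<none>"));
        System.out.println("broken handles: " + broken.size());
    }
}
```

With this shape, recovery only fails if every handle is broken. Whether that case should result in an empty store or an error is a design decision the sketch leaves to the caller.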
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)