Yang Wang created FLINK-19544:
---------------------------------
Summary: Implement CheckpointRecoveryFactory based on Kubernetes API
Key: FLINK-19544
URL: https://issues.apache.org/jira/browse/FLINK-19544
Project: Flink
Issue Type: Sub-task
Components: Deployment / Kubernetes, Runtime / Checkpointing
Reporter: Yang Wang
Fix For: 1.12.0
* *_CheckpointRecoveryFactory_*
* Stores checkpoint metadata in ZooKeeper/ConfigMap for checkpoint recovery.
* Stores the latest checkpoint counter.
Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have a dedicated ConfigMap. All HA information relevant to a specific component will be stored in that single ConfigMap. The JobManager's ConfigMap would then contain the current leader, the pointers to the checkpoints, and the checkpoint ID counter. Because "Get (check the leader) and Update (write back to the ConfigMap)" is a transactional operation, this completely avoids concurrent modification issues without relying on the "lock-and-release" approach used with ZooKeeper.
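To illustrate the get-and-update idea, here is a minimal in-memory sketch. It is not the Flink or Kubernetes client API: InMemoryConfigMapStore, Snapshot, LEADER_KEY, replace, and checkAndUpdate are all hypothetical names, and the sketch assumes the transactional behaviour comes from a compare-and-swap on a per-ConfigMap version number, similar in spirit to Kubernetes resource versions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** In-memory stand-in for a single ConfigMap (hypothetical; not a real client API). */
final class InMemoryConfigMapStore {

    /** Immutable view of the ConfigMap data plus the version it was read at. */
    static final class Snapshot {
        final long resourceVersion;
        final Map<String, String> data;

        Snapshot(long resourceVersion, Map<String, String> data) {
            this.resourceVersion = resourceVersion;
            this.data = Collections.unmodifiableMap(new HashMap<>(data));
        }
    }

    /** Assumed key under which the current leader is recorded. */
    static final String LEADER_KEY = "leader";

    private Snapshot current = new Snapshot(0L, new HashMap<>());

    synchronized Snapshot get() {
        return current;
    }

    /** Compare-and-swap: the write succeeds only if nothing was written since expectedVersion. */
    synchronized boolean replace(long expectedVersion, Map<String, String> newData) {
        if (current.resourceVersion != expectedVersion) {
            return false; // a concurrent writer got there first
        }
        current = new Snapshot(expectedVersion + 1, newData);
        return true;
    }

    /**
     * Get (check the leader) and Update (write back), retried on version conflicts.
     * Returns false without writing if contenderId is not the recorded leader.
     */
    boolean checkAndUpdate(
            String contenderId,
            Function<Map<String, String>, Map<String, String>> update) {
        while (true) {
            Snapshot snapshot = get();
            if (!contenderId.equals(snapshot.data.get(LEADER_KEY))) {
                return false; // lost leadership: must not touch the HA data
            }
            Map<String, String> updated = update.apply(new HashMap<>(snapshot.data));
            if (replace(snapshot.resourceVersion, updated)) {
                return true;
            }
            // Version conflict: re-read and re-check leadership before retrying.
        }
    }
}
```

In this sketch, a JobManager that still holds leadership could call checkAndUpdate("jm-1", d -> { d.put("checkpointIdCounter", "42"); return d; }) to advance the counter, while the same call from a contender that is not the recorded leader returns false and leaves the ConfigMap unchanged. No separate lock has to be acquired or released.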
--
This message was sent by Atlassian Jira
(v8.3.4#803005)