[jira] [Created] (FLINK-19544) Implement CheckpointRecoveryFactory based on Kubernetes API

Shang Yuanchun (Jira)
Yang Wang created FLINK-19544:
---------------------------------

             Summary: Implement CheckpointRecoveryFactory based on Kubernetes API
                 Key: FLINK-19544
                 URL: https://issues.apache.org/jira/browse/FLINK-19544
             Project: Flink
          Issue Type: Sub-task
          Components: Deployment / Kubernetes, Runtime / Checkpointing
            Reporter: Yang Wang
             Fix For: 1.12.0


* *_CheckpointRecoveryFactory_*
 * Stores the meta information needed for checkpoint recovery in ZooKeeper or a ConfigMap.
 * Stores the latest checkpoint counter.
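The two responsibilities listed above can be illustrated with a small in-memory model. This is a sketch only: `HaConfigMapData`, the key names `counter` and `checkpoint-<id>`, the initial counter value, and the example paths are hypothetical and do not describe Flink's actual ConfigMap layout or API. A real implementation would read and write this data through the Kubernetes API.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for the data section of a JobManager's HA ConfigMap.
 * A real implementation would persist this data through the Kubernetes API.
 */
final class HaConfigMapData {
    // Hypothetical key names; the layout Flink actually uses may differ.
    static final String COUNTER_KEY = "counter";
    static final String CHECKPOINT_PREFIX = "checkpoint-";

    // ConfigMap data is a string-to-string map, so numbers are stored as strings.
    private final Map<String, String> data = new TreeMap<>();

    HaConfigMapData() {
        data.put(COUNTER_KEY, "1"); // assumed initial checkpoint ID
    }

    /** Returns the current checkpoint ID and stores the incremented value. */
    long getAndIncrementCounter() {
        long current = Long.parseLong(data.get(COUNTER_KEY));
        data.put(COUNTER_KEY, Long.toString(current + 1));
        return current;
    }

    /** Stores a pointer (for example a storage path) to a completed checkpoint. */
    void addCheckpointPointer(long checkpointId, String pointer) {
        // Zero-padding to 19 digits makes lexicographic key order match numeric
        // order for non-negative checkpoint IDs.
        data.put(CHECKPOINT_PREFIX + String.format("%019d", checkpointId), pointer);
    }

    /** Returns the pointer of the checkpoint with the highest ID, or null if none exists. */
    String latestCheckpointPointer() {
        String latest = null;
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (e.getKey().startsWith(CHECKPOINT_PREFIX)) {
                latest = e.getValue(); // TreeMap iterates in ascending key order
            }
        }
        return latest;
    }
}

final class CounterDemo {
    public static void main(String[] args) {
        HaConfigMapData cm = new HaConfigMapData();
        long id = cm.getAndIncrementCounter();
        cm.addCheckpointPointer(id, "/checkpoints/chk-" + id); // hypothetical path
        System.out.println(cm.latestCheckpointPointer()); // prints /checkpoints/chk-1
    }
}
```

Keeping the counter and the checkpoint pointers in the same map mirrors the idea that one ConfigMap holds all HA information of a component, so both can be updated in a single write.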

Each component (Dispatcher, ResourceManager, JobManager, RestEndpoint) will have a dedicated ConfigMap, and all HA information relevant to that component will be stored in that single ConfigMap. The JobManager's ConfigMap would then contain the current leader, the pointers to the checkpoints, and the checkpoint ID counter. Since the "Get (check the leader) and Update (write back to the ConfigMap)" sequence can be performed as a single transactional operation, this design avoids the concurrent modification issues entirely, without relying on the "lock-and-release" mechanism used with ZooKeeper.
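The transactional "check the leader, then write back" step can be sketched with optimistic concurrency control. Kubernetes attaches a `resourceVersion` to every object and rejects an update (HTTP 409 Conflict) whose `resourceVersion` no longer matches the stored one; the sketch below models that behaviour in memory, treating the version as a simple counter although in Kubernetes it is an opaque string. `VersionedConfigMap`, `LeaderGuardedWriter`, `checkLeaderAndUpdate`, the `leader` key, and the retry bound are hypothetical names for illustration, not Flink or Kubernetes client APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical in-memory model of a ConfigMap guarded by a resourceVersion. */
final class VersionedConfigMap {
    /** Read-only view of the ConfigMap at a given resourceVersion. */
    static final class Snapshot {
        private final long resourceVersion;
        private final Map<String, String> data;

        Snapshot(long resourceVersion, Map<String, String> data) {
            this.resourceVersion = resourceVersion;
            this.data = data;
        }

        long resourceVersion() { return resourceVersion; }
        Map<String, String> data() { return data; }
    }

    private long resourceVersion = 0;
    private Map<String, String> data = new HashMap<>();

    synchronized Snapshot get() {
        return new Snapshot(resourceVersion, Collections.unmodifiableMap(new HashMap<>(data)));
    }

    /** Replaces the data only if nobody modified the map since expectedVersion was read. */
    synchronized boolean replace(long expectedVersion, Map<String, String> newData) {
        if (expectedVersion != resourceVersion) {
            return false; // the API server would answer 409 Conflict here
        }
        data = new HashMap<>(newData);
        resourceVersion++;
        return true;
    }
}

final class LeaderGuardedWriter {
    static final String LEADER_KEY = "leader"; // hypothetical key name
    private static final int MAX_RETRIES = 10; // hypothetical retry bound

    /** Applies the update only while leaderId is still the recorded leader. */
    static boolean checkLeaderAndUpdate(VersionedConfigMap cm, String leaderId,
                                        UnaryOperator<Map<String, String>> update) {
        for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
            VersionedConfigMap.Snapshot snap = cm.get(); // Get
            if (!leaderId.equals(snap.data().get(LEADER_KEY))) {
                return false; // leadership lost: do not write
            }
            Map<String, String> updated = update.apply(new HashMap<>(snap.data()));
            if (cm.replace(snap.resourceVersion(), updated)) { // Update
                return true;
            }
            // Conflict: someone else wrote in between, so re-read and re-check the leader.
        }
        throw new IllegalStateException("too many conflicting updates");
    }
}

final class LeaderGuardDemo {
    public static void main(String[] args) {
        VersionedConfigMap cm = new VersionedConfigMap();
        Map<String, String> initial = new HashMap<>();
        initial.put(LeaderGuardedWriter.LEADER_KEY, "jm-1");
        cm.replace(0, initial);
        boolean written = LeaderGuardedWriter.checkLeaderAndUpdate(cm, "jm-1", m -> {
            m.put("counter", "2");
            return m;
        });
        System.out.println(written); // prints true
    }
}
```

Re-checking the leader on every retry is the important design choice: the conflicting write that caused a retry may itself have been a leadership change, in which case the former leader must stop writing instead of overwriting the new leader's state.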



--
This message was sent by Atlassian Jira
(v8.3.4#803005)