[jira] [Created] (FLINK-4356) JobMaster HA


Shang Yuanchun (Jira)
zhangjing created FLINK-4356:
--------------------------------

             Summary: JobMaster HA
                 Key: FLINK-4356
                 URL: https://issues.apache.org/jira/browse/FLINK-4356
             Project: Flink
          Issue Type: Sub-task
            Reporter: zhangjing


1. In standalone mode, the LocalDispatcher watches the JobMaster.
The LocalDispatcher detects the failure of the JobMaster, recovers the JobGraph and libraries from persistent storage, and spawns a new JobManager.
The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
The new JobMaster registers at the ResourceManager.
The new JobMaster recovers the execution of its job (the execution graph) from the latest completed checkpoint.
2. In YARN mode, the YarnApplicationMasterRunner creates a ProcessReaper for the JobMaster.
The ProcessReaper monitors the JobMaster and kills the JVM upon JobMaster termination.
YARN then creates a new AppMaster, which contains a new JobManager; the JobGraph and libraries are retrieved as startup artifacts.
The new JobMaster competes for leadership and saves its address to ZooKeeper storage.
The new JobMaster registers at the ResourceManager.
The new JobMaster recovers the execution of its job (the execution graph) from the latest completed checkpoint.
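
Both modes end with the same three steps once a new JobMaster process exists: win leadership and publish the address, register at the ResourceManager, and restore from the latest completed checkpoint. A minimal sketch of that shared sequence follows. It is a toy in-memory model, not Flink code: Cluster, Checkpoint, latest_completed, recover_job_master, and the address "jobmaster-2:6123" are all hypothetical names chosen for illustration, and the sketch assumes the failed JobMaster's leadership entry has already disappeared so the new instance wins uncontested.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checkpoint:
    checkpoint_id: int
    completed: bool  # False while the checkpoint is still in progress


@dataclass
class Cluster:
    """Hypothetical in-memory stand-ins for the services named in the steps."""
    leader_address: Optional[str] = None                 # stand-in for ZooKeeper storage
    registered: List[str] = field(default_factory=list)  # stand-in for the ResourceManager
    checkpoints: List[Checkpoint] = field(default_factory=list)


def latest_completed(checkpoints: List[Checkpoint]) -> Optional[Checkpoint]:
    """Return the completed checkpoint with the highest id, or None."""
    done = [c for c in checkpoints if c.completed]
    return max(done, key=lambda c: c.checkpoint_id) if done else None


def recover_job_master(cluster: Cluster, new_address: str) -> Optional[int]:
    """Run the three shared recovery steps; return the restored checkpoint id, if any."""
    # Step 1: compete for leadership and save the address.
    # Assumption: no other contender exists, so the new JobMaster wins.
    cluster.leader_address = new_address

    # Step 2: register at the ResourceManager.
    cluster.registered.append(new_address)

    # Step 3: pick the latest *completed* checkpoint; in-progress ones are skipped.
    # Assumption: with no completed checkpoint there is nothing to restore (None).
    checkpoint = latest_completed(cluster.checkpoints)
    return checkpoint.checkpoint_id if checkpoint else None


if __name__ == "__main__":
    cluster = Cluster(checkpoints=[
        Checkpoint(1, True),
        Checkpoint(2, True),
        Checkpoint(3, False),  # still in progress when the old JobMaster died
    ])
    restored = recover_job_master(cluster, "jobmaster-2:6123")
    print("leader:", cluster.leader_address, "restored checkpoint:", restored)
```

The point of the sketch is the ordering and the checkpoint choice: checkpoint 3 is newer but incomplete, so recovery falls back to checkpoint 2. A real implementation would replace the plain assignment in step 1 with an actual leader election against ZooKeeper.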


