[jira] [Created] (FLINK-22636) Group job specific ZooKeeper HA services under common jobs/<JobID> zNode


Shang Yuanchun (Jira)
Till Rohrmann created FLINK-22636:
-------------------------------------

             Summary: Group job specific ZooKeeper HA services under common jobs/<JobID> zNode
                 Key: FLINK-22636
                 URL: https://issues.apache.org/jira/browse/FLINK-22636
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Coordination
    Affects Versions: 1.12.3, 1.13.0, 1.14.0
            Reporter: Till Rohrmann
             Fix For: 1.14.0


In order to better clean up ZooKeeper HA services, I suggest grouping job-specific services under a common {{jobs/<JobID>}} zNode. That way, cleaning up the job-specific ZooKeeper data becomes trivial: simply delete the {{jobs/<JobID>}} node.

Currently, our ZooKeeper layout is not well structured: job-specific nodes are scattered across several top-level zNodes. The current layout looks like this:

{code}
clusterID -> jobgraphs -> <job-id>
          -> checkpoints -> <job-id> -> checkpoint-1
          -> checkpoint-counter -> <job-id> -> counter
          -> leaderlatch -> dispatcher_lock
                         -> resource_manager_lock
                         -> <job-id>
          -> leader -> dispatcher_lock
                    -> resource_manager_lock
                    -> <job-id>
{code}
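
To make the cleanup cost of this layout concrete, here is a minimal sketch that lists the zNodes one job occupies today. The parent names are taken from the diagram above; the function name, the leading {{/}} root, and the sample IDs are assumptions made for illustration and are not Flink code.

```python
def current_layout_job_paths(cluster_id: str, job_id: str) -> list[str]:
    """List the job-specific zNodes of one job in the current layout.

    Parent names follow the diagram in this issue; the path format is an
    illustrative assumption, not Flink's actual path-building logic.
    """
    parents = [
        "jobgraphs",
        "checkpoints",
        "checkpoint-counter",
        "leaderlatch",
        "leader",
    ]
    return [f"/{cluster_id}/{parent}/{job_id}" for parent in parents]


# Hypothetical cluster and job IDs, for illustration only.
paths = current_layout_job_paths("my-cluster", "job-42")
for path in paths:
    print(path)
```

Each of these paths sits under a different parent, so cleaning up one job means touching several unrelated subtrees.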

The new layout could look like this:

{code}
clusterID -> jobgraphs -> <job-id>
          -> jobs -> <job-id> -> checkpoints -> checkpoint-1
                              -> checkpoint_id_counter -> counter
                              -> leader -> latch
                                        -> connection_info
          -> leader -> dispatcher -> latch
                                  -> connection_info
                    -> resource_manager -> latch
                                        -> connection_info
{code}
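
As a rough illustration of the proposed cleanup, the sketch below models the new layout as a plain set of path strings and removes one job's subtree in a single step. It is an in-memory stand-in, not a ZooKeeper client; the helper name, the {{/my-cluster}} root, and the job ID are hypothetical.

```python
def delete_subtree(znodes: set[str], root: str) -> set[str]:
    """Return the zNodes that remain after recursively deleting `root`.

    This only simulates a recursive delete on a set of path strings.
    """
    # Trailing slash prevents "job-42" from matching "job-421".
    prefix = root.rstrip("/") + "/"
    return {p for p in znodes if p != root and not p.startswith(prefix)}


# Hypothetical cluster root and job ID; node names follow the proposed layout.
cluster, job = "/my-cluster", "job-42"
znodes = {
    f"{cluster}/jobgraphs/{job}",
    f"{cluster}/jobs/{job}/checkpoints/checkpoint-1",
    f"{cluster}/jobs/{job}/checkpoint_id_counter/counter",
    f"{cluster}/jobs/{job}/leader/latch",
    f"{cluster}/jobs/{job}/leader/connection_info",
    f"{cluster}/leader/dispatcher/latch",
    f"{cluster}/leader/dispatcher/connection_info",
    f"{cluster}/leader/resource_manager/latch",
    f"{cluster}/leader/resource_manager/connection_info",
}

# One delete removes the checkpoints, the counter and the leader nodes of the job.
remaining = delete_subtree(znodes, f"{cluster}/jobs/{job}")
for path in sorted(remaining):
    print(path)
```

In this model the cluster-wide leader nodes are untouched, and the {{jobgraphs/<job-id>}} entry also remains because the diagram keeps it outside {{jobs}}.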



--
This message was sent by Atlassian Jira
(v8.3.4#803005)