[jira] [Created] (FLINK-11400) JobManagerRunner does not wait for suspension of JobMaster


Shang Yuanchun (Jira)
Till Rohrmann created FLINK-11400:
-------------------------------------

             Summary: JobManagerRunner does not wait for suspension of JobMaster
                 Key: FLINK-11400
                 URL: https://issues.apache.org/jira/browse/FLINK-11400
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.7.1, 1.6.3, 1.8.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
             Fix For: 1.8.0


The {{JobManagerRunner}} does not wait for the suspension of the {{JobMaster}} to finish before granting leadership again. This can lead to a state where the {{JobMaster}} tries to start the {{ExecutionGraph}} but the {{SlotPool}} is still stopped.

I suggest linearizing the leadership operations (granting and revoking leadership), similar to how the {{Dispatcher}} handles them, so that a new grant is processed only after the preceding suspension has completed.
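
To illustrate the idea, here is a minimal sketch of linearized leadership operations. It is not the actual Flink code: the class {{LinearizedLeadership}}, the field {{leadershipOperation}} and the two supplier callbacks are hypothetical names chosen for this example. It assumes that every grant or revoke is chained onto the future of the previous operation, so a grant cannot start the {{JobMaster}} before a pending suspension has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch, not Flink's JobManagerRunner.
public class LinearizedLeadership {
    private final Object lock = new Object();

    // Stand-ins for starting and suspending the JobMaster (assumed callbacks).
    private final Supplier<CompletableFuture<Void>> startJobMaster;
    private final Supplier<CompletableFuture<Void>> suspendJobMaster;

    // Tail of the chain: completes when the most recent operation is done.
    private CompletableFuture<Void> leadershipOperation =
            CompletableFuture.completedFuture(null);

    public LinearizedLeadership(
            Supplier<CompletableFuture<Void>> startJobMaster,
            Supplier<CompletableFuture<Void>> suspendJobMaster) {
        this.startJobMaster = startJobMaster;
        this.suspendJobMaster = suspendJobMaster;
    }

    public void grantLeadership() {
        synchronized (lock) {
            // Runs only after the previous operation has completed.
            leadershipOperation =
                    leadershipOperation.thenCompose(ignored -> startJobMaster.get());
        }
    }

    public void revokeLeadership() {
        synchronized (lock) {
            leadershipOperation =
                    leadershipOperation.thenCompose(ignored -> suspendJobMaster.get());
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        // A suspension that is still in progress when leadership is granted again.
        CompletableFuture<Void> suspension = new CompletableFuture<>();

        LinearizedLeadership runner = new LinearizedLeadership(
                () -> { events.add("start"); return CompletableFuture.completedFuture(null); },
                () -> { events.add("suspend"); return suspension; });

        runner.revokeLeadership();
        runner.grantLeadership();
        System.out.println(events); // [suspend]

        suspension.complete(null);  // suspension finishes, the queued grant runs
        System.out.println(events); // [suspend, start]
    }
}
```

The sketch leaves out error handling: if one operation completes exceptionally, the chained operations after it are skipped, so a real implementation would have to decide how to recover from a failed suspension.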



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)