[jira] [Created] (FLINK-11537) ExecutionGraph does not reach terminal state when JobMaster lost leadership

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-11537:
-------------------------------------

             Summary: ExecutionGraph does not reach terminal state when JobMaster lost leadership
                 Key: FLINK-11537
                 URL: https://issues.apache.org/jira/browse/FLINK-11537
             Project: Flink
          Issue Type: Bug
          Components: Distributed Coordination
    Affects Versions: 1.8.0
            Reporter: Till Rohrmann
            Assignee: Till Rohrmann
             Fix For: 1.8.0


The {{ExecutionGraph}} sometimes does not reach a terminal state when the {{JobMaster}} loses leadership. The reason is that we use the fenced main thread executor to execute {{ExecutionGraph}} changes, and we don't wait for the {{ExecutionGraph}} to reach the terminal state before we set the fencing token to {{null}}. Any state changes still pending at that point no longer carry a valid fencing token and are therefore not executed.

One possible solution would be to wait for the {{ExecutionGraph}} to reach the terminal state before clearing the fencing token. The downside, however, is that the {{JobMaster}} would remain reachable until the {{ExecutionGraph}} has been properly terminated. Alternatively, we could use the unfenced main thread executor to send out the cancel calls.
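
To make the race and the two proposed fixes concrete, here is a minimal single-threaded sketch. It is a toy model, not Flink code: {{ToyEndpoint}}, {{enqueueFenced}}, {{enqueueUnfenced}}, {{clearFencingToken}}, {{drain}}, and the state names {{TERMINATING}} / {{TERMINATED}} are all hypothetical and only imitate the idea that a fenced message is dropped when its token no longer matches the endpoint's current token.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a fenced endpoint. All names are hypothetical, not Flink's API. */
final class FencingSketch {

    static final class ToyEndpoint {
        /** A queued action plus the fencing token it was submitted with. */
        private static final class Message {
            final boolean fenced;
            final String token;
            final Runnable action;

            Message(boolean fenced, String token, Runnable action) {
                this.fenced = fenced;
                this.token = token;
                this.action = action;
            }
        }

        private final Queue<Message> mailbox = new ArrayDeque<>();
        private String fencingToken; // null models "no leadership"
        String graphState = "RUNNING"; // placeholder state names, not Flink's JobStatus

        ToyEndpoint(String fencingToken) {
            this.fencingToken = fencingToken;
        }

        /** Fenced submission: remembers the token that is current right now. */
        void enqueueFenced(Runnable action) {
            mailbox.add(new Message(true, fencingToken, action));
        }

        /** Unfenced submission: will run regardless of the current token. */
        void enqueueUnfenced(Runnable action) {
            mailbox.add(new Message(false, null, action));
        }

        void clearFencingToken() {
            fencingToken = null;
        }

        /** Processes the mailbox; fenced messages with a stale token are silently dropped. */
        void drain() {
            Message m;
            while ((m = mailbox.poll()) != null) {
                boolean tokenMatches = m.token != null && m.token.equals(fencingToken);
                if (!m.fenced || tokenMatches) {
                    m.action.run();
                }
            }
        }
    }

    /** Simulates losing leadership and returns the final graph state. */
    static String loseLeadership(boolean unfencedCancel, boolean waitBeforeClearing) {
        ToyEndpoint endpoint = new ToyEndpoint("leader-session-1");
        endpoint.graphState = "TERMINATING";
        // The transition to the terminal state arrives later as a mailbox message.
        Runnable finish = () -> endpoint.graphState = "TERMINATED";
        if (unfencedCancel) {
            endpoint.enqueueUnfenced(finish);
        } else {
            endpoint.enqueueFenced(finish);
        }
        if (waitBeforeClearing) {
            endpoint.drain(); // fix 1: let pending changes finish while the token is still valid
        }
        endpoint.clearFencingToken();
        endpoint.drain();
        return endpoint.graphState;
    }

    public static void main(String[] args) {
        System.out.println("bug:                   " + loseLeadership(false, false));
        System.out.println("wait before clearing:  " + loseLeadership(false, true));
        System.out.println("unfenced cancel calls: " + loseLeadership(true, false));
    }
}
```

In this model the first call leaves the graph in {{TERMINATING}} because the fenced message is dropped after the token is cleared, while both alternatives reach {{TERMINATED}}. The sketch does not capture the reachability trade-off of the first alternative; it only shows why the pending change is lost.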



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)