Zili Chen created FLINK-14434:
---------------------------------
Summary: Dispatcher#createJobManagerRunner should return once creation succeeds, not after startJobManagerRunner
Key: FLINK-14434
URL: https://issues.apache.org/jira/browse/FLINK-14434
Project: Flink
Issue Type: Bug
Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zili Chen
Assignee: Zili Chen
Fix For: 1.10.0
Consider an edge case in which
1) the job finishes almost immediately
2) the thread running {{#startJobManagerRunner}} is suspended after {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}
This can happen because
1) the future we put into {{jobManagerRunnerFutures}} completes only when {{#startJobManagerRunner}} has finished.
2) the creation of JobManagerRunner doesn't happen in MainThread.
The following execution order is possible
1) JobManagerRunner is created in an akka-dispatcher thread
2) the same thread then applies {{Dispatcher#startJobManagerRunner}}
3) it executes {{jobManagerRunner.start();}} but has not yet reached {{return jobManagerRunner;}}
4) this thread is suspended
5) the job finishes and its callback is executed on MainThread
6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}} because the akka-dispatcher thread has not yet executed {{return jobManagerRunner;}}
7) the callback reports {{There is a newer JobManagerRunner for the job}} although there is none.
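The ordering above can be reproduced with a minimal, self-contained sketch. It does not use Flink code: {{Runner}}, {{observeDuringStart}} and the single-thread executor standing in for the akka-dispatcher thread are hypothetical names chosen for illustration, and latches force the suspension that happens only occasionally in practice.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RunnerFutureRace {
    /** Hypothetical stand-in for JobManagerRunner. */
    static final class Runner {}

    /** Returns what a job-finished callback would see while the start stage is still running. */
    static Runner observeDuringStart() {
        // Stand-in for the akka-dispatcher thread (an assumption of this sketch).
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch startEntered = new CountDownLatch(1);
        CountDownLatch callbackRan = new CountDownLatch(1);
        try {
            // The stored future completes only after the "start" stage returns the runner.
            CompletableFuture<Runner> runnerFuture = CompletableFuture
                .supplyAsync(Runner::new, ioExecutor)
                .thenApplyAsync(runner -> {
                    startEntered.countDown();   // jobManagerRunner.start() has been called
                    awaitQuietly(callbackRan);  // thread "suspended" before returning the runner
                    return runner;
                }, ioExecutor);

            awaitQuietly(startEntered);
            // The job-finished callback looks up the runner while start has not returned yet.
            Runner seen = runnerFuture.getNow(null);
            callbackRan.countDown();
            runnerFuture.join();
            return seen;
        } finally {
            ioExecutor.shutdown();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("callback saw runner: " + (observeDuringStart() != null));
    }
}
```

In this sketch the lookup returns {{null}} even though the runner already exists, which mirrors step 6.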
**Solution**
There are two approaches, and we can even apply both.
1. return {{jobManagerRunnerFuture}} from {{#createJobManagerRunner}} and make {{#startJobManagerRunner}} a separate action
2. once the JobManagerRunner is created, execute {{#startJobManagerRunner}} in MainThread.
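A rough sketch of the first approach, again outside Flink and with hypothetical names ({{Runner}}, {{observeDuringStart}}, a single-thread executor in place of the akka-dispatcher thread): the stored future is the creation future itself, and starting the runner is attached as a side action that does not gate its completion.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RunnerFutureFix {
    /** Hypothetical stand-in for JobManagerRunner. */
    static final class Runner {}

    /** Returns what a job-finished callback would see while the start action is still running. */
    static Runner observeDuringStart() {
        // Stand-in for the akka-dispatcher thread (an assumption of this sketch).
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        CountDownLatch startEntered = new CountDownLatch(1);
        CountDownLatch callbackRan = new CountDownLatch(1);
        try {
            // The stored future completes as soon as the runner has been created.
            CompletableFuture<Runner> runnerFuture =
                CompletableFuture.supplyAsync(Runner::new, ioExecutor);

            // Starting is a separate action; it does not delay runnerFuture.
            CompletableFuture<Void> startAction = runnerFuture.thenAcceptAsync(runner -> {
                startEntered.countDown();   // start has been called
                awaitQuietly(callbackRan);  // thread "suspended" inside the start action
            }, ioExecutor);

            awaitQuietly(startEntered);
            // The lookup now succeeds even though the start action has not finished.
            Runner seen = runnerFuture.getNow(null);
            callbackRan.countDown();
            startAction.join();
            return seen;
        } finally {
            ioExecutor.shutdown();
        }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("callback saw runner: " + (observeDuringStart() != null));
    }
}
```

The second approach would presumably correspond to passing a main-thread executor to the start action instead of {{ioExecutor}}; that variant is not shown here.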
CC [~trohrmann]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)