[jira] [Created] (FLINK-14434) Dispatcher#createJobManagerRunner should return once creation succeeds, not after startJobManagerRunner

Shang Yuanchun (Jira)
Zili Chen created FLINK-14434:
---------------------------------

             Summary: Dispatcher#createJobManagerRunner should return once creation succeeds, not after startJobManagerRunner
                 Key: FLINK-14434
                 URL: https://issues.apache.org/jira/browse/FLINK-14434
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Zili Chen
            Assignee: Zili Chen
             Fix For: 1.10.0


Consider the following edge case:

1) the job finishes almost immediately;
2) the thread executing {{#startJobManagerRunner}} is suspended after {{jobManagerRunner.start();}} but before {{return jobManagerRunner;}}.

This can happen because:

1) the future we put into {{jobManagerRunnerFutures}} completes only after {{#startJobManagerRunner}} has finished;
2) the JobManagerRunner is not created in the MainThread.

The following execution order is therefore possible:

1) The JobManagerRunner is created in an akka-dispatcher thread.
2) {{Dispatcher#startJobManagerRunner}} is then applied.
3) Execution passes {{jobManagerRunner.start();}} but has not yet reached {{return jobManagerRunner;}}.
4) The akka-dispatcher thread is suspended.
5) The job finishes, and its completion callback executes in the MainThread.
6) {{jobManagerRunnerFutures.get(jobID).getNow(null)}} returns {{null}}, because the akka-dispatcher thread has not yet executed {{return jobManagerRunner;}} and the future is therefore still incomplete.
7) The Dispatcher reports {{There is a newer JobManagerRunner for the job}}, although no newer JobManagerRunner exists.
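The interleaving above can be reproduced deterministically with plain {{java.util.concurrent}}. This is a minimal simulation under stated assumptions, not Flink's Dispatcher code: {{SketchRunner}}, the map key {{"job-1"}}, the latches and the single-thread executor are hypothetical stand-ins. The executor plays the akka-dispatcher thread, and a {{CountDownLatch}} forces the suspension in step 4 so that the "job finished" check in step 6 always observes an incomplete future.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RaceSketch {
    /** Hypothetical stand-in for JobManagerRunner; not Flink's class. */
    static final class SketchRunner {
        void start() { /* the job may finish at any point after this call */ }
    }

    /** Returns what the "job finished" callback sees via getNow(null) during the race. */
    static SketchRunner observeDuringRace() {
        Map<String, CompletableFuture<SketchRunner>> runnerFutures = new ConcurrentHashMap<>();
        CountDownLatch started = new CountDownLatch(1); // start() has been called
        CountDownLatch resume = new CountDownLatch(1);  // lets "return runner" proceed
        // Hypothetical stand-in for the akka-dispatcher thread.
        ExecutorService creationThread = Executors.newSingleThreadExecutor();
        try {
            // Creation happens off the main thread, and the stored future
            // completes only after the start step has returned (steps 1 to 3).
            CompletableFuture<SketchRunner> future = CompletableFuture
                .supplyAsync(SketchRunner::new, creationThread)
                .thenApplyAsync(runner -> {
                    runner.start();
                    started.countDown();
                    awaitUninterruptibly(resume); // step 4: suspended before returning
                    return runner;
                }, creationThread);
            runnerFutures.put("job-1", future);

            awaitUninterruptibly(started);
            // Steps 5 and 6: the job has finished, but the future is not complete yet.
            SketchRunner seen = runnerFutures.get("job-1").getNow(null);
            resume.countDown();
            future.join();
            return seen;
        } finally {
            creationThread.shutdown();
        }
    }

    private static void awaitUninterruptibly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("runner seen by callback: " + observeDuringRace());
    }
}
```

Calling {{observeDuringRace()}} yields {{null}} even though a runner exists, which is exactly the condition that makes step 7 report a non-existent newer JobManagerRunner.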

**Solution**

There are two possible approaches, and we could also apply both:

1. Return {{jobManagerRunnerFuture}} from {{#createJobManagerRunner}} as soon as creation succeeds, and make {{#startJobManagerRunner}} a separate action on that future.
2. Once the JobManagerRunner has been created, execute {{#startJobManagerRunner}} in the MainThread.
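Both approaches can be sketched against the same simulated suspension. Again this is a hypothetical model, not Flink's actual fix: {{SketchRunner}}, {{"job-1"}} and the two single-thread executors (one standing in for the akka-dispatcher thread, one for the MainThread) are assumptions. Approach 1 stores the creation future and chains the start step on it without letting it gate the stored future; approach 2 runs the start step on the simulated MainThread, so the "job finished" callback, which is queued on the same thread, cannot interleave with it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class SolutionSketch {
    /** Hypothetical stand-in for JobManagerRunner; not Flink's class. */
    static final class SketchRunner {
        void start() { }
    }

    /** Approach 1: the stored future completes on creation; starting is a separate action. */
    static SketchRunner observeWithCreationFuture() {
        Map<String, CompletableFuture<SketchRunner>> runnerFutures = new ConcurrentHashMap<>();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        ExecutorService creationThread = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<SketchRunner> created =
                CompletableFuture.supplyAsync(SketchRunner::new, creationThread);
            runnerFutures.put("job-1", created);
            // The start step runs after creation but does not gate the stored future.
            CompletableFuture<Void> startAction = created.thenAcceptAsync(runner -> {
                runner.start();
                started.countDown();
                awaitUninterruptibly(resume); // same simulated suspension as in the race
            }, creationThread);

            awaitUninterruptibly(started); // the job "finishes" while start is paused
            SketchRunner seen = runnerFutures.get("job-1").getNow(null);
            resume.countDown();
            startAction.join();
            return seen;
        } finally {
            creationThread.shutdown();
        }
    }

    /** Approach 2: run the start step on a single-threaded stand-in for the MainThread. */
    static SketchRunner observeWithMainThreadStart() {
        Map<String, CompletableFuture<SketchRunner>> runnerFutures = new ConcurrentHashMap<>();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch resume = new CountDownLatch(1);
        ExecutorService creationThread = Executors.newSingleThreadExecutor();
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            CompletableFuture<SketchRunner> future = CompletableFuture
                .supplyAsync(SketchRunner::new, creationThread)
                .thenApplyAsync(runner -> {
                    runner.start();
                    started.countDown();
                    awaitUninterruptibly(resume); // suspended, but inside the MainThread
                    return runner;
                }, mainThread);
            runnerFutures.put("job-1", future);

            awaitUninterruptibly(started);
            // The "job finished" callback is queued on the same thread, so it only
            // runs after the start step has returned and completed the future.
            CompletableFuture<SketchRunner> callback = CompletableFuture.supplyAsync(
                () -> runnerFutures.get("job-1").getNow(null), mainThread);
            resume.countDown();
            return callback.join();
        } finally {
            creationThread.shutdown();
            mainThread.shutdown();
        }
    }

    private static void awaitUninterruptibly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("approach 1 sees runner: " + (observeWithCreationFuture() != null));
        System.out.println("approach 2 sees runner: " + (observeWithMainThreadStart() != null));
    }
}
```

In both variants the callback observes a non-null runner despite the same suspension point: approach 1 removes the window by decoupling the stored future from the start step, while approach 2 closes it by serializing the start step and the callback on one thread.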

CC [~trohrmann]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)