[jira] [Created] (FLINK-19568) Offload creating TM launch contexts to the IO executor

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-19568) Offload creating TM launch contexts to the IO executor

Shang Yuanchun (Jira)
Xintong Song created FLINK-19568:
------------------------------------

             Summary: Offload creating TM launch contexts to the IO executor
                 Key: FLINK-19568
                 URL: https://issues.apache.org/jira/browse/FLINK-19568
             Project: Flink
          Issue Type: Improvement
          Components: Deployment / YARN
            Reporter: Xintong Song
             Fix For: 1.12.0


Currently, for launching each TM container on Yarn, Flink creates a container launch context in RM's PRC main thread. This includes accessing file status from remote file systems, which may blocks the RM's main thread, especially when remote file system is slow. See [this thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TM-heartbeat-timeout-due-to-ResourceManager-being-busy-td38626.html].

The creating of TM context does not access nor change any RM's internal states. Therefore, I propose to offload the work to the IO executor. To be specific, I think the entire {{YarnResourceManagerDriver#createTaskExecutorLaunchContext}} can be invoked on the IO executor.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)