Xintong Song created FLINK-19568:
------------------------------------
Summary: Offload creating TM launch contexts to the IO executor
Key: FLINK-19568
URL:
https://issues.apache.org/jira/browse/FLINK-19568 Project: Flink
Issue Type: Improvement
Components: Deployment / YARN
Reporter: Xintong Song
Fix For: 1.12.0
Currently, for launching each TM container on Yarn, Flink creates a container launch context in RM's PRC main thread. This includes accessing file status from remote file systems, which may blocks the RM's main thread, especially when remote file system is slow. See [this thread|
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/TM-heartbeat-timeout-due-to-ResourceManager-being-busy-td38626.html].
The creating of TM context does not access nor change any RM's internal states. Therefore, I propose to offload the work to the IO executor. To be specific, I think the entire {{YarnResourceManagerDriver#createTaskExecutorLaunchContext}} can be invoked on the IO executor.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)