Xintong Song created FLINK-19324:
------------------------------------
Summary: Map requested/allocated containers with priority on YARN
Key: FLINK-19324
URL: https://issues.apache.org/jira/browse/FLINK-19324
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Reporter: Xintong Song
In the design doc of FLINK-14106, there was a [discussion|https://docs.google.com/document/d/1f8imSus3QwKEUPAldzR8CSMjZ-2a9O17-rn4oeKGtqw/edit?disco=AAAAGPX_tmg] on how we map allocated containers to the requested ones on YARN. We rejected the design option that uses container priorities to map containers of different resources, because we did not want to prioritize different container requests (which is the original purpose of this field). As a result, we have to work out how Yarn would normalize each container request, and map the allocated/requested containers accordingly, which is complicated and fragile. See also FLINK-19151.
Recently, in our POC for fine-grained resource management, we were surprised to discover that Yarn does not actually work with container requests that have the same priority but different resources. I could not find this described as an official protocol in any of Yarn's documentation. The issue was raised in early Yarn versions (YARN-314) and was not fixed until Hadoop 2.9, when {{allocationRequestId}} was introduced. In Hadoop 2.8, the Yarn scheduler still internally uses the priority as the key of a container request (see [AppSchedulingInfo#updateResourceRequests|https://github.com/apache/hadoop/blob/eb818cdc64336ade273a960ba3b9b5a5d0c4d4ec/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java#L341]), so requests with the same priority but different resources overwrite each other.
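To illustrate the overwriting behavior, here is a minimal, self-contained sketch. It is not the actual Yarn scheduler code: {{PriorityKeyedRequests}}, {{Request}} and {{store}} are hypothetical names, and the model only assumes that pending requests are kept in a map keyed by priority alone.

```java
import java.util.HashMap;
import java.util.Map;

public class PriorityKeyedRequests {

    /** Hypothetical, simplified stand-in for a container request. */
    static final class Request {
        final int priority;
        final int memoryMb;
        final int vcores;

        Request(int priority, int memoryMb, int vcores) {
            this.priority = priority;
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /**
     * Stores requests keyed by priority only (the assumption of this sketch),
     * so a later request replaces an earlier one with the same priority.
     */
    static Map<Integer, Request> store(Request... requests) {
        Map<Integer, Request> byPriority = new HashMap<>();
        for (Request request : requests) {
            byPriority.put(request.priority, request);
        }
        return byPriority;
    }

    public static void main(String[] args) {
        // Same priority, different resources: only the last request survives.
        Map<Integer, Request> samePriority =
                store(new Request(1, 1024, 1), new Request(1, 4096, 2));
        System.out.println("same priority: " + samePriority.size()
                + " request(s) kept, memory=" + samePriority.get(1).memoryMb);

        // Different priorities: both requests are kept.
        Map<Integer, Request> differentPriorities =
                store(new Request(1, 1024, 1), new Request(2, 4096, 2));
        System.out.println("different priorities: "
                + differentPriorities.size() + " request(s) kept");
    }
}
```

In this model, the 1024 MB request is silently lost as soon as the 4096 MB request with the same priority arrives, which matches the symptom we observed in the POC.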
This discovery suggests that, if we want to support containers with different resources on Hadoop 2.8 and earlier versions, we have to give them different priorities anyway. Thus, I suggest getting rid of the container normalization simulation and going back to the previously rejected priority-based design option.
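A rough sketch of what the priority-based mapping could look like is given below. This is only an illustration of the idea, not a proposed implementation: {{PriorityResourceMapping}}, {{ResourceSpec}}, {{priorityFor}} and {{requestedSpecFor}} are hypothetical names, and the sketch assumes that each allocated container reports the priority of the request it satisfies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

public class PriorityResourceMapping {

    /** Hypothetical resource description of a requested container. */
    static final class ResourceSpec {
        final int memoryMb;
        final int vcores;

        ResourceSpec(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ResourceSpec)) {
                return false;
            }
            ResourceSpec other = (ResourceSpec) o;
            return memoryMb == other.memoryMb && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return Objects.hash(memoryMb, vcores);
        }
    }

    private final Map<ResourceSpec, Integer> specToPriority = new HashMap<>();
    private final Map<Integer, ResourceSpec> priorityToSpec = new HashMap<>();
    private int nextPriority = 1;

    /** Returns the priority for a spec, assigning a new one on first use. */
    int priorityFor(ResourceSpec spec) {
        Integer existing = specToPriority.get(spec);
        if (existing != null) {
            return existing;
        }
        int priority = nextPriority++;
        specToPriority.put(spec, priority);
        priorityToSpec.put(priority, spec);
        return priority;
    }

    /** Looks up the originally requested spec by an allocated container's priority. */
    Optional<ResourceSpec> requestedSpecFor(int allocatedPriority) {
        return Optional.ofNullable(priorityToSpec.get(allocatedPriority));
    }

    public static void main(String[] args) {
        PriorityResourceMapping mapping = new PriorityResourceMapping();
        int small = mapping.priorityFor(new ResourceSpec(1000, 1));
        int large = mapping.priorityFor(new ResourceSpec(4000, 2));

        // Even if the allocated container's resource was normalized
        // (e.g. rounded up), its priority still identifies the original request.
        ResourceSpec requested = mapping.requestedSpecFor(large).get();
        System.out.println("priority " + large + " -> "
                + requested.memoryMb + " MB, " + requested.vcores + " vcores");
        System.out.println("distinct priorities: " + (small != large));
    }
}
```

With such a mapping, we would no longer need to simulate Yarn's normalization: the lookup depends only on the priority, not on the resource that Yarn actually allocated.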