[jira] [Created] (FLINK-19324) Map requested/allocated containers with priority on YARN

Shang Yuanchun (Jira)
Xintong Song created FLINK-19324:
------------------------------------

             Summary: Map requested/allocated containers with priority on YARN
                 Key: FLINK-19324
                 URL: https://issues.apache.org/jira/browse/FLINK-19324
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN
            Reporter: Xintong Song


In the design doc of FLINK-14106, there was a [discussion|https://docs.google.com/document/d/1f8imSus3QwKEUPAldzR8CSMjZ-2a9O17-rn4oeKGtqw/edit?disco=AAAAGPX_tmg] on how we map allocated containers to the requested ones on YARN. We rejected the design option that uses container priorities to map containers of different resources, because we did not want to prioritize different container requests (which is the original purpose of this field). As a result, we have to predict how Yarn will normalize each container request and map the allocated/requested containers accordingly, which is complicated and fragile. See also FLINK-19151.

Recently, in our POC for fine-grained resource management, we were surprised to discover that Yarn does not actually work with container requests that have the same priority but different resources. I could not find this described as an official protocol in any of Yarn's documentation. The issue was raised in early Yarn versions (YARN-314) and was not fixed until Hadoop 2.9, when {{allocationRequestId}} was introduced. In Hadoop 2.8, the Yarn scheduler still internally uses priority as the key of a container request (see [AppSchedulingInfo#updateResourceRequests|https://github.com/apache/hadoop/blob/eb818cdc64336ade273a960ba3b9b5a5d0c4d4ec/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AppSchedulingInfo.java#L341]), so requests with the same priority but different resources overwrite each other.
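
To illustrate the overwrite, here is a minimal sketch of a scheduler that keeps one pending request per priority. It is a simplified model, not Yarn's actual code: the class {{PriorityKeyedRequests}} and its nested {{Resource}} and {{Scheduler}} types are hypothetical names introduced only for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class PriorityKeyedRequests {

    /** Hypothetical stand-in for a container resource: memory in MB and vcores. */
    public static final class Resource {
        public final int memoryMb;
        public final int vcores;

        public Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Simplified model: pending requests are keyed by priority only. */
    public static final class Scheduler {
        private final Map<Integer, Resource> pendingByPriority = new HashMap<>();

        /** A later request with the same priority replaces the earlier one. */
        public void updateRequest(int priority, Resource resource) {
            pendingByPriority.put(priority, resource);
        }

        public int pendingCount() {
            return pendingByPriority.size();
        }

        public Resource pending(int priority) {
            return pendingByPriority.get(priority);
        }
    }

    public static void main(String[] args) {
        Scheduler scheduler = new Scheduler();
        scheduler.updateRequest(1, new Resource(1024, 1));
        scheduler.updateRequest(1, new Resource(4096, 2)); // same priority, different resource

        // Only one request survives: the 1024 MB request has been overwritten.
        System.out.println("pending requests: " + scheduler.pendingCount());
        System.out.println("memory at priority 1: " + scheduler.pending(1).memoryMb);
    }
}
```

In this model the first request is silently lost, which matches the behavior described above for requests that share a priority.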

This discovery suggests that, if we want to support containers with different resources on Hadoop 2.8 and earlier versions, we have to give them different priorities anyway. Thus, I suggest getting rid of the container normalization simulation and going back to the previously rejected priority-based design option.
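
A rough sketch of what the priority-based mapping could look like is given below. It is only an illustration under assumptions, not the proposed Flink implementation: {{ResourcePriorityMapper}}, {{ResourceSpec}}, and the choice of starting priorities at 1 are all hypothetical. The idea is that each distinct requested resource gets its own priority, and the priority reported on an allocated container is used to look up what was originally requested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class ResourcePriorityMapper {

    /** Hypothetical description of a requested container resource. */
    public static final class ResourceSpec {
        public final int memoryMb;
        public final int vcores;

        public ResourceSpec(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof ResourceSpec)) {
                return false;
            }
            ResourceSpec other = (ResourceSpec) o;
            return memoryMb == other.memoryMb && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return Objects.hash(memoryMb, vcores);
        }
    }

    private final Map<ResourceSpec, Integer> priorityByResource = new HashMap<>();
    private final Map<Integer, ResourceSpec> resourceByPriority = new HashMap<>();
    private int nextPriority = 1; // assumption: any unique integer per resource would do

    /** Returns the priority for a resource, assigning a fresh one on first use. */
    public int priorityFor(ResourceSpec spec) {
        Integer existing = priorityByResource.get(spec);
        if (existing != null) {
            return existing;
        }
        int priority = nextPriority++;
        priorityByResource.put(spec, priority);
        resourceByPriority.put(priority, spec);
        return priority;
    }

    /** Maps an allocated container's priority back to the requested resource, or null if unknown. */
    public ResourceSpec requestedResourceFor(int priority) {
        return resourceByPriority.get(priority);
    }
}
```

With such a mapping, the allocated container no longer needs to be matched by its normalized resource, since the priority alone identifies the original request.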



--
This message was sent by Atlassian Jira
(v8.3.4#803005)