[jira] [Created] (FLINK-17390) Container resource cannot be mapped on Hadoop 2.10+

Shang Yuanchun (Jira)
Xintong Song created FLINK-17390:
------------------------------------

             Summary: Container resource cannot be mapped on Hadoop 2.10+
                 Key: FLINK-17390
                 URL: https://issues.apache.org/jira/browse/FLINK-17390
             Project: Flink
          Issue Type: Bug
          Components: Deployment / YARN
    Affects Versions: 1.11.0
            Reporter: Xintong Song
             Fix For: 1.11.0


In FLINK-16438, we introduced {{WorkerSpecContainerResourceAdapter}} for mapping between Yarn container {{Resource}}s and Flink {{WorkerResourceSpec}}s. Inside this class, we use {{Resource}} instances as hash map keys and set elements, assuming that {{Resource}} instances describing the same set of resources have the same hash code.

This assumption does not always hold. {{Resource}} is an abstract class and may have different implementations. Hadoop 2.10+ introduces {{LightWeightResource}}, a new implementation of {{Resource}} that overrides the {{hashCode}} method and is used for instances created by {{Resource.newInstance}} on the AM side. As a result, a {{Resource}} created on the AM may have a different hash code from an equal {{Resource}} returned by Yarn.
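
To illustrate why this breaks hash-based lookups, here is a minimal, self-contained Java sketch. It does not use Hadoop: {{FakeResource}}, {{YarnSideResource}} and {{AmSideResource}} are hypothetical stand-ins for {{Resource}}, an instance returned from Yarn, and {{LightWeightResource}}, and the overriding hash formula is made up. Two instances that are equal but report different hash codes miss each other in a {{HashMap}}, because the map compares hash codes before it ever calls {{equals}}.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-ins that only mimic the shape of Yarn's Resource hierarchy;
// none of these classes come from Hadoop.
class HashMismatchDemo {

    /** Stand-in for the abstract Resource: equality is defined by memory and vcores. */
    abstract static class FakeResource {
        final long memory;
        final int vcores;

        FakeResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof FakeResource)) {
                return false;
            }
            FakeResource other = (FakeResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return Objects.hash(memory, vcores);
        }
    }

    /** Plays the role of a Resource returned from Yarn; keeps the base hashCode(). */
    static final class YarnSideResource extends FakeResource {
        YarnSideResource(long memory, int vcores) {
            super(memory, vcores);
        }
    }

    /** Plays the role of LightWeightResource: same equals(), different hashCode(). */
    static final class AmSideResource extends FakeResource {
        AmSideResource(long memory, int vcores) {
            super(memory, vcores);
        }

        @Override
        public int hashCode() {
            // Made-up formula; for the values used below it differs from the base hashCode().
            return 31 * Long.hashCode(memory) + vcores + 1;
        }
    }

    /** Stores a value under a Yarn-side key, then looks it up with an equal AM-side key. */
    static boolean lookupSucceeds(long memory, int vcores) {
        Map<FakeResource, String> specByResource = new HashMap<>();
        specByResource.put(new YarnSideResource(memory, vcores), "worker-spec");
        return specByResource.containsKey(new AmSideResource(memory, vcores));
    }

    public static void main(String[] args) {
        FakeResource fromYarn = new YarnSideResource(1024, 1);
        FakeResource fromAm = new AmSideResource(1024, 1);
        System.out.println(fromYarn.equals(fromAm));                  // true
        System.out.println(fromYarn.hashCode() == fromAm.hashCode()); // false
        System.out.println(lookupSucceeds(1024, 1));                  // false
    }
}
```

The same applies to set elements, since {{HashSet}} is backed by a {{HashMap}}.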

To solve this problem, we may introduce an {{InternalResource}} as an inner class of {{WorkerSpecContainerResourceAdapter}}, whose {{hashCode}} method depends only on the fields Flink needs (at the moment, memory and vcores). {{WorkerSpecContainerResourceAdapter}} should use only {{InternalResource}} for internal state management, converting every {{Resource}} passed into or returned from it.
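
A rough sketch of the proposed approach, again in plain Java without Hadoop dependencies. {{ContainerResource}}, {{SimpleContainerResource}} and {{ResourceAdapterSketch}} are hypothetical stand-ins for Yarn's {{Resource}}, its implementations, and {{WorkerSpecContainerResourceAdapter}}, and worker specs are reduced to strings. Each incoming resource is converted to an {{InternalResource}} at the boundary, so lookups succeed regardless of which implementation produced it. Converting results back to {{Resource}} is omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch: ContainerResource stands in for Yarn's Resource and
// ResourceAdapterSketch for WorkerSpecContainerResourceAdapter; neither is a real API.
class ResourceAdapterSketch {

    /** Stand-in for Yarn's Resource; implementations may define hashCode() differently. */
    interface ContainerResource {
        long getMemory();

        int getVirtualCores();
    }

    /** Internal key whose equals()/hashCode() depend only on memory and vcores. */
    static final class InternalResource {
        private final long memory;
        private final int vcores;

        InternalResource(ContainerResource resource) {
            this.memory = resource.getMemory();
            this.vcores = resource.getVirtualCores();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof InternalResource)) {
                return false;
            }
            InternalResource other = (InternalResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        @Override
        public int hashCode() {
            return Objects.hash(memory, vcores);
        }
    }

    /** A resource implementation whose hashCode() also depends on an unrelated salt. */
    static final class SimpleContainerResource implements ContainerResource {
        private final long memory;
        private final int vcores;
        private final int hashSalt;

        SimpleContainerResource(long memory, int vcores, int hashSalt) {
            this.memory = memory;
            this.vcores = vcores;
            this.hashSalt = hashSalt;
        }

        public long getMemory() {
            return memory;
        }

        public int getVirtualCores() {
            return vcores;
        }

        @Override
        public int hashCode() {
            // Implementation-specific hash, mimicking differing Resource subclasses.
            return Objects.hash(memory, vcores, hashSalt);
        }
    }

    // Internal state is keyed by InternalResource only.
    private final Map<InternalResource, Set<String>> specsByResource = new HashMap<>();

    /** Converts the incoming resource before touching internal state. */
    void register(ContainerResource resource, String workerSpecName) {
        specsByResource
                .computeIfAbsent(new InternalResource(resource), k -> new HashSet<>())
                .add(workerSpecName);
    }

    /** Looks up specs for a resource, whichever implementation produced it. */
    Set<String> specsFor(ContainerResource resource) {
        return specsByResource.getOrDefault(new InternalResource(resource), Collections.emptySet());
    }

    public static void main(String[] args) {
        ResourceAdapterSketch adapter = new ResourceAdapterSketch();
        adapter.register(new SimpleContainerResource(1024, 1, 7), "spec-a");
        // Same memory/vcores, different hash salt: still found via InternalResource.
        System.out.println(adapter.specsFor(new SimpleContainerResource(1024, 1, 42))); // prints [spec-a]
    }
}
```

Keeping {{InternalResource}} private to the adapter puts the equality semantics fully under Flink's control, independent of which Hadoop version is on the classpath.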



--
This message was sent by Atlassian Jira
(v8.3.4#803005)