Xintong Song created FLINK-17390:
------------------------------------
Summary: Container resource cannot be mapped on Hadoop 2.10+
Key: FLINK-17390
URL: https://issues.apache.org/jira/browse/FLINK-17390
Project: Flink
Issue Type: Bug
Components: Deployment / YARN
Affects Versions: 1.11.0
Reporter: Xintong Song
Fix For: 1.11.0
In FLINK-16438, we introduced {{WorkerSpecContainerResourceAdapter}} for mapping between Yarn container {{Resource}} and Flink {{WorkerResourceSpec}}. Inside this class, we use {{Resource}} instances as hash map keys and set elements, assuming that {{Resource}} instances that describe the same set of resources have the same hash code.
This assumption is not always true. {{Resource}} is an abstract class and may have different implementations. Hadoop 2.10+ introduces {{LightWeightResource}}, a new implementation of {{Resource}} that overrides the {{hashCode}} method and is used for instances generated by {{Resource.newInstance}} on the AM side. As a result, a {{Resource}} generated on the AM may have a different hash code than an equal {{Resource}} returned from Yarn, so a hash-based lookup keyed by one can miss an entry stored under the other.
To solve this problem, we may introduce an {{InternalResource}} as an inner class of {{WorkerSpecContainerResourceAdapter}}, whose {{hashCode}} method depends only on the fields needed by Flink (currently memory and vcores). {{WorkerSpecContainerResourceAdapter}} should use only {{InternalResource}} for internal state management, and convert any {{Resource}} passed into or returned from it.
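
The following is a minimal, self-contained sketch of the idea, not the actual patch. To run without Hadoop on the classpath it uses a hypothetical {{FakeResource}} stand-in for Yarn's {{Resource}}, with two hypothetical implementations ({{AmSideResource}}, {{RmSideResource}}) whose hash codes are made to differ on purpose. The multipliers 31 and 47 are arbitrary and are not the values Hadoop uses. Only the name {{InternalResource}} comes from the proposal above; its fields and the {{of}} factory are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class InternalResourceSketch {

    /** Hypothetical stand-in for Yarn's abstract Resource (not the Hadoop class). */
    abstract static class FakeResource {
        final long memory;
        final int vcores;

        FakeResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        // Equality is shared by all implementations and compares values only.
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof FakeResource)) {
                return false;
            }
            FakeResource other = (FakeResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        // Each implementation supplies its own hash code: the root of the problem.
        @Override
        public abstract int hashCode();
    }

    /** Hypothetical implementation as created on the AM side. */
    static final class AmSideResource extends FakeResource {
        AmSideResource(long memory, int vcores) { super(memory, vcores); }

        @Override
        public int hashCode() { return 31 * Long.hashCode(memory) + vcores; }
    }

    /** Hypothetical implementation as returned from Yarn. */
    static final class RmSideResource extends FakeResource {
        RmSideResource(long memory, int vcores) { super(memory, vcores); }

        @Override
        public int hashCode() { return 47 * Long.hashCode(memory) + vcores; }
    }

    /** Key type whose equals and hashCode depend only on the fields Flink needs. */
    static final class InternalResource {
        private final long memory;
        private final int vcores;

        InternalResource(long memory, int vcores) {
            this.memory = memory;
            this.vcores = vcores;
        }

        // Conversion applied to every resource before it touches internal state.
        static InternalResource of(FakeResource resource) {
            return new InternalResource(resource.memory, resource.vcores);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof InternalResource)) {
                return false;
            }
            InternalResource other = (InternalResource) o;
            return memory == other.memory && vcores == other.vcores;
        }

        @Override
        public int hashCode() { return Objects.hash(memory, vcores); }
    }

    public static void main(String[] args) {
        FakeResource requested = new AmSideResource(1024, 2);
        FakeResource allocated = new RmSideResource(1024, 2);

        // Keyed by the raw resource: equal objects, different hash codes.
        Map<FakeResource, String> byResource = new HashMap<>();
        byResource.put(requested, "worker-spec");
        System.out.println("raw key lookup: " + byResource.get(allocated));

        // Keyed by InternalResource: the hash code no longer depends on the implementation.
        Map<InternalResource, String> byInternal = new HashMap<>();
        byInternal.put(InternalResource.of(requested), "worker-spec");
        System.out.println("internal key lookup: " + byInternal.get(InternalResource.of(allocated)));
    }
}
```

In this sketch the first lookup misses (it prints {{null}}) even though the two resources are equal, while the second lookup finds the entry. Converting at the boundary keeps the adapter's internal maps and sets independent of whichever {{Resource}} implementation a given Hadoop version returns.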