[jira] [Created] (FLINK-14594) Fix matching logics of ResourceSpec/ResourceProfile/Resource considering double values


Shang Yuanchun (Jira)
Zhu Zhu created FLINK-14594:
-------------------------------

             Summary: Fix matching logics of ResourceSpec/ResourceProfile/Resource considering double values
                 Key: FLINK-14594
                 URL: https://issues.apache.org/jira/browse/FLINK-14594
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.10.0
            Reporter: Zhu Zhu
             Fix For: 1.10.0


Some resources have double-typed values, such as cpuCores in ResourceSpec/ResourceProfile and all extended resources. These values can be produced by merge or subtract operations, so they can carry small floating-point deltas.

Currently, resource matching compares these values without accounting for the deltas, which can cause the following issues:
1. A shared slot cannot fulfill a slot request even though it should be able to (because {{(d1 + d2) - d1 < d2}} is possible for double values).
2. If a shared slot is used up, an unexpected error may occur when calculating its remaining resources in SlotSharingManager#listResolvedRootSlotInfo -> ResourceProfile#subtract.
3. An unexpected error may occur when releasing a single task slot from a shared slot (in ResourceProfile#subtract).
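
To illustrate issue 1, here is a minimal standalone sketch (not Flink code; the class name, method names and the 1e-9 precision are assumptions made up for this example). It merges two double values, subtracts one again, and compares the remainder against the other with and without a tolerance. With d1 = 0.7 and d2 = 0.1, the remainder ends up slightly below d2 under IEEE 754 double arithmetic.

```java
// Hypothetical demo class, not part of Flink.
final class DoubleDeltaDemo {

    // Merge d1 and d2 into a shared total, then release d1 again.
    static double remainingAfterRelease(double d1, double d2) {
        double total = d1 + d2;
        return total - d1;
    }

    // Exact comparison, which ignores rounding deltas.
    static boolean naiveCanFulfill(double available, double requested) {
        return available >= requested;
    }

    // Comparison that tolerates a delta of up to `precision` (assumed value).
    static boolean tolerantCanFulfill(double available, double requested, double precision) {
        return available >= requested - precision;
    }

    public static void main(String[] args) {
        double d1 = 0.7;
        double d2 = 0.1;
        double remaining = remainingAfterRelease(d1, d2);

        System.out.println("remaining < d2: " + (remaining < d2));
        System.out.println("naive match:    " + naiveCanFulfill(remaining, d2));
        System.out.println("tolerant match: " + tolerantCanFulfill(remaining, d2, 1e-9));
    }
}
```

The exact comparison rejects a request that the slot should be able to serve, while the tolerant one accepts it.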

To solve this issue, I'd propose to:
1. Introduce a ResourceValue which stores a double value and its acceptable precision (resources of the same kind should use the same precision). It provides a {{compareTo}} method in which two ResourceValues are considered equal if the absolute value of their difference does not exceed the precision. It also provides merge/subtract/validation operations.
2. Make ResourceSpec/ResourceProfile use ResourceValue for cpuCores and fix the related logic (ctor/validation/subtract/matching). Usages of {{equals}} should be replaced with another method, {{hasSameResources}}, which takes the precision into account.
3. Make Resource use ResourceValue to store its value, and fix the related logic as well.
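
A rough sketch of what step 1 could look like is below. This is only an illustration under assumptions, not the actual implementation: the class name ResourceValueSketch, the constructor signature, the clamping of tiny negative results to zero, and the exception types are all hypothetical.

```java
// Hypothetical sketch of the proposed ResourceValue; not the Flink implementation.
final class ResourceValueSketch implements Comparable<ResourceValueSketch> {

    private final double value;
    private final double precision;

    ResourceValueSketch(double value, double precision) {
        if (Double.isNaN(precision) || precision < 0) {
            throw new IllegalArgumentException("precision must be non-negative");
        }
        // Validation: tolerate negative values only within the precision.
        if (Double.isNaN(value) || value < -precision) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        // Assumption: clamp tiny negative deltas to zero.
        this.value = Math.max(value, 0.0);
        this.precision = precision;
    }

    double getValue() {
        return value;
    }

    // Two values are considered equal if their absolute difference
    // does not exceed the precision.
    @Override
    public int compareTo(ResourceValueSketch other) {
        checkSamePrecision(other);
        if (Math.abs(value - other.value) <= precision) {
            return 0;
        }
        return Double.compare(value, other.value);
    }

    ResourceValueSketch merge(ResourceValueSketch other) {
        checkSamePrecision(other);
        return new ResourceValueSketch(value + other.value, precision);
    }

    ResourceValueSketch subtract(ResourceValueSketch other) {
        if (compareTo(other) < 0) {
            throw new IllegalArgumentException("cannot subtract a larger value");
        }
        return new ResourceValueSketch(value - other.value, precision);
    }

    private void checkSamePrecision(ResourceValueSketch other) {
        if (Double.compare(precision, other.precision) != 0) {
            throw new IllegalArgumentException("precisions differ");
        }
    }

    public static void main(String[] args) {
        ResourceValueSketch a = new ResourceValueSketch(0.7, 1e-9);
        ResourceValueSketch b = new ResourceValueSketch(0.1, 1e-9);
        ResourceValueSketch remaining = a.merge(b).subtract(a);
        System.out.println("matches b: " + (remaining.compareTo(b) == 0));
    }
}
```

One caveat with this kind of comparison: equality within a tolerance is not transitive, so a {{compareTo}} like this does not strictly satisfy the Comparable contract. That is one reason to keep it separate from {{equals}} and expose it through a dedicated method such as {{hasSameResources}}.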

cc [~trohrmann] [~azagrebin] [~xintongsong]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)