[jira] [Created] (FLINK-20864) Apply exact matching rules in fulfilling resource requirement with slot resource


Shang Yuanchun (Jira)
Yangze Guo created FLINK-20864:
----------------------------------

             Summary: Apply exact matching rules in fulfilling resource requirement with slot resource
                 Key: FLINK-20864
                 URL: https://issues.apache.org/jira/browse/FLINK-20864
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
            Reporter: Yangze Guo
             Fix For: 1.13.0
         Attachments: 屏幕快照 2021-01-06 下午5.34.17.png

Currently, ResourceProfile::isMatching uses the following rules (hereinafter, *loose matching*) to decide whether a slot resource can fulfill a given resource requirement, in both SlotManager and SlotPool:
 * An unspecified requirement (ResourceProfile::UNKNOWN) can be fulfilled by any resource.
 * A specified requirement can be fulfilled by any resource that is greater than or equal to itself. Note that this rule currently has no effect, since there are no specified requirements at the moment.
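A minimal Python sketch of the loose matching rules above, for illustration only. `Profile` and `is_matching_loose` are hypothetical stand-ins for ResourceProfile and ResourceProfile::isMatching, reduced to two dimensions (the real class has more). `None` models ResourceProfile::UNKNOWN, and we assume "greater than or equal" means greater than or equal in every dimension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for ResourceProfile, reduced to two dimensions."""
    cpu: float
    memory_mb: int


# None plays the role of ResourceProfile::UNKNOWN (an unspecified requirement).
UNKNOWN: Optional[Profile] = None


def is_matching_loose(required: Optional[Profile], slot: Profile) -> bool:
    """Hypothetical sketch of the loose matching rules."""
    # Rule 1: an unspecified requirement can be fulfilled by any resource.
    if required is None:
        return True
    # Rule 2: a specified requirement can be fulfilled by any resource that is
    # greater than or equal to it (assumed here: in every dimension).
    return slot.cpu >= required.cpu and slot.memory_mb >= required.memory_mb


print(is_matching_loose(UNKNOWN, Profile(1.0, 1024)))            # True
print(is_matching_loose(Profile(1.0, 512), Profile(2.0, 1024)))  # True: larger slot reused
print(is_matching_loose(Profile(2.0, 512), Profile(1.0, 1024)))  # False: not enough CPU
```

The second call shows the rule that lets a larger slot absorb a smaller requirement, which is the source of the utilization concern discussed below.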

The loose matching rules were designed before dynamic slot allocation was introduced. Under the assumption that slot resources are decided when the TM is started and cannot be changed afterwards, the loose matching rules have the following advantages:
 * For standalone deployments, they allow slot requests to be fulfilled even though the slots of pre-launched TMs rarely have exactly the required resources.
 * For active resource manager deployments, they increase the chance of slots being reused, thus reducing the cost of starting new TMs for different resource requirements.

With dynamic slot allocation introduced in FLIP-56, the benefits of the loose matching rules have been significantly reduced. Since slots can be created dynamically after TMs are started, with any desired resources as long as they are available, the only remaining benefit of the loose matching rules is avoiding the allocation of new slots when existing slots can be reused on the JM side. This benefit is insignificant, because in that case there is no need to start new TMs.


On the other hand, the loose matching rules also introduce some problems.
 * Reusing larger slots to fulfill smaller requirements can harm resource utilization.
 * When matching a set of requirements against a set of slots, as in job failover or with the declarative slot allocation protocol, it is not straightforward to always find a feasible matching (assuming one exists).

!屏幕快照 2021-01-06 下午5.34.17.png!

The above figure shows how the loose matching rules can fail to find a feasible matching. Assume there are two resource requirements, A and B, and two slots, X and Y. The number below each requirement/slot represents its amount of resource. A can be fulfilled by either X or Y, while B can only be fulfilled by Y. A feasible matching is shown on the left, where both requirements are fulfilled (A by X, B by Y). However, the loose matching rules can also produce the matching shown on the right, where A is fulfilled by Y, leaving B and X unmatched.
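The failure can be reproduced with a sketch of a greedy first-fit matcher, assuming a single scalar resource per requirement/slot. The amounts below (A=1, B=2, X=1, Y=2) are hypothetical. They are chosen only to satisfy the constraints stated in the text, not taken from the figure. `greedy_match` is an illustrative helper and is not Flink's actual SlotManager or SlotPool logic. Under these assumptions, whether B gets a slot depends only on the order in which the slots are considered.

```python
from typing import Dict, List, Tuple


def greedy_match(requirements: List[Tuple[str, int]],
                 slots: List[Tuple[str, int]]) -> Dict[str, str]:
    """Hypothetical first-fit matcher using loose matching (slot >= requirement)."""
    free = list(slots)
    assignment: Dict[str, str] = {}
    for req_name, req_amount in requirements:
        for slot in free:
            slot_name, slot_amount = slot
            if slot_amount >= req_amount:  # loose matching on a single scalar
                assignment[req_name] = slot_name
                free.remove(slot)  # safe: we stop iterating right after removal
                break
    return assignment


# Hypothetical amounts: A fits X or Y, while B fits only Y.
requirements = [("A", 1), ("B", 2)]
print(greedy_match(requirements, [("X", 1), ("Y", 2)]))  # {'A': 'X', 'B': 'Y'} (left)
print(greedy_match(requirements, [("Y", 2), ("X", 1)]))  # {'A': 'Y'} (right: B and X unmatched)
```

Finding a feasible matching for every slot ordering would require something stronger than first-fit, such as a bipartite matching search. This is the complexity the text calls "not straightforward".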

Given the reduced benefits and the problems they introduce, we propose replacing the loose matching rules with the following *exact matching* rules:
 * An unspecified requirement (ResourceProfile::UNKNOWN) can only be fulfilled by a TM's default slot resource.
 * A specified requirement can only be fulfilled by a resource that is equal to itself.
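A minimal sketch of the exact matching rules. It uses the same assumptions as a loose-matching illustration would:

* `Profile` is a hypothetical two-dimensional stand-in for ResourceProfile.
* `None` stands in for ResourceProfile::UNKNOWN.
* `default_slot` is a hypothetical parameter representing the TM's default slot resource.

The greedy first-fit pass is again our own illustration, not Flink's code. With exact matching, it finds the feasible matching for the figure's scenario (using hypothetical profiles) regardless of slot order.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Profile:
    """Hypothetical stand-in for ResourceProfile, reduced to two dimensions."""
    cpu: float
    memory_mb: int


def is_matching_exact(required: Optional[Profile], slot: Profile,
                      default_slot: Profile) -> bool:
    # Rule 1: an unspecified requirement (None, modelling ResourceProfile::UNKNOWN)
    # can only be fulfilled by the TM's default slot resource.
    if required is None:
        return slot == default_slot
    # Rule 2: a specified requirement can only be fulfilled by an equal resource.
    return slot == required


def greedy_match_exact(requirements: List[Tuple[str, Optional[Profile]]],
                       slots: List[Tuple[str, Profile]],
                       default_slot: Profile) -> Dict[str, str]:
    """Hypothetical first-fit matcher using the exact matching rules."""
    free = list(slots)
    assignment: Dict[str, str] = {}
    for req_name, required in requirements:
        for slot in free:
            slot_name, profile = slot
            if is_matching_exact(required, profile, default_slot):
                assignment[req_name] = slot_name
                free.remove(slot)  # safe: we stop iterating right after removal
                break
    return assignment


DEFAULT = Profile(1.0, 1024)  # hypothetical default slot resource
LARGE = Profile(2.0, 2048)    # hypothetical larger slot resource
reqs = [("A", DEFAULT), ("B", LARGE)]
for slots in ([("X", DEFAULT), ("Y", LARGE)], [("Y", LARGE), ("X", DEFAULT)]):
    print(greedy_match_exact(reqs, slots, DEFAULT))  # {'A': 'X', 'B': 'Y'} in both orders
```

Under exact matching, requirements and slots fall into classes of identical profiles, and any slot in a class can serve any requirement in that class. A single greedy pass therefore matches as many requirements as possible within each class, and it cannot strand a requirement the way first-fit does under loose matching.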



--
This message was sent by Atlassian Jira
(v8.3.4#803005)