Yangze Guo created FLINK-20865:
----------------------------------
Summary: Prevent potential resource deadlock in fine-grained resource management
Key: FLINK-20865
URL: https://issues.apache.org/jira/browse/FLINK-20865
Project: Flink
Issue Type: Improvement
Components: Runtime / Coordination
Reporter: Yangze Guo
Fix For: 1.13.0
Attachments: 屏幕快照 2021-01-06 下午2.32.57.png
!屏幕快照 2021-01-06 下午2.32.57.png|width=954,height=288!
The above figure demonstrates a potential deadlock caused by scheduling dependencies. For the given topology, the scheduler initially requests 4 slots, one each for A, B, C and D. Assume only 2 slots are available. If both slots are assigned to Pipeline Region 0 (as shown on the left), A and B finish execution first, then C and D are executed, and finally E is executed. However, if the 2 slots are initially assigned to A and C (as shown on the right), then neither A nor C can finish execution, because B and D, which consume the data they produce, are never deployed.
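The two cases can be reproduced with a small model. This is only an illustrative sketch, not Flink code: it assumes that A -> B and C -> D are pipelined edges (the figure itself is not reproduced here), and the names `PIPELINED_CONSUMER` and `can_make_progress` are made up for this example.

```python
# Assumed topology: A -> B and C -> D are pipelined edges, so a producer can
# only finish while its consumer is running at the same time.
PIPELINED_CONSUMER = {"A": "B", "C": "D"}


def can_make_progress(running):
    """Return True if some producer and its pipelined consumer both hold a slot."""
    running = set(running)
    return any(
        producer in running and consumer in running
        for producer, consumer in PIPELINED_CONSUMER.items()
    )


# Left side of the figure: both slots go to Pipeline Region 0.
left = can_make_progress({"A", "B"})
# Right side of the figure: the slots are split across two regions.
right = can_make_progress({"A", "C"})
```

In this model `left` is true and `right` is false: with the slots split between A and C, no producer has a running consumer, and no slot is ever released.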
Currently, with coarse-grained resource management, the scheduler guarantees that the requirements of one pipeline region are always completely fulfilled before it starts fulfilling the requirements of another. That means the deadlock case shown on the right of the above figure can never happen.
However, there’s no such guarantee in fine-grained resource management. Since the resource requirements of slot sharing groups (SSGs) can differ, there’s no control over which requirements are fulfilled first when there are not enough resources to fulfill all of them. Therefore, it’s not always possible to fulfill one pipeline region before another.
To solve this problem, for fine-grained resource management we can make the scheduler defer requesting slots for other SSGs until the requirements of the current SSG are fulfilled, at the price of a longer scheduling time.
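A minimal sketch of the difference, again not Flink code: the function names, the CPU figures and the "smallest request first" fulfillment order are all assumptions chosen for illustration. The real order in which requirements get fulfilled is simply not controlled, and smallest-first is one possible outcome.

```python
def fulfil_smallest_first(requests, cpu):
    """All requests are outstanding at once; whatever fits is granted,
    here (as an assumed example order) smallest requirement first."""
    granted = []
    for name, need in sorted(requests, key=lambda r: r[1]):
        if need <= cpu:
            cpu -= need
            granted.append(name)
    return granted


def fulfil_deferred(requests, cpu):
    """Requests are issued one at a time, in scheduling order; the next one
    is deferred until the current one has been fulfilled."""
    granted = []
    for name, need in requests:
        if need > cpu:
            break  # wait for resources; later requests are not issued yet
        cpu -= need
        granted.append(name)
    return granted


# Hypothetical per-SSG CPU requirements, listed in pipeline region order.
requests = [("A", 1), ("B", 2), ("C", 1), ("D", 2)]

unordered = fulfil_smallest_first(requests, cpu=3)
deferred = fulfil_deferred(requests, cpu=3)
```

With 3 CPUs available, `unordered` comes out as A and C (the deadlock case), while `deferred` comes out as A and B, which is a complete pipeline region. The cost is that requests are no longer issued in parallel, which is where the extra scheduling time comes from.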
--
This message was sent by Atlassian Jira
(v8.3.4#803005)