[jira] [Created] (FLINK-7851) Improve scheduling balance in case of fewer sub tasks than input operator

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-7851:
------------------------------------

             Summary: Improve scheduling balance in case of fewer sub tasks than input operator
                 Key: FLINK-7851
                 URL: https://issues.apache.org/jira/browse/FLINK-7851
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Coordination
    Affects Versions: 1.3.2, 1.4.0
            Reporter: Till Rohrmann
             Fix For: 1.4.0


Consider a job in which a mapper {{m1}} runs with degree of parallelism (dop) {{n}}, followed by a key-by and a mapper {{m2}} (all-to-all communication) that runs with dop {{m}}, where {{n > m}}. In this case the sub tasks of {{m2}} are not spread uniformly across all currently used {{TaskManagers}}.

For example, take {{n = 4}} and {{m = 2}} with 2 TaskManagers that have 2 slots each. The deployment would look as follows:

TM1:
Slot 1: {{m1_1}} -> {{m2_1}}
Slot 2: {{m1_3}} -> {{m2_2}}

TM2:
Slot 1: {{m1_2}}
Slot 2: {{m1_4}}

The cause of this behaviour is that, when there are too many preferred locations (currently 8) due to an all-to-all communication pattern, we simply poll the next slot from the MultiMap in {{SlotSharingGroupAssignment}}. The polling algorithm first drains all available slots of a single machine before it polls slots from another machine.
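
To make the effect concrete, here is a minimal sketch of drain-first polling. It is a simplified model and not the actual {{SlotSharingGroupAssignment}} code: the class {{DrainFirstSlotPoller}}, the method {{poll}} and the map of machine name to free slots are all hypothetical stand-ins for the MultiMap.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of drain-first polling; not Flink's actual implementation. */
final class DrainFirstSlotPoller {

    /** Polls up to {@code count} slots, emptying one machine before moving to the next. */
    static List<String> poll(Map<String, Deque<String>> freeSlotsByMachine, int count) {
        List<String> polled = new ArrayList<>();
        for (Map.Entry<String, Deque<String>> entry : freeSlotsByMachine.entrySet()) {
            Deque<String> slots = entry.getValue();
            // Keep taking slots from the same machine until it is empty.
            while (polled.size() < count && !slots.isEmpty()) {
                polled.add(entry.getKey() + "/" + slots.poll());
            }
            if (polled.size() == count) {
                break;
            }
        }
        return polled;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> free = new LinkedHashMap<>();
        free.put("TM1", new ArrayDeque<>(Arrays.asList("slot1", "slot2")));
        free.put("TM2", new ArrayDeque<>(Arrays.asList("slot1", "slot2")));
        // Requesting m = 2 slots for m2 yields only TM1 slots.
        System.out.println(poll(free, 2));
    }
}
```

With the two TaskManagers from the example, both requested slots come from TM1, which corresponds to the deployment shown above.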

I think it would be better to poll slots in a round-robin fashion with respect to the machines. That way we would get better resource utilisation by spreading the tasks more evenly.
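
One possible shape of such round-robin polling is sketched below. This is only an illustration under assumptions: {{RoundRobinSlotPoller}}, its {{poll}} method and the map of machine name to free slots are hypothetical and do not exist in Flink.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of round-robin slot polling across machines. */
final class RoundRobinSlotPoller {

    /** Polls up to {@code count} slots, taking at most one slot per machine per pass. */
    static List<String> poll(Map<String, Deque<String>> freeSlotsByMachine, int count) {
        List<String> polled = new ArrayList<>();
        boolean tookAny = true;
        // Repeat passes over all machines until enough slots are found
        // or a full pass finds no free slot at all.
        while (polled.size() < count && tookAny) {
            tookAny = false;
            for (Map.Entry<String, Deque<String>> entry : freeSlotsByMachine.entrySet()) {
                if (polled.size() == count) {
                    break;
                }
                String slot = entry.getValue().poll(); // null if this machine has no free slot
                if (slot != null) {
                    polled.add(entry.getKey() + "/" + slot);
                    tookAny = true;
                }
            }
        }
        return polled;
    }

    public static void main(String[] args) {
        Map<String, Deque<String>> free = new LinkedHashMap<>();
        free.put("TM1", new ArrayDeque<>(Arrays.asList("slot1", "slot2")));
        free.put("TM2", new ArrayDeque<>(Arrays.asList("slot1", "slot2")));
        // Requesting m = 2 slots now yields one slot on each TaskManager.
        System.out.println(poll(free, 2));
    }
}
```

For {{n = 4}}, {{m = 2}} and 2 TaskManagers with 2 slots each, this would place one {{m2}} sub task on each TaskManager.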



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)