[jira] [Created] (FLINK-18799) improve slot allocation to make resource balance among machines.


Shang Yuanchun (Jira)
nobleyd created FLINK-18799:
-------------------------------

             Summary: improve slot allocation to make resource balance among machines.
                 Key: FLINK-18799
                 URL: https://issues.apache.org/jira/browse/FLINK-18799
             Project: Flink
          Issue Type: Improvement
          Components: API / Core, Client / Job Submission
            Reporter: nobleyd


  I have a job in which each vertex may have a different parallelism, and what troubles me is that the 'CPU used' metric differs noticeably among machines.

  Things improved after I upgraded to Flink 1.10 and added 'cluster.evenly-spread-out-slots: true' to the Flink configuration. This helps, but sometimes it is not enough.
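For context, the option mentioned above is set in the cluster configuration file (flink-conf.yaml in Flink 1.10):

```yaml
# Spread slots evenly across all available TaskManagers (Flink 1.10+)
cluster.evenly-spread-out-slots: true
```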

  For example, I have 5 TaskManagers (each deployed on its own machine) and a job whose vertices have the parallelisms listed below.

 
||vertex||parallelism||
|A|1|
|B|15|
|C|20|
|D|1|
|E|15|

In this case, the resources are sometimes not balanced very well. What I expect is that the subtasks of vertices B/C/E are distributed evenly among the 5 TaskManagers. Vertices A and D have a parallelism of only 1; they just carry some config stream.

  Expected allocation strategy: for each vertex, allocate its slots evenly among the TaskManagers, then move on to the next vertex and repeat. For example, the result below:

||TaskManager1||TaskManager2||TaskManager3||TaskManager4||TaskManager5||
|{color:#FF0000}A1{color}|{color:#00875a}B1{color}|{color:#00875a}B2{color}|{color:#00875a}B3{color}|{color:#00875a}B4{color}|
|{color:#00875a}B5{color}|{color:#00875a}B6{color}|{color:#00875a}B7{color}|{color:#00875a}B8{color}|{color:#00875a}B9{color}|
|{color:#00875a}B10{color}|{color:#00875a}B11{color}|{color:#00875a}B12{color}|{color:#00875a}B13{color}|{color:#00875a}B14{color}|
|{color:#00875a}B15{color}|{color:#ff8b00}C1{color}|{color:#ff8b00}C2{color}|{color:#ff8b00}C3{color}|{color:#ff8b00}C4{color}|
|{color:#ff8b00}C5{color}|{color:#ff8b00}C6{color}|{color:#ff8b00}C7{color}|{color:#ff8b00}C8{color}|{color:#ff8b00}C9{color}|
|{color:#ff8b00}C10{color}|{color:#ff8b00}C11{color}|{color:#ff8b00}C12{color}|{color:#ff8b00}C13{color}|{color:#ff8b00}C14{color}|
|{color:#ff8b00}C15{color}|{color:#ff8b00}C16{color}|{color:#ff8b00}C17{color}|{color:#ff8b00}C18{color}|{color:#ff8b00}C19{color}|
|{color:#ff8b00}C20{color}|{color:#403294}D1{color}|E1|E2|E3|
|E4|E5|E6|E7|E8|
|E9|E10|E11|E12|E13|
|E14|E15| | | |

The allocation order, whether A -> B -> C -> D -> E or some other order, does not matter. The key point is that all parallel subtasks of one vertex should be allocated at one time, and only then should the next vertex be considered. With this strategy, vertices A/D will not disturb the distribution equilibrium of the other vertices.
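The proposed strategy can be sketched as a simple round-robin that keeps one cursor across all vertices (this is a hypothetical illustration, not Flink's actual scheduler API; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed allocation: place each vertex's subtasks
// round-robin across TaskManagers, finishing one vertex before starting
// the next, so a parallelism-1 vertex (like A or D) only shifts the
// cursor by one slot instead of skewing later placements.
public class PerVertexRoundRobin {

    // Returns a map: TaskManager index -> list of subtask names placed there.
    static Map<Integer, List<String>> allocate(
            LinkedHashMap<String, Integer> vertexParallelism, int numTaskManagers) {
        Map<Integer, List<String>> placement = new HashMap<>();
        for (int i = 0; i < numTaskManagers; i++) {
            placement.put(i, new ArrayList<>());
        }
        int cursor = 0; // shared across vertices: keeps total load balanced
        for (Map.Entry<String, Integer> e : vertexParallelism.entrySet()) {
            for (int sub = 1; sub <= e.getValue(); sub++) {
                placement.get(cursor % numTaskManagers).add(e.getKey() + sub);
                cursor++;
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // The example job from this issue: A=1, B=15, C=20, D=1, E=15.
        LinkedHashMap<String, Integer> job = new LinkedHashMap<>();
        job.put("A", 1);
        job.put("B", 15);
        job.put("C", 20);
        job.put("D", 1);
        job.put("E", 15);
        Map<Integer, List<String>> p = allocate(job, 5);
        for (int i = 0; i < 5; i++) {
            System.out.println("TaskManager" + (i + 1) + ": " + p.get(i));
        }
    }
}
```

With 52 subtasks over 5 TaskManagers, each TaskManager ends up with 10 or 11 subtasks, reproducing the table above.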



--
This message was sent by Atlassian Jira
(v8.3.4#803005)