[jira] [Created] (FLINK-18799) improve slot allocation to make resource balance among machines.


Shang Yuanchun (Jira)
nobleyd created FLINK-18799:
-------------------------------

             Summary: improve slot allocation to make resource balance among machines.
                 Key: FLINK-18799
                 URL: https://issues.apache.org/jira/browse/FLINK-18799
             Project: Flink
          Issue Type: Improvement
          Components: API / Core, Client / Job Submission
            Reporter: nobleyd


  I have a job in which each vertex may have a different parallelism, and what troubles me is that the 'CPU used' metric differs noticeably among machines.

  Things improved after I upgraded to Flink 1.10 and added 'cluster.evenly-spread-out-slots: true' to the Flink configuration. This helps, but sometimes it is not enough.
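For context, the option mentioned above is set in the cluster configuration file (flink-conf.yaml in Flink 1.10):

```yaml
# Spread slots evenly across all available TaskManagers (Flink 1.10+)
cluster.evenly-spread-out-slots: true
```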

  For example, I have 5 TaskManagers (each deployed on its own machine) and a job whose vertices have the parallelisms listed below.

 
||vertex||parallelism||
|A|1|
|B|15|
|C|20|
|D|1|
|E|15|

In this case, the resources are sometimes not balanced very well. What I expect is that the subtasks of vertices B/C/E are distributed evenly among the 5 TaskManagers. Vertices A and D have a parallelism of only 1; they just carry some config stream.

  Expected allocation strategy: for each vertex, allocate its slots evenly among the TaskManagers, then move on to the next vertex and repeat. For example, the result below:

||TaskManager1||TaskManager2||TaskManager3||TaskManager4||TaskManager5||
|{color:#FF0000}A1{color}|{color:#00875a}B1{color}|{color:#00875a}B2{color}|{color:#00875a}B3{color}|{color:#00875a}B4{color}|
|{color:#00875a}B5{color}|{color:#00875a}B6{color}|{color:#00875a}B7{color}|{color:#00875a}B8{color}|{color:#00875a}B9{color}|
|{color:#00875a}B10{color}|{color:#00875a}B11{color}|{color:#00875a}B12{color}|{color:#00875a}B13{color}|{color:#00875a}B14{color}|
|{color:#00875a}B15{color}|{color:#ff8b00}C1{color}|{color:#ff8b00}C2{color}|{color:#ff8b00}C3{color}|{color:#ff8b00}C4{color}|
|{color:#ff8b00}C5{color}|{color:#ff8b00}C6{color}|{color:#ff8b00}C7{color}|{color:#ff8b00}C8{color}|{color:#ff8b00}C9{color}|
|{color:#ff8b00}C10{color}|{color:#ff8b00}C11{color}|{color:#ff8b00}C12{color}|{color:#ff8b00}C13{color}|{color:#ff8b00}C14{color}|
|{color:#ff8b00}C15{color}|{color:#ff8b00}C16{color}|{color:#ff8b00}C17{color}|{color:#ff8b00}C18{color}|{color:#ff8b00}C19{color}|
|{color:#ff8b00}C20{color}|{color:#403294}D1{color}|E1|E2|E3|
|E4|E5|E6|E7|E8|
|E9|E10|E11|E12|E13|
|E14|E15| | | |

The allocation order, whether A -> B -> C -> D -> E or some other order, does not matter. The key point is that all parallel subtasks of one vertex should be allocated at one time, and only then should the next vertex be considered. With this strategy, vertices A/D will not disturb the distribution equilibrium of the other vertices.
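The proposed strategy can be sketched as a simple round-robin that keeps one cursor across all vertices (this is a hypothetical illustration, not Flink's actual scheduler API; the class and method names are made up):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the proposed allocation: place each vertex's subtasks
// round-robin across TaskManagers, finishing one vertex before starting
// the next, so a parallelism-1 vertex (like A or D) only shifts the
// cursor by one slot instead of skewing later placements.
public class PerVertexRoundRobin {

    // Returns a map: TaskManager index -> list of subtask names placed there.
    static Map<Integer, List<String>> allocate(
            LinkedHashMap<String, Integer> vertexParallelism, int numTaskManagers) {
        Map<Integer, List<String>> placement = new HashMap<>();
        for (int i = 0; i < numTaskManagers; i++) {
            placement.put(i, new ArrayList<>());
        }
        int cursor = 0; // shared across vertices: keeps total load balanced
        for (Map.Entry<String, Integer> e : vertexParallelism.entrySet()) {
            for (int sub = 1; sub <= e.getValue(); sub++) {
                placement.get(cursor % numTaskManagers).add(e.getKey() + sub);
                cursor++;
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // The example job from this issue: A=1, B=15, C=20, D=1, E=15.
        LinkedHashMap<String, Integer> job = new LinkedHashMap<>();
        job.put("A", 1);
        job.put("B", 15);
        job.put("C", 20);
        job.put("D", 1);
        job.put("E", 15);
        Map<Integer, List<String>> p = allocate(job, 5);
        for (int i = 0; i < 5; i++) {
            System.out.println("TaskManager" + (i + 1) + ": " + p.get(i));
        }
    }
}
```

With 52 subtasks over 5 TaskManagers, each TaskManager ends up with 10 or 11 subtasks, reproducing the table above.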



--
This message was sent by Atlassian Jira
(v8.3.4#803005)