[jira] [Created] (FLINK-19286) Improve pipelined region scheduling performance

Shang Yuanchun (Jira)
Zhu Zhu created FLINK-19286:
-------------------------------

             Summary: Improve pipelined region scheduling performance
                 Key: FLINK-19286
                 URL: https://issues.apache.org/jira/browse/FLINK-19286
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.12.0
            Reporter: Zhu Zhu
            Assignee: Zhu Zhu
             Fix For: 1.12.0


In my recent TPC-DS benchmark, pipelined region scheduling was slower than lazy-from-sources scheduling.
The regression is caused by suboptimal parts of the {{PipelinedRegionSchedulingStrategy}} implementation, including:
1. topological sorting of the vertices to deploy
2. an unnecessary O(V) loop when sorting an empty set of regions
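
To illustrate the second point, here is a minimal sketch of how an empty input can skip the O(V) pass. It assumes the sort works by filtering a precomputed, topologically sorted list of all elements. The class and method names are hypothetical and are not taken from the Flink code base.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; not the actual PipelinedRegionSchedulingStrategy code. */
final class RegionSortSketch {

    private RegionSortSketch() {}

    /**
     * Returns the elements of toSort in the order in which they appear in
     * topologicallySortedAll (assumed to already be in topological order).
     */
    static <T> List<T> sortInTopologicalOrder(List<T> topologicallySortedAll, Set<T> toSort) {
        // Early exit: nothing to sort, so do not walk all V elements.
        if (toSort.isEmpty()) {
            return Collections.emptyList();
        }

        List<T> result = new ArrayList<>(toSort.size());
        for (T element : topologicallySortedAll) {
            if (toSort.contains(element)) {
                result.add(element);
                // Stop as soon as every requested element has been placed.
                if (result.size() == toSort.size()) {
                    break;
                }
            }
        }
        return result;
    }
}
```

With this shape, a call with an empty set returns immediately, and a call with a small set may stop before visiting the whole topology.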

After improving these implementations, pipelined region scheduling turned out to be 10% faster in the same benchmark setup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)