[jira] [Created] (FLINK-13056) Optimize region failover performance on calculating vertices to restart

Shang Yuanchun (Jira)
Zhu Zhu created FLINK-13056:
-------------------------------

             Summary: Optimize region failover performance on calculating vertices to restart
                 Key: FLINK-13056
                 URL: https://issues.apache.org/jira/browse/FLINK-13056
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
    Affects Versions: 1.9.0
            Reporter: Zhu Zhu
            Assignee: Zhu Zhu


Currently, some region boundary structures are recalculated on every region failover. This calculation can be expensive, as its complexity grows with the number of execution edges.

We tested it in a sample case with 8,000 vertices and 16,000,000 edges. Calculating the vertices to restart took ~2.0s.

(More details in https://docs.google.com/document/d/197Ou-01h2obvxq8viKqg4FnOnsykOEKxk3r5WrVBPuA/edit?usp=sharing)

We therefore propose caching the region boundary structures to improve region failover performance.
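
To illustrate the caching idea, here is a minimal sketch, not Flink's actual implementation. The class `RegionBoundaryCache`, its fields, and the integer vertex/region ids are all hypothetical, and the restart rule is simplified to "the failed region plus every region downstream of it". The point it shows is that the edge scan for a region runs once and later failovers reuse the stored result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: caches, per region, the set of directly downstream regions. */
class RegionBoundaryCache {
    // vertex id -> ids of the vertices consuming its output (the execution edges)
    private final Map<Integer, List<Integer>> consumersOfVertex;
    // vertex id -> id of the region it belongs to (assumed to cover every vertex)
    private final Map<Integer, Integer> regionOfVertex;
    // region id -> vertices in that region, derived once in the constructor
    private final Map<Integer, List<Integer>> verticesOfRegion = new HashMap<>();
    // the cache: region id -> directly downstream region ids
    private final Map<Integer, Set<Integer>> downstreamRegionsCache = new HashMap<>();
    // counts how often the edge scan actually ran (for illustration only)
    int boundaryComputations = 0;

    RegionBoundaryCache(Map<Integer, List<Integer>> consumersOfVertex,
                        Map<Integer, Integer> regionOfVertex) {
        this.consumersOfVertex = consumersOfVertex;
        this.regionOfVertex = regionOfVertex;
        for (Map.Entry<Integer, Integer> e : regionOfVertex.entrySet()) {
            verticesOfRegion.computeIfAbsent(e.getValue(), r -> new ArrayList<>()).add(e.getKey());
        }
    }

    /** Scans the region's outgoing edges on first use; afterwards returns the cached set. */
    private Set<Integer> downstreamRegions(int region) {
        return downstreamRegionsCache.computeIfAbsent(region, r -> {
            boundaryComputations++;
            Set<Integer> result = new HashSet<>();
            for (int vertex : verticesOfRegion.getOrDefault(r, Collections.emptyList())) {
                for (int consumer : consumersOfVertex.getOrDefault(vertex, Collections.emptyList())) {
                    int consumerRegion = regionOfVertex.get(consumer);
                    if (consumerRegion != r) {
                        result.add(consumerRegion); // edge crosses the region boundary
                    }
                }
            }
            return result;
        });
    }

    /** Simplified rule: restart the failed region and everything downstream of it. */
    Set<Integer> regionsToRestart(int failedRegion) {
        Set<Integer> visited = new HashSet<>();
        Deque<Integer> queue = new ArrayDeque<>();
        visited.add(failedRegion);
        queue.add(failedRegion);
        while (!queue.isEmpty()) {
            int region = queue.poll();
            for (int downstream : downstreamRegions(region)) {
                if (visited.add(downstream)) {
                    queue.add(downstream);
                }
            }
        }
        return visited;
    }
}
```

With this shape, a repeated failover of the same region only walks the cached region-level sets, so its cost no longer depends on the execution edge count. The sketch assumes the topology does not change after construction; if it did, the cache would need to be invalidated.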



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)