[jira] [Created] (FLINK-22677) Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion


Shang Yuanchun (Jira)
Jin Xing created FLINK-22677:
--------------------------------

             Summary: Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
                 Key: FLINK-22677
                 URL: https://issues.apache.org/jira/browse/FLINK-22677
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Coordination
            Reporter: Jin Xing


The current scheduler enforces synchronous registration even though ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a remote shuffle service, communication between the ShuffleMaster and the remote cluster tends to be expensive. A synchronous registration can block the main thread and may cause negative side effects such as heartbeat timeouts.

Additionally, expensive synchronous calls to the remote cluster could bottleneck the throughput of requesting shuffle resources, especially for batch jobs with complicated DAGs.
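
To illustrate the difference between the two styles, here is a minimal sketch in plain Java. It is not Flink's scheduler code: SketchShuffleMaster, remoteMaster, registerBlocking and registerAsync are hypothetical names, the String result stands in for a real shuffle descriptor, and the sleep only simulates a remote round trip. The blocking variant waits on each future in turn, while the asynchronous variant issues all registrations first and handles the combined result in a callback on the given executor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncRegistrationSketch {

    /** Hypothetical stand-in for a shuffle master; NOT Flink's real interface. */
    interface SketchShuffleMaster {
        CompletableFuture<String> registerPartitionWithProducer(String partitionId);
    }

    /** Simulates an expensive remote registration that completes on an I/O pool. */
    static SketchShuffleMaster remoteMaster(Executor ioPool, long latencyMillis) {
        return partitionId -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(latencyMillis); // assumed remote round-trip cost
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "descriptor-" + partitionId;
        }, ioPool);
    }

    /** Blocking style: the calling thread waits for every remote round trip in turn. */
    static List<String> registerBlocking(SketchShuffleMaster master, List<String> partitions) {
        List<String> descriptors = new ArrayList<>();
        for (String partition : partitions) {
            descriptors.add(master.registerPartitionWithProducer(partition).join());
        }
        return descriptors;
    }

    /**
     * Asynchronous style: all registrations are issued up front, and the combined
     * result is handled in a callback that runs on the supplied executor.
     */
    static CompletableFuture<List<String>> registerAsync(
            SketchShuffleMaster master, List<String> partitions, Executor mainThread) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String partition : partitions) {
            futures.add(master.registerPartitionWithProducer(partition));
        }
        return CompletableFuture
                .allOf(futures.toArray(new CompletableFuture[0]))
                .thenApplyAsync(ignored -> {
                    List<String> descriptors = new ArrayList<>();
                    for (CompletableFuture<String> future : futures) {
                        descriptors.add(future.join()); // already complete, does not block
                    }
                    return descriptors;
                }, mainThread);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            SketchShuffleMaster master = remoteMaster(ioPool, 100);
            List<String> partitions = Arrays.asList("p0", "p1", "p2");

            // The caller gets a future back immediately and stays free for other work.
            CompletableFuture<List<String>> pending =
                    registerAsync(master, partitions, mainThread);
            System.out.println(pending.get(5, TimeUnit.SECONDS));
        } finally {
            ioPool.shutdown();
            mainThread.shutdown();
        }
    }
}
```

In this sketch the blocking variant pays the simulated latency once per partition on the calling thread, whereas the asynchronous variant overlaps the remote calls and only occupies the main-thread executor for the final callback. How the real scheduler should sequence deployment behind such a future is left open here.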



--
This message was sent by Atlassian Jira
(v8.3.4#803005)