Jin Xing created FLINK-22677:
--------------------------------
Summary: Scheduler should invoke ShuffleMaster#registerPartitionWithProducer by a real asynchronous fashion
Key: FLINK-22677
URL: https://issues.apache.org/jira/browse/FLINK-22677
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Jin Xing
The current scheduler enforces synchronous registration even though ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a remote shuffle service, communication between the ShuffleMaster and the remote cluster tends to be expensive. Synchronous registration risks blocking the main thread and can cause negative side effects such as heartbeat timeouts.
Additionally, expensive synchronous calls to the remote service could bottleneck the throughput of shuffle resource allocation, especially for batch jobs with complicated DAGs.
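
To illustrate the difference, here is a minimal sketch that contrasts blocking on the returned future with chaining a continuation onto it. It is not Flink's scheduler code: SketchShuffleMaster, remoteMaster, registerBlocking, registerAsync, and the single-threaded "main thread" executor are hypothetical stand-ins, and the String descriptor is a placeholder for whatever the real API returns. Only the java.util.concurrent classes are real.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncRegistrationSketch {

    // Hypothetical stand-in for a shuffle master; NOT Flink's real interface.
    interface SketchShuffleMaster {
        CompletableFuture<String> registerPartitionWithProducer(String partitionId);
    }

    // Simulates a remote shuffle service whose round trip takes delayMillis.
    static SketchShuffleMaster remoteMaster(Executor ioExecutor, long delayMillis) {
        return partitionId -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "descriptor-for-" + partitionId;
        }, ioExecutor);
    }

    // Synchronous style: the calling thread waits for the remote round trip.
    static String registerBlocking(SketchShuffleMaster master, String partitionId) {
        return master.registerPartitionWithProducer(partitionId).join();
    }

    // Asynchronous style: the caller returns immediately; the continuation is
    // scheduled on the (assumed) main-thread executor once the future completes.
    static CompletableFuture<String> registerAsync(
            SketchShuffleMaster master, String partitionId, Executor mainThreadExecutor) {
        return master.registerPartitionWithProducer(partitionId)
                .thenApplyAsync(descriptor -> "deployed with " + descriptor, mainThreadExecutor);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService io = Executors.newCachedThreadPool();
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            SketchShuffleMaster master = remoteMaster(io, 200);
            CompletableFuture<String> pending = registerAsync(master, "p-1", mainThread);

            // While the registration is in flight, the main thread can still
            // run other tasks, for example handling a heartbeat.
            CompletableFuture<String> heartbeat =
                    CompletableFuture.supplyAsync(() -> "heartbeat handled", mainThread);

            System.out.println(heartbeat.get(1, TimeUnit.SECONDS));
            System.out.println(pending.get(5, TimeUnit.SECONDS));
        } finally {
            io.shutdownNow();
            mainThread.shutdownNow();
        }
    }
}
```

In this sketch, calling registerBlocking from a task running on the single-threaded executor would stall every other task queued on it for the whole round trip, which is the heartbeat risk described above. With registerAsync, several registrations can also be in flight at once instead of being serialized.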