[jira] [Created] (FLINK-22676) The partition tracker should support remote shuffle properly


Shang Yuanchun (Jira)
Jin Xing created FLINK-22676:
--------------------------------

             Summary: The partition tracker should support remote shuffle properly
                 Key: FLINK-22676
                 URL: https://issues.apache.org/jira/browse/FLINK-22676
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Network
            Reporter: Jin Xing


In current Flink, a data partition is bound to the ResourceID of the TaskManager (TM) in Execution#startTrackingPartitions, and the partition tracker stops tracking the corresponding partitions when a TM disconnects (JobMaster#disconnectTaskManager); that is, the lifecycle of shuffle data is bound to the computing resource (TM). This works for the internal shuffle service but not for a remote shuffle service. Because the shuffle data is stored remotely, the lifecycle of a completed partition can be decoupled from the TM: a TM can safely be released once no computing task is running on it, and subsequent shuffle read requests can be directed to the remote shuffle cluster. In addition, when a TM is lost, its completed data partitions on the remote shuffle cluster do not need to be reproduced.
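
The behaviour asked for above can be illustrated with a minimal, self-contained sketch. It is not Flink's actual partition tracker API: the class PartitionTrackerSketch, its methods, and the storedRemotely flag are hypothetical names introduced only for illustration. The assumption is that the tracker can tell whether a partition's data lives on the producing TM or on a remote shuffle cluster, and that only TM-local partitions are dropped when the TM disconnects.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; all names here are hypothetical, not Flink classes. */
public class PartitionTrackerSketch {

    /** Hypothetical record of a tracked partition and where its data is stored. */
    static final class TrackedPartition {
        final String producerTaskManagerId;
        final boolean storedRemotely; // assumption: known when tracking starts

        TrackedPartition(String producerTaskManagerId, boolean storedRemotely) {
            this.producerTaskManagerId = producerTaskManagerId;
            this.storedRemotely = storedRemotely;
        }
    }

    private final Map<String, TrackedPartition> partitions = new HashMap<>();

    void startTrackingPartition(String partitionId, String taskManagerId, boolean storedRemotely) {
        partitions.put(partitionId, new TrackedPartition(taskManagerId, storedRemotely));
    }

    /**
     * Stops tracking only the partitions whose data lived on the disconnected TM.
     * Partitions held by the remote shuffle cluster stay tracked, so they do not
     * have to be reproduced. Returns the ids of the partitions that were dropped.
     */
    Set<String> onTaskManagerDisconnected(String taskManagerId) {
        Set<String> stopped = new HashSet<>();
        Iterator<Map.Entry<String, TrackedPartition>> it = partitions.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, TrackedPartition> entry = it.next();
            TrackedPartition partition = entry.getValue();
            if (partition.producerTaskManagerId.equals(taskManagerId) && !partition.storedRemotely) {
                stopped.add(entry.getKey());
                it.remove();
            }
        }
        return stopped;
    }

    boolean isTracked(String partitionId) {
        return partitions.containsKey(partitionId);
    }

    public static void main(String[] args) {
        PartitionTrackerSketch tracker = new PartitionTrackerSketch();
        tracker.startTrackingPartition("p-local", "tm-1", false);
        tracker.startTrackingPartition("p-remote", "tm-1", true);

        Set<String> stopped = tracker.onTaskManagerDisconnected("tm-1");
        System.out.println("stopped tracking: " + stopped);
        System.out.println("p-remote still tracked: " + tracker.isTracked("p-remote"));
    }
}
```

In this sketch the decision is made per partition rather than per TM, which is one possible way to let internal and remote shuffle services coexist; the actual change may choose a different mechanism.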



--
This message was sent by Atlassian Jira
(v8.3.4#803005)