[DISCUSS] Flip-31: Pluggable Shuffle Manager

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Flip-31: Pluggable Shuffle Manager

Zhijiang(wangzhijiang999)
Hi all,

I ever launched the discussion of "Proposal of external shuffle service" before and received very helpful feedbacks, especially with @Andrey Zagrebin's in-depth communication offline.
Based on @Till Rohrmann's suggestion, I launch this separate thread again to summarize the current progress and welcome further reviews.

You might encounter such problems before:
1. When the task finishes, the TM is released by ResourceManager soon. But the internal partition has not been transfered completed or not consumed by downstream side yet, which would cause
unnecessary failover.
2. When the partition is transfered completed via network, it would be removed from TM immediately. If there are any exceptions during transport or downstream's consumption, the upstream task
has to be restarted again to re-produce the data.
3. You may not satisfy with the performance of one-partition-one-file mode for batch jobs in some scenarios, or you want to realize some external shuffle service deployed on YARN/K8S,etc. Or the
transport layer is not limited by curernt netty, such as via DFS or RDMA etc. But you might find it is difficult to make changes within current architecture.

All the above concerns would be covered by proposed pluggable shuffle manager architecture. If you are interested or wish more details, please refer to the google doc [1] or FLIP [2]. There are also
some sub-tasks under going within the umbrella jira [3] .

[1] https://docs.google.com/document/d/1l7yIVNH3HATP4BnjEOZFkO2CaHf1sVn_DSxS2llmkd8/edit?usp=sharing
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
[3] https://issues.apache.org/jira/browse/FLINK-10653

Best,
Zhijiang