netty channel for data transfer

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

netty channel for data transfer

wangzhijiang
As I know, each TaskManager has NettyConnectionManager component and  all the tasks in the TaskManager will user  that to transfer data. In the PartitionRequestClientFactory, the nettyClient will make a connection based on connectionId, and the connectionId consists of socketAddress and connectionIndex of IntermediateResult. That means for different IntermediateResult, it will create different connection to transfer data in the same TaskManager. I am wondering why can not use the same channel for different intermediateResult to transfer data. Are there any issues?
Thank you in advance for any suggestions!
Best wishes,
Zhijiang Wang
Reply | Threaded
Open this post in threaded view
|

Re: netty channel for data transfer

Stephan Ewen
Hi!

You can think of it the following way: When you execute a join, the data
shuffled from node A to node B for the left side of the join will have a
separate TCP connection than the data shuffled from node A to node B for
the right side of the join.

That is currently important to avoid distributed deadlocks, because once
the TCP connection backpressures for one input, it would backpressure also
the other input.

In order to multiplex them through the same connection, we would need an
application-layer flow control overlay over the connection. Something like
the credit-based system that SSH tunnels use, or so.

It has been discussed a few times, but no one got the time to do that, yet.
It's quite delicate to get this right to not degrade network throughput...

Greetings,
Stephan


On Tue, Aug 11, 2015 at 11:10 AM, wangzhijiang999 <
[hidden email]> wrote:

> As I know, each TaskManager has NettyConnectionManager component and  all
> the tasks in the TaskManager will user  that to transfer data. In the
> PartitionRequestClientFactory, the nettyClient will make a connection based
> on connectionId, and the connectionId consists of socketAddress and
> connectionIndex of IntermediateResult. That means for different
> IntermediateResult, it will create different connection to transfer data in
> the same TaskManager. I am wondering why can not use the same channel for
> different intermediateResult to transfer data. Are there any issues?
> Thank you in advance for any suggestions!
> Best wishes,
> Zhijiang Wang