Ufuk Celebi created FLINK-1093:
----------------------------------
Summary: Add non-multiplexing NetworkConnectionManager
Key: FLINK-1093
URL:
https://issues.apache.org/jira/browse/FLINK-1093 Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Affects Versions: 0.7-incubating
Reporter: Ufuk Celebi
Priority: Minor
When translating a JobGraph to an ExecutionGraph, edges are assigned connection indexes to ensure deadlock freedom of the network stack. Deadlocks can easily occur when multiplexing multiple logical channels over a single TCP connection, because of the limited network buffers. Each incoming envelope, which has an attached buffer, needs to request a network buffer from the receiver's input buffer pool. If the receiver does not have a buffer available, it unsubscribes from the TCP connection until a new buffer is available. This unsubscribe then affects all other logical channels, which are multiplexed over the same TCP connection. With certain user programs this can result in a deadlock.
One of the problems when running in multiplexed mode is the question of when to close TCP connections in a reliable way without loosing or re-ordering network envelopes (as evident in FLINK-1063). In non-multiplexed mode this is trivial, because every logical channel is mapped to a phyiscal channel and the close of the logical channel translates to a close of the physical channel.
However, this approach results in limited scalability as high degrees of parallelism result in a high number of TCP connections per machine, which are continuously opened and closed. Still, it is useful as a baseline and should work fine with smaller setups.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)