[jira] [Created] (FLINK-1093) Add non-multiplexing NetworkConnectionManager

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-1093) Add non-multiplexing NetworkConnectionManager

Shang Yuanchun (Jira)
Ufuk Celebi created FLINK-1093:
----------------------------------

             Summary: Add non-multiplexing NetworkConnectionManager
                 Key: FLINK-1093
                 URL: https://issues.apache.org/jira/browse/FLINK-1093
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Runtime
    Affects Versions: 0.7-incubating
            Reporter: Ufuk Celebi
            Priority: Minor


When translating a JobGraph to an ExecutionGraph, edges are assigned connection indexes to ensure deadlock freedom of the network stack. Deadlocks can easily occur when multiplexing multiple logical channels over a single TCP connection, because of the limited network buffers. Each incoming envelope, which has an attached buffer, needs to request a network buffer from the receiver's input buffer pool. If the receiver does not have a buffer available, it unsubscribes from the TCP connection until a new buffer is available. This unsubscribe then affects all other logical channels, which are multiplexed over the same TCP connection. With certain user programs this can result in a deadlock.

One of the problems when running in multiplexed mode is the question of when to close TCP connections in a reliable way without loosing or re-ordering network envelopes (as evident in FLINK-1063). In non-multiplexed mode this is trivial, because every logical channel is mapped to a phyiscal channel and the close of the logical channel translates to a close of the physical channel.

However, this approach results in limited scalability as high degrees of parallelism result in a high number of TCP connections per machine, which are continuously opened and closed. Still, it is useful as a baseline and should work fine with smaller setups.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)