[jira] [Created] (FLINK-11632) Make TaskManager automatic bind address picking more explicit (by default) and more configurable

Shang Yuanchun (Jira)
Alex created FLINK-11632:
----------------------------

             Summary: Make TaskManager automatic bind address picking more explicit (by default) and more configurable
                 Key: FLINK-11632
                 URL: https://issues.apache.org/jira/browse/FLINK-11632
             Project: Flink
          Issue Type: Improvement
          Components: Distributed Coordination, Network, TaskManager
            Reporter: Alex


Currently, there is an optional {{taskmanager.host}} configuration option in {{flink-conf.yaml}} that lets Flink users "statically" pre-define the bind address the TaskManager listens on (note: this option can also be overridden by passing the corresponding command-line option to Flink).

When the option is not set, the TaskManager tries to [heuristically pick a bind address|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L421-L442].
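
A minimal sketch of the resulting precedence, assuming that a command-line override wins over {{taskmanager.host}}, which in turn wins over auto-detection. The class and method names are hypothetical illustrations, not Flink's API. So are the {{Map}} standing in for {{flink-conf.yaml}} and the {{Supplier}} standing in for the heuristic.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch (not Flink code): precedence for choosing the TaskManager host.
final class TaskManagerHostSelection {

    static final String TASK_MANAGER_HOST = "taskmanager.host";

    static String selectHost(Optional<String> cliOverride,
                             Map<String, String> flinkConf,
                             Supplier<String> autoDetect) {
        // 1. A value passed on the command line overrides the configuration file.
        if (cliOverride.isPresent()) {
            return cliOverride.get();
        }
        // 2. Otherwise use the statically configured taskmanager.host, if set.
        String configured = flinkConf.get(TASK_MANAGER_HOST);
        if (configured != null && !configured.isEmpty()) {
            return configured;
        }
        // 3. Otherwise fall back to automatic detection (today: the heuristics).
        return autoDetect.get();
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(TASK_MANAGER_HOST, "tm-1.example.com");
        // No override is given, so the configured value is printed.
        System.out.println(selectHost(Optional.empty(), conf, () -> "detected-host"));
    }
}
```

Only step 3 involves any guessing. This proposal is about what that step does by default.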

The resulting address (hostname) is used to advertise the various service endpoints running in the TM to the JobManager. It is also later resolved to an {{[InetAddress|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/taskexecutor/TaskManagerRunner.java#L359]}} that is used as the bind address for the TM's inter-node communication.
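
As an illustration (hypothetical names, not Flink code), the same host string can serve both purposes. It is advertised as-is, and it is resolved with {{InetAddress.getByName}} to obtain the local address that a listener socket binds to. The sketch assumes the string is a hostname or IP literal that resolves on the TM machine.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical sketch: the host string is advertised to the JobManager and also
// resolved to the InetAddress that the TM's listener is bound to.
final class BindAddressResolution {

    /** Resolves a hostname (via DNS or /etc/hosts) or an IP literal to an address. */
    static InetAddress resolveBindAddress(String host) throws IOException {
        return InetAddress.getByName(host);
    }

    /** Opens a listener on the resolved address; port 0 lets the OS pick a free port. */
    static ServerSocket bind(String host, int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        socket.bind(new InetSocketAddress(resolveBindAddress(host), port));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bind("127.0.0.1", 0)) {
            // An advertised endpoint would combine the host string with this port.
            System.out.println("listening on " + socket.getLocalSocketAddress());
        }
    }
}
```

Because the advertised string and the bound address come from the same source, an unexpected choice of host affects both reachability from the JobManager and the interface the TM listens on.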

This proposal is to minimize the use of heuristics by default by introducing a new configuration option (for example, {{taskmanager.host.bind-policy}}) with the following possible values:
 * {{"hostname"}} - default, use the TM host's name ({{== InetAddress.getLocalHost().getHostName()}});
 * {{"ip"}} - use the TM host's IP address ({{== InetAddress.getLocalHost().getHostAddress()}});
 * {{"auto-detect-hostname"}} - use the heuristics-based detection mechanism.
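
A sketch of how the proposed values could map onto {{InetAddress}} calls. The option key and values are taken from this proposal, which notes they are open to renaming. The enum, its methods and the {{Supplier}} placeholder for the existing heuristic are hypothetical. {{localHost}} is passed in rather than fetched inside the method, so the mapping can be exercised without depending on the machine's DNS setup.

```java
import java.net.InetAddress;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed taskmanager.host.bind-policy option.
// Key and values come from this proposal; they are not an existing Flink API.
enum BindPolicy {
    HOSTNAME("hostname"),
    IP("ip"),
    AUTO_DETECT_HOSTNAME("auto-detect-hostname");

    private final String configValue;

    BindPolicy(String configValue) {
        this.configValue = configValue;
    }

    /** Parses the config value; a missing value means the proposed default, "hostname". */
    static BindPolicy fromConfig(String value) {
        if (value == null) {
            return HOSTNAME;
        }
        for (BindPolicy policy : values()) {
            if (policy.configValue.equals(value)) {
                return policy;
            }
        }
        throw new IllegalArgumentException("Unknown bind policy: " + value);
    }

    /**
     * Returns the host the TM would bind to and advertise.
     * localHost is normally InetAddress.getLocalHost(); autoDetect stands in
     * for the existing heuristic mechanism.
     */
    String resolveHost(InetAddress localHost, Supplier<String> autoDetect) {
        switch (this) {
            case HOSTNAME:
                return localHost.getHostName();
            case IP:
                return localHost.getHostAddress();
            default:
                return autoDetect.get();
        }
    }
}
```

A caller would typically pass {{InetAddress.getLocalHost()}} as {{localHost}}. Treating a missing value as {{hostname}} encodes the proposed default, so the heuristic runs only when it is explicitly requested.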

*Note:* the configuration key and values could be named better; proposals are welcome.
*Note 2:* in the future, the configuration option _may_ need to be extended to allow choosing a specific network interface, or a preference for IPv6 vs. IPv4.
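
For the possible extension in Note 2, one hypothetical shape is to enumerate the addresses of a named interface with {{NetworkInterface.getByName}} and pick one of the preferred IP family. Everything here is an assumption for illustration only, including the {{IpPreference}} enum and the fallback to the first address of any family.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a possible future extension (Note 2): bind to an
// address of one named network interface, preferring IPv4 or IPv6.
final class InterfaceAddressSelection {

    enum IpPreference { IPV4, IPV6 }

    /** Picks the first candidate of the preferred family, else the first candidate of any family. */
    static Optional<InetAddress> pick(List<InetAddress> candidates, IpPreference preference) {
        Class<? extends InetAddress> preferred =
                preference == IpPreference.IPV4 ? Inet4Address.class : Inet6Address.class;
        for (InetAddress address : candidates) {
            if (preferred.isInstance(address)) {
                return Optional.of(address);
            }
        }
        return candidates.stream().findFirst();
    }

    /** Lists the addresses of a named interface (e.g. "eth0"); empty if no such interface exists. */
    static List<InetAddress> addressesOf(String interfaceName) throws SocketException {
        NetworkInterface nif = NetworkInterface.getByName(interfaceName);
        if (nif == null) {
            return Collections.emptyList();
        }
        return Collections.list(nif.getInetAddresses());
    }
}
```

Like the base proposal, this selection is deterministic for a given machine. It does not depend on the timing of a probe connection.
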
h3. Rationale

[The heuristics mechanism|https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/net/ConnectionUtils.java#L364-L475] tries to establish a probe connection to {{jobmanager.rpc.address}} from different network interface addresses.
In parallel setups (when the JM and multiple TMs start simultaneously), the outcome depends on timing and on the assigned network IP addresses. It may end up with "non-uniform" address bindings across TMs: some may be "lucky" enough to pick a non-default network interface, while others fall back to {{InetAddress.getLocalHost().getHostName()}}. As a result, it is less obvious and transparent which bind address a TM picks.
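
A heavily simplified sketch of the probing idea, to show why the result depends on timing. Each candidate local address is used as the source of a probe connection to the JobManager, the first one that connects wins, and otherwise a fallback is used. This is not Flink's {{ConnectionUtils}} code; the linked implementation is more involved. The class, method and parameters are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;

// Simplified, hypothetical sketch of a probe-based heuristic; NOT Flink's ConnectionUtils.
final class ProbeConnectionHeuristic {

    /**
     * Returns the first candidate local address from which a TCP connection to the
     * JobManager succeeds within timeoutMs, or the fallback if none succeeds.
     */
    static InetAddress findConnectingAddress(InetSocketAddress jobManager,
                                             List<InetAddress> candidates,
                                             int timeoutMs,
                                             InetAddress fallback) {
        for (InetAddress local : candidates) {
            try (Socket probe = new Socket()) {
                // Use this candidate as the source address of the probe.
                probe.bind(new InetSocketAddress(local, 0));
                probe.connect(jobManager, timeoutMs);
                return local; // reachable from here: pick this interface
            } catch (IOException e) {
                // Not reachable from this address (or the JM is not up yet): try the next one.
            }
        }
        // Nothing connected, e.g. because the JM was not listening yet.
        return fallback;
    }
}
```

Suppose the JobManager is not yet listening when a TM probes. Every attempt fails and that TM gets the fallback. A TM started slightly later may connect and pick a different interface, which produces the non-uniform bindings described above.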

In practice, it's possible that in the majority of cases (in well-configured environments) the heuristic mechanism returns a result that matches {{InetAddress.getLocalHost()}}. The proposal is to stick with this simpler and more explicit binding by default, avoiding the non-determinism of the heuristics.

The old mechanism remains available in case it is useful in some setups, but it would require an explicit configuration setting.

Additionally, this proposal extends the "auto configuration" option by allowing users to choose the host's IP address instead of its hostname. This may be convenient when the TMs' machines are not necessarily reachable via DNS (for example, in a Kubernetes setup).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)