[jira] [Created] (FLINK-19677) TaskManager takes abnormally long time to register with JobManager on Kubernetes


Shang Yuanchun (Jira)
Weike Dong created FLINK-19677:
----------------------------------

             Summary: TaskManager takes abnormally long time to register with JobManager on Kubernetes
                 Key: FLINK-19677
                 URL: https://issues.apache.org/jira/browse/FLINK-19677
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.11.2, 1.11.1, 1.11.0
            Reporter: Weike Dong


During TaskManager registration, the JobManager creates a _TaskManagerLocation_ instance, which tries to obtain the TaskManager's hostname via a reverse DNS lookup.

However, this lookup always fails in Kubernetes environments: for pods that are not exposed by Services, CoreDNS cannot resolve their IPs to domain names, so _InetAddress#getCanonicalHostName()_ takes ~5 seconds to return, blocking the whole registration process.
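
To check whether a given environment is affected, the cost of the reverse lookup can be timed directly. The following is a minimal sketch, not code from Flink: the class name _ReverseLookupTiming_ and its helper method are made up for illustration, and the IP to test (for example a pod IP) is passed as an argument.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ReverseLookupTiming {

    /** Times a single reverse DNS lookup for the given IP literal, in milliseconds. */
    public static long timeReverseLookupMillis(String ip) throws UnknownHostException {
        // For a literal IP, getByName only parses the address; no DNS query happens here.
        InetAddress address = InetAddress.getByName(ip);

        long start = System.nanoTime();
        // This is the call that performs the reverse lookup. If the lookup fails,
        // it returns the textual IP address instead of throwing.
        String name = address.getCanonicalHostName();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println(ip + " -> " + name + " (" + elapsedMillis + " ms)");
        return elapsedMillis;
    }

    public static void main(String[] args) throws UnknownHostException {
        // Pass the IP to test as the first argument; defaults to loopback.
        timeReverseLookupMillis(args.length > 0 ? args[0] : "127.0.0.1");
    }
}
```

If the printed name equals the IP and the elapsed time is in the range of seconds, the lookup most likely timed out rather than resolved.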

Therefore, Flink should provide a configuration parameter to turn off the reverse DNS lookup. Also, even when the hostname is actually needed, the lookup could be done lazily to avoid blocking the registration of other TaskManagers.
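
Both ideas can be sketched together as follows. This is only an illustration under stated assumptions, not Flink's actual implementation: the class _LazyHostName_, the _reverseLookupEnabled_ flag, and the injectable resolver are all hypothetical names. The hostname is resolved on first access rather than at construction time, and when the flag is off the textual IP address is used instead.

```java
import java.net.InetAddress;
import java.util.function.Function;

/** Hypothetical holder that defers (or skips) the reverse DNS lookup. */
public class LazyHostName {

    private final InetAddress address;
    // Hypothetical switch standing in for the proposed configuration parameter.
    private final boolean reverseLookupEnabled;
    // Injectable so the lookup can be replaced in tests.
    private final Function<InetAddress, String> resolver;
    // Cached result; null until the first call to get().
    private volatile String cached;

    public LazyHostName(InetAddress address, boolean reverseLookupEnabled) {
        this(address, reverseLookupEnabled, InetAddress::getCanonicalHostName);
    }

    public LazyHostName(
            InetAddress address,
            boolean reverseLookupEnabled,
            Function<InetAddress, String> resolver) {
        this.address = address;
        this.reverseLookupEnabled = reverseLookupEnabled;
        this.resolver = resolver;
        // Note: no lookup here, so construction never blocks on DNS.
    }

    /** Resolves the hostname on first use and caches it afterwards. */
    public String get() {
        String result = cached;
        if (result == null) {
            // Concurrent first callers may each resolve once; the results are
            // equivalent, so no lock is taken.
            result = reverseLookupEnabled
                    ? resolver.apply(address)
                    : address.getHostAddress();
            cached = result;
        }
        return result;
    }
}
```

With this shape, creating the holder during registration is cheap, and the ~5 second delay is only paid by the first caller that actually needs the hostname, and only when the lookup is enabled.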



--
This message was sent by Atlassian Jira
(v8.3.4#803005)