Hi All,
I've created 2 Docker containers on my local machine, one running the JM (192.168.99.104) and the other running the TM. I was expecting to see the TM in the JM UI, but it did not happen. Looking into the TM logs, I see the following lines:

01:29:50,862 DEBUG org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager process reaper
01:29:50,868 INFO org.apache.flink.runtime.filecache.FileCache - User file cache uses directory /tmp/flink-dist-cache-be63f351-2bce-48ef-bbc4-fb0f40fecd49
01:29:51,093 INFO org.apache.flink.runtime.taskmanager.TaskManager - Starting TaskManager actor at akka://flink/user/taskmanager#1222392284.
01:29:51,095 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager data connection information: 140efeb188cc (dataPort=6122)
01:29:51,096 INFO org.apache.flink.runtime.taskmanager.TaskManager - TaskManager has 1 task slot(s).
01:29:51,097 INFO org.apache.flink.runtime.taskmanager.TaskManager - Memory usage stats: [HEAP: 386/494/494 MB, NON HEAP: 30/31/-1 MB (used/committed/max)]
01:29:51,104 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 1, timeout: 500 milliseconds)
01:29:51,633 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 2, timeout: 1000 milliseconds)
01:29:52,652 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 3, timeout: 2000 milliseconds)
01:29:54,672 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 4, timeout: 4000 milliseconds)
01:29:58,693 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 5, timeout: 8000 milliseconds)
01:30:06,702 INFO org.apache.flink.runtime.taskmanager.TaskManager - Trying to register at JobManager akka.tcp://flink@192.168.99.104:6123/user/jobmanager (attempt 6, timeout: 16000 milliseconds)

However, from the TM I am able to reach the JM on port 6123:

root@140efeb188cc:/# nc -v 192.168.99.104 6123
Connection to 192.168.99.104 6123 port [tcp/*] succeeded!

The masters file on the TM contains:

192.168.99.104:8080

Did anyone face this issue with a remote JM/TM combination?

--
Thanks,
Deepak Jha
Hi!
This registration phase means that the TaskManager tries to tell the JobManager that it is available. If that fails, there can be two reasons:

1) Network communication not possible to the port
  1.1) JobManager IP really not reachable (not the case, as you described)
  1.2) TaskManager selected a wrong network interface to work with
2) JobManager not listening

To look into 1.2, can you check the TaskManager log at the beginning, where it says what interface/hostname the TaskManager selected to use?

Thanks,
Stephan

On Fri, Mar 4, 2016 at 2:48 AM, Deepak Jha <[hidden email]> wrote:
> Hi All,
> I've created 2 docker containers on my local machine, one running
> JM(192.168.99.104) and other running TM. I was expecting to see TM in the
> JM UI but it did not happen. [...]
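A rough way to check point 1.2 by hand is to ask the operating system which local address it picks when opening a TCP connection to the JobManager. The Python sketch below does only that. `local_address_for` is a hypothetical helper, not part of Flink, and this is not how Flink itself selects an interface. It complements the `nc` test from the original post, which shows that the port accepts connections but not which source address the TaskManager side ends up using.

```python
import socket


def local_address_for(remote_host: str, remote_port: int, timeout: float = 2.0) -> str:
    """Return the local IP the OS chooses for a TCP connection to the remote endpoint.

    Hypothetical helper for illustration only; it is not a Flink API.
    """
    # create_connection resolves the host, connects, and raises OSError on failure.
    with socket.create_connection((remote_host, remote_port), timeout=timeout) as conn:
        # getsockname() reports the local (address, port) pair of the connected socket.
        return conn.getsockname()[0]
```

Run inside the TaskManager container, `local_address_for("192.168.99.104", 6123)` returns the container-side address used to reach the JobManager. If that address does not match the interface or hostname reported at the start of the TaskManager log, interface selection is a plausible suspect.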
The pull request https://github.com/apache/flink/pull/1758 should improve the TaskManager's network interface selection.

On Fri, Mar 4, 2016 at 10:19 AM, Stephan Ewen <[hidden email]> wrote:
> Hi!
>
> This registration phase means that the TaskManager tries to tell the
> JobManager that it is available. [...]
Hi Stephan,
Thanks for the response. I was able to resolve the issue: I was using localhost as the JobManager name instead of the container name.

There were a few more issues which I would like to mention. I'm using S3 for storage/checkpoints in Flink HA mode, and I realized that I have to set fs.hdfs.hadoopconf in conf/flink-conf.yaml and add core-site.xml in conf/. Since I'm deploying on AWS, I had to place hadoop-aws.jar as well.

On Fri, Mar 4, 2016 at 1:22 AM, Stephan Ewen <[hidden email]> wrote:
> The pull request https://github.com/apache/flink/pull/1758 should improve
> the TaskManager's network interface selection.
> [...]

--
Thanks,
Deepak Jha
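The misconfiguration described above (a loopback name where the container name was needed) can be caught with a small sanity check on the configuration text. The following Python sketch assumes the setting in question is `jobmanager.rpc.address` in conf/flink-conf.yaml, which the post does not state explicitly. All function names are hypothetical, and the parser only handles flat `key: value` lines, not general YAML.

```python
import ipaddress


def parse_flat_conf(text):
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        conf[key.strip()] = value.strip()
    return conf


def is_loopback(host):
    """True for 'localhost' or a literal loopback IP such as 127.0.0.1 or ::1."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a hostname other than 'localhost'


def jobmanager_address_problem(conf_text):
    """Return a short description of the problem, or None if the address looks fine."""
    host = parse_flat_conf(conf_text).get("jobmanager.rpc.address")
    if host is None:
        return "jobmanager.rpc.address is not set"
    if is_loopback(host):
        return f"jobmanager.rpc.address is {host!r}, which other containers cannot reach"
    return None
```

For example, `jobmanager_address_problem("jobmanager.rpc.address: localhost")` returns a warning string, while a configuration naming the JobManager container (or its reachable IP) returns `None`.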