Could not build up connection to JobManager

Dulaj Viduranga
I'm getting "Could not build up connection to JobManager." when I try to run the WordCount example. Can anyone help?

Dulaj

Re: Could not build up connection to JobManager

Robert Metzger
Hi,
you said in the other email thread that the error only occurs for
WordCount, not for KMeans.
Can you send me the commands for both examples?
I cannot really believe that there is a difference between the two jobs.

Can you also send us the contents of the JobManager log file?

Best,
Robert


On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[hidden email]>
wrote:

> [...]

Re: Could not build up connection to JobManager

Dulaj Viduranga
Yes, it seems it is not a problem with the arguments. I have been trying for two days, but different errors keep occurring. It seems the web client can't connect to the JobManager, although it is running.
Right now, I can't even get the webclient to run. ./bin/start-webclient.sh executes fine, but I cannot connect to localhost:8080 (not even with telnet or curl).
Here is the log for the JobManager:

23:22:31,933 INFO  org.apache.flink.client.web.WebInterfaceServer                - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
23:22:32,580 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
23:22:32,625 INFO  Remoting                                                      - Starting remoting
23:22:32,838 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
23:23:48,119 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
23:23:48,124 ERROR org.apache.flink.client.WebFrontend                           - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
        at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
        at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
        at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
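The log shows the web frontend announcing that it will use localhost, port 6123, but then timing out on 10.218.98.169:6123. The telnet/curl check mentioned above can be repeated from Java with a plain TCP connect probe. This is a minimal sketch, not part of Flink: the class name `PortProbe` and the 2-second timeout are illustrative assumptions, while the hosts and ports are the ones that appear in this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper (not part of Flink): checks whether anything accepts
 * TCP connections on a given host and port, similar to `telnet host port`.
 */
public class PortProbe {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts, unreachable routes and unresolvable hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Both addresses appear in the webclient log above; 6123 is the JobManager
        // port from the log and 8080 the webclient port mentioned in this thread.
        String[] hosts = {"localhost", "10.218.98.169"};
        int[] ports = {6123, 8080};
        for (String host : hosts) {
            for (int port : ports) {
                System.out.printf("%s:%d reachable: %b%n", host, port, isReachable(host, port, 2000));
            }
        }
    }
}
```

If localhost:6123 answers but 10.218.98.169:6123 times out, the client is apparently being pointed at an address the JobManager is not reachable on, which matches the "Operation timed out" warning in the log.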
 

> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[hidden email]> wrote:
> [...]

Re: Could not build up connection to JobManager

Robert Metzger
Thank you for the quick reply.

The log you've sent is from the webclient. Can you also send the log of the
JobManager?

On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <[hidden email]>
wrote:

> [...]

Re: Could not build up connection to JobManager

Stephan Ewen
Hi Dulaj,

That error message indicates that the JobManager is not running. Are you
sure that the JobManager runs properly? Is there anything in the JobManager logs?

BTW: The 0.9 branch is under heavy development and changing a lot, which is why it
may behave a bit differently from day to day right now. I would recommend
using the 0.8.1 release for a stable experience.

Greetings,
Stephan


On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <[hidden email]> wrote:

> [...]

Re: Could not build up connection to JobManager

Dulaj Viduranga
The JobManager seems to run fine; I don't know what is wrong. When I run start-local.sh again, it shows the PID of the running JobManager, and :8081 also works fine. I want to contribute to the project, and it would give me a little boost if I could see Flink's capabilities in action. :)
Would it be OK to use 0.8.1 as a developer?

On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[hidden email]> wrote:

> [...]

Re: Could not build up connection to JobManager

Stephan Ewen
Hey Dulaj!

As a contributor, I would work against the latest version, which is
0.9-SNAPSHOT.

It may be that in your case the JobManager actor is down while the process
still lingers. (BTW: I have a patch pending that makes sure the process
exits when the actor goes down.)
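The distinction made here is that a PID printed by start-local.sh only proves the process exists, not that the JobManager still answers on its port. A minimal sketch of checking the two separately, assuming Java 9+ (for `ProcessHandle`) and assuming 6123 is the JobManager port, as in the webclient log earlier in this thread; `JobManagerCheck` is a hypothetical name, not a Flink tool.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Optional;

/**
 * Hypothetical diagnostic (not part of Flink): distinguishes "the JobManager
 * process exists" from "the JobManager actually accepts connections".
 * Requires Java 9+ for ProcessHandle.
 */
public class JobManagerCheck {

    enum Status { NO_PROCESS, PROCESS_BUT_PORT_CLOSED, RUNNING }

    static Status check(long pid, String host, int port) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        boolean alive = handle.map(ProcessHandle::isAlive).orElse(false);
        if (!alive) {
            return Status.NO_PROCESS;
        }
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 2000);
            return Status.RUNNING;
        } catch (IOException e) {
            // The process lingers, but nothing is listening (or reachable) on the port.
            return Status.PROCESS_BUT_PORT_CLOSED;
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("Usage: java JobManagerCheck <pid> [host] [port]");
            return;
        }
        // The PID would be the one start-local.sh reports; 6123 is an assumed default port.
        long pid = Long.parseLong(args[0]);
        String host = args.length > 1 ? args[1] : "localhost";
        int port = args.length > 2 ? Integer.parseInt(args[2]) : 6123;
        System.out.println(check(pid, host, port));
    }
}
```

A result of PROCESS_BUT_PORT_CLOSED would correspond to the "actor down, process lingers" situation described above.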

Could you have a look at the log "flink-<user>-jobmanager-<host>-.log" and
see if there are any errors logged?
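Looking for errors in the JobManager log mostly means filtering by log level. The lines quoted in this thread have the shape `time LEVEL logger - message`, so a minimal sketch can keep lines whose second field is WARN or ERROR. The class name is hypothetical, the default `log` directory is an assumption about the distribution layout, and the file glob simply follows the name pattern given above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink): prints WARN/ERROR lines from JobManager logs. */
public class JobManagerLogScan {

    /**
     * Keeps lines whose level field (second whitespace-separated token) is WARN or ERROR.
     * Stack-trace continuation lines (e.g. "at ...") are not matched by this simple filter.
     */
    static List<String> problems(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+", 3);
            if (tokens.length >= 2 && (tokens[1].equals("WARN") || tokens[1].equals("ERROR"))) {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the log/ directory of the Flink distribution; pass another path as args[0].
        Path logDir = Paths.get(args.length > 0 ? args[0] : "log");
        try (DirectoryStream<Path> logs = Files.newDirectoryStream(logDir, "flink-*-jobmanager-*.log")) {
            for (Path log : logs) {
                System.out.println("== " + log.getFileName());
                for (String line : problems(Files.readAllLines(log, StandardCharsets.UTF_8))) {
                    System.out.println(line);
                }
            }
        }
    }
}
```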

Greetings,
Stephan
On 24.02.2015 06:29, "Dulaj Viduranga" <[hidden email]> wrote:

> [...]

Re: Could not build up connection to JobManager

Dulaj Viduranga
Hi,
I still couldn't figure out a solution. The logs for the JobManager and the webclient follow. It would be great if someone could take a look. Thanks

> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
> [...]

Re: Could not build up connection to JobManager

Dulaj Viduranga
In reply to this post by Stephan Ewen
I tried to kill the JobManager manually in the terminal and start it again, but no luck. Also, could you tell me whether it's possible to change the webclient's port (8080)?

> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
> [...]

Re: Could not build up connection to JobManager

Robert Metzger
Hi,
I could not find the log files attached to your mails. I think the
mailing lists do not accept attachments.
Can you put the logs on gist.github.com?

The configuration values are documented here:
http://flink.apache.org/docs/0.8/config.html
For the webclient's port, it's called webclient.port.
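Flink reads such settings from conf/flink-conf.yaml as `key: value` lines. The sketch below is a simplified illustration of that format, not Flink's actual configuration loader: the class name and the example value 8088 are arbitrary, and 8080 is used as the fallback only because that is the port mentioned in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified illustration (not Flink's actual loader) of how a
 * flink-conf.yaml style file of "key: value" lines maps to settings
 * such as webclient.port.
 */
public class WebclientPortConfig {

    static Map<String, String> parse(String contents) throws IOException {
        Map<String, String> conf = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(contents))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
                int sep = line.indexOf(':');
                if (sep < 0) continue; // ignore lines without a key/value separator
                conf.put(line.substring(0, sep).trim(), line.substring(sep + 1).trim());
            }
        }
        return conf;
    }

    /** Returns webclient.port, falling back to 8080 (the port used in this thread). */
    static int webclientPort(Map<String, String> conf) {
        return Integer.parseInt(conf.getOrDefault("webclient.port", "8080"));
    }

    public static void main(String[] args) throws IOException {
        String flinkConf = "# conf/flink-conf.yaml (excerpt)\n"
                + "jobmanager.rpc.port: 6123\n"
                + "webclient.port: 8088\n";
        System.out.println(webclientPort(parse(flinkConf))); // prints 8088
    }
}
```

With a line like `webclient.port: 8088` in the real conf/flink-conf.yaml, the webclient would presumably listen on the new port the next time it is started with ./bin/start-webclient.sh.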

On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <[hidden email]>
wrote:

> I tried to kill the job manager manually in the terminal and start it
> again, but no luck. Also, could you tell me if it's possible to change the
> webclient's port (8080)?
>
> > On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
> >
> > Hey Dulaj!
> >
> > As a contributor, I would work against the latest version, which is
> > 0.9-SNAPSHOT.
> >
> > It may be that in your case the JobManager actor is down, but the process
> > still lingers. (BTW: I have a patch pending that makes sure the process
> > disappears when the actor goes down.)
> >
> > Could you have a look at the log "flink-<user>-jobmanager-<host>-.log" and
> > see if there are any errors logged?
> >
> > Greetings,
> > Stephan
> > On 24.02.2015 06:29, "Dulaj Viduranga" <[hidden email]> wrote:
> >
> >> The JobManager seems to run fine. I don't know. When I tried to run
> >> start-local.sh again, it shows the PID of the running JobManager, and
> >> :8081 also runs fine. I want to contribute to the project and I could get a
> >> little boost if I could see the capabilities of Flink. :)
> >> Will it be OK to use 0.8.1 as a developer?
> >>
> >> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[hidden email]> wrote:
> >>
> >> Hi Dulaj,
> >>
> >> That error message indicates that the JobManager is not running. Are you
> >> sure that the JobManager runs properly? Anything in the JobManager logs?
> >>
> >> BTW: The 0.9 branch is under heavy development / changes. That is why it
> >> may behave a bit differently on different days right now. I would recommend
> >> using the 0.8.1 release for a stable experience.
> >>
> >> Greetings,
> >> Stephan
> >>
> >>
> >> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <[hidden email]> wrote:
> >>
> >> Thank you for the quick reply.
> >>
> >> The log you've sent is from the webclient. Can you also send the log of the
> >> JobManager?
> >>
> >> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>
> >>> Yes. It seems it is not a problem with the arguments. I tried for two days,
> >>> but different errors occur. It seems the web client can't connect to the job
> >>> manager although it is running.
> >>> Right now, I can't even get the webclient to run. ./bin/start-webclient.sh
> >>> executes fine, but I cannot connect to localhost:8080 (even with telnet or
> >>> curl).
> >>> Here is the log for jobManager
> >>>
> >>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
> >>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
> >>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
> >>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> >>> 23:22:32,625 INFO Remoting - Starting remoting
> >>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
> >>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
> >>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>     at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
> >>>     at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
> >>>     at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
> >>>
> >>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[hidden email]> wrote:
> >>>>
> >>>> Hi,
> >>>> you said in the other email thread that the error only occurs for
> >>>> Wordcount, not for Kmeans.
> >>>> Can you copy me the commands for both examples?
> >>>> I cannot really believe that there is a difference between the two jobs.
> >>>>
> >>>> Can you also send us the contents of the jobmanager log file?
> >>>>
> >>>> Best,
> >>>> Robert
> >>>>
> >>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>>>
> >>>>> I'm getting "Could not build up connection to JobManager." when I tried to
> >>>>> run the wordCount example. Can anyone help?
> >>>>>
> >>>>> Dulaj
>
>
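Stephan's advice to look for errors in the JobManager log can be scripted. A minimal Java sketch, assuming log lines use the `HH:mm:ss,SSS LEVEL logger - message` layout seen in this thread, so the level is the second whitespace-separated token; the class name is hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: keeps only the WARN and ERROR lines of a Flink log. */
final class LogProblems {
    static List<String> warningsAndErrors(List<String> logLines) {
        return logLines.stream()
                .filter(line -> {
                    // Layout assumed: "<time> <LEVEL> <logger> - <message>"
                    String[] tokens = line.trim().split("\\s+", 3);
                    return tokens.length >= 2
                            && (tokens[1].equals("WARN") || tokens[1].equals("ERROR"));
                })
                .collect(Collectors.toList());
    }
}
```

Feed it the lines of the JobManager log, for example via `Files.readAllLines(path)`. Stack-trace continuation lines carry no level token, so they are not matched on their own.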

Re: Could not build up connection to JobManager

Dulaj Viduranga
In reply to this post by Dulaj Viduranga
Oh, I can't attach files. Here is my log data:

*************************Job manager************************************

21:24:59,837 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                - -------------------------------------------------------
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Starting JobManager (Version: 0.9-SNAPSHOT, Rev:bed3da4, Date:20.02.2015 @ 20:40:35 IST)
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Current user: Vidura
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JVM: Java HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.5-b02
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Startup Options: -Xms768m -Xmx768m
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  Maximum heap size: 736 MiBytes
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                -  JAVA_HOME: /Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home
21:24:59,924 INFO  org.apache.flink.runtime.jobmanager.JobManager                - -------------------------------------------------------
21:25:00,194 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager
21:25:00,625 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
21:25:00,675 INFO  Remoting                                                      - Starting remoting
21:25:00,839 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@10.216.192.98:6123]
21:25:00,856 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/blobStore-efad515a-da85-4c39-afeb-1aef1d965aef
21:25:00,864 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:49678 - max concurrent requests: 50 - max backlog: 1000
21:25:00,876 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting embedded TaskManager for JobManager's LOCAL mode execution
21:25:00,881 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManager at akka://flink/user/jobmanager.
21:25:00,883 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Using 0.7 of the free heap space for managed memory.
21:25:00,941 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Starting JobManger web frontend
21:25:00,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Starting task manager at akka://flink/user/taskmanager.
21:25:00,945 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Creating 1 task slot(s).
21:25:00,946 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - TaskManager connection information 127.0.0.1 (dataPort=49679).
21:25:00,948 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Temporary file directory '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T': total 232 GB, usable 14 GB (6.03% usable)
21:25:00,951 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager uses directory /var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/flink-io-e9999120-d46d-47f1-8784-094a833cedb1 for spill files.
21:25:00,953 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Setting up web info server, using web-root directoryjar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-runtime-0.9-SNAPSHOT.jar!/web-docs-infoserver.
21:25:01,304 INFO  org.eclipse.jetty.util.log                                    - jetty-8.0.0.M1
21:25:01,338 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Profiling of jobs is disabled.
21:25:01,338 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Memory usage stats: [HEAP: 523/736/736 MB, NON HEAP: 28/28/-1 MB (used/committed/max)]
21:25:01,342 INFO  org.eclipse.jetty.util.log                                    - Started SelectChannelConnector@0.0.0.0:8081
21:25:01,342 INFO  org.apache.flink.runtime.jobmanager.web.WebInfoServer         - Started web info server for JobManager on null:8081
21:25:01,405 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Try to register at master akka.tcp://flink@10.216.192.98:6123/user/jobmanager. 1. Attempt
21:25:01,410 INFO  org.apache.flink.runtime.instance.InstanceManager             - Registered TaskManager at 127 (akka://flink/user/taskmanager) as 60cbc9f7d2513d5fdec5b26970d30b85. Current number of registered hosts is 1.
21:25:01,555 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 64 MB for network buffer pool (number of memory segments: 2048, bytes per segment: 32768).
21:25:01,558 INFO  org.apache.flink.runtime.taskmanager.TaskManager              - Determined BLOB server address to be localhost/10.216.192.98:49678.
21:25:01,559 INFO  org.apache.flink.runtime.blob.BlobCache                       - Created BLOB cache storage directory /var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/blobStore-575d89fe-e45a-4729-823c-1e24a6c394ad
 
********************************************************************





**********************Webclient***********************************

21:25:09,556 INFO  org.apache.flink.client.web.WebInterfaceServer                - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
21:25:09,558 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
21:25:09,558 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
21:25:10,231 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
21:25:10,291 INFO  Remoting                                                      - Starting remoting
21:25:10,469 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:49683]
21:26:25,710 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.216.192.98:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.216.192.98:6123
21:26:25,715 ERROR org.apache.flink.client.WebFrontend                           - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.216.192.98:6123/user/jobmanager.
java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.216.192.98:6123/user/jobmanager.
        at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
        at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
        at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
 
************************************************************************************
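The two logs above disagree about the JobManager's address: the JobManager's actor system listens on `akka.tcp://flink@10.216.192.98:6123` and prints its BLOB server address as `localhost/10.216.192.98:49678`, while the webclient's actor system is bound to 127.0.0.1. (The JobManager address in the earlier webclient log, 10.218.98.169, also differs from this run.) To see what the JVM on a given machine resolves the local names to, a few `java.net.InetAddress` calls are enough. This is a diagnostic sketch with a hypothetical class name, not part of Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical diagnostic: shows how this JVM resolves the local host names. */
final class ResolveLocal {
    static String describe(InetAddress addr) {
        // InetAddress.toString() prints "hostname/literal-address"
        return addr + (addr.isLoopbackAddress() ? " (loopback)" : " (not loopback)");
    }

    public static void main(String[] args) throws UnknownHostException {
        // Normally 127.0.0.1 (or ::1)
        System.out.println("localhost      -> " + describe(InetAddress.getByName("localhost")));
        // The machine's hostname and the address it resolves to; on a laptop this
        // can be a LAN address that changes when the network changes
        System.out.println("getLocalHost() -> " + describe(InetAddress.getLocalHost()));
    }
}
```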

> On Feb 24, 2015, at 9:32 PM, Dulaj Viduranga <[hidden email]> wrote:
>
> Hi,
> I still couldn't figure out a solution. The logs for the JobManager and the webclient follow. It would be great if someone could take a look. Thanks


Re: Could not build up connection to JobManager

Dulaj Viduranga
In reply to this post by Robert Metzger
Is taskmanager.numberOfTaskSlots: -1 normal?

> On Feb 24, 2015, at 9:44 PM, Robert Metzger <[hidden email]> wrote:
>
> Hi,
> I could not find the log files attached to your mails. I think the
> mailing lists are not accepting attachments.
> Can you put the logs on gist.github.com?
>
> The configuration values are documented here:
> http://flink.apache.org/docs/0.8/config.html
> For the webclient's port, it's called webclient.port.


Re: Could not build up connection to JobManager

Stephan Ewen
Hi Dulaj!

The log suggests that the JobManager binds itself to the IP
address 10.216.192.98, while the WebClient runs at 127.0.0.1.

The 127.0.0.1 actor system cannot connect to the one at 10.216.192.98.
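This diagnosis can be checked outside of Flink by opening a plain TCP connection to the JobManager's RPC port (6123 in these logs) on each candidate address. If the port answers on one address but times out on the other, that suggests the problem is which address the JobManager is bound to, not whether it is running. A minimal Java sketch using `java.net.Socket.connect` with a timeout; the class name and the 2-second timeout are arbitrary choices, and a successful connect only shows that something is listening, not that the Akka actor at `/user/jobmanager` answers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical probe: can a TCP connection to host:port be opened within the timeout? */
final class PortProbe {
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host
            return false;
        }
    }

    public static void main(String[] args) {
        // Addresses from the logs in this thread; replace them with your own
        String[] hosts = {"127.0.0.1", "10.216.192.98"};
        for (String host : hosts) {
            System.out.println(host + ":6123 reachable: " + reachable(host, 6123, 2000));
        }
    }
}
```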

Let me verify whether this is a quirk of your particular setup or a bug
recently introduced in the 0.9-SNAPSHOT.

Does the command line work for you? ("bin/flink run <jar>")

taskmanager.numberOfTaskSlots: -1 is also okay; it means that the
default of '1' is used.
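This matches the `Creating 1 task slot(s).` line in the JobManager log Dulaj posted. The rule can be sketched as below; the class and method are hypothetical and only encode the -1 case described here, not Flink's actual handling of other values.

```java
/** Hypothetical sketch of the rule described above: -1 means "use the default". */
final class TaskSlots {
    static final int DEFAULT_SLOTS = 1; // default named in this thread

    static int effectiveSlots(int configured) {
        // Only the -1 case is described in the thread; other values pass through unchanged
        return configured == -1 ? DEFAULT_SLOTS : configured;
    }
}
```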

Greetings,
Stephan



On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <[hidden email]>
wrote:

> Is taskmanager.numberOfTaskSlots: -1 normal?

Re: Could not build up connection to JobManager

Stephan Ewen
Hi!

I think this is a problem in the current master (probably introduced a few
days ago). I am fixing it...

Thanks for reporting it!

Stephan



Re: Could not build up connection to JobManager

Stephan Ewen
BTW: Does it still work if you set "jobmanager.rpc.address" to "localhost"
in your flink-conf.yaml?
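To see which value is actually in the file, the key can be printed directly. This is a minimal Java sketch, not part of Flink: the class and method names are hypothetical, and it assumes flink-conf.yaml uses flat `key: value` lines, which is how the keys mentioned in this thread are written.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

public class FlinkConfPeek {

    // Hypothetical helper, not Flink's own loader: looks up `key` in flat
    // "key: value" text. Blank lines, '#' comments and lines without a colon
    // are skipped; if a key repeats, the last occurrence wins in this sketch.
    static Optional<String> lookup(String confText, String key) {
        Optional<String> found = Optional.empty();
        for (String rawLine : confText.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a key/value line
            }
            if (line.substring(0, colon).trim().equals(key)) {
                found = Optional.of(line.substring(colon + 1).trim());
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Usage (assumption): java FlinkConfPeek <conf-dir>/flink-conf.yaml
        String text = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String key : new String[] {"jobmanager.rpc.address", "jobmanager.rpc.port"}) {
            System.out.println(key + " = " + lookup(text, key).orElse("<not set>"));
        }
    }
}
```

This only shows what the file says; it cannot tell you what address the running JobManager derived from that value.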


Re: Could not build up connection to JobManager

Dulaj Viduranga
Hi,
Sorry for the delay in replying.
jobmanager.rpc.address is already set to “localhost” in flink-conf.yaml.
That can’t be the issue, because the JobManager web interface, which also runs on localhost, works fine.

bin/flink run <jar> doesn’t seem to work either. Here are my command and the output in the terminal:

bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count

20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
        at org.apache.flink.client.program.Client.run(Client.java:327)
        at org.apache.flink.client.program.Client.run(Client.java:306)
        at org.apache.flink.client.program.Client.run(Client.java:300)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
        at org.apache.flink.client.program.Client.run(Client.java:250)
        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
        at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
        at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
        at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
        at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
        at org.apache.flink.client.program.Client.run(Client.java:322)
        ... 15 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
        ... 20 more

The exception above occurred while trying to run your command.




Re: Could not build up connection to JobManager

Stephan Ewen
Okay, the problem seems to be that even though both the client and the
JobManager use "localhost" as the host name, they resolve it to different
IP addresses: 127.0.0.1 in one case and 10.216.177.146 in the other.

Also, 127.0.0.1 apparently cannot communicate with 10.216.177.146.
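To check on a given machine what these names resolve to, the JVM can be asked directly. A minimal sketch using only `java.net.InetAddress`; it does not reproduce how Flink picks its bind address, and the class and method names are made up for illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {

    // Returns every address the resolver reports for `host`, as IP strings.
    static String[] resolveAll(String host) throws UnknownHostException {
        InetAddress[] addrs = InetAddress.getAllByName(host);
        String[] out = new String[addrs.length];
        for (int i = 0; i < addrs.length; i++) {
            out[i] = addrs[i].getHostAddress();
        }
        return out;
    }

    // True if `host` resolves to a loopback address such as 127.0.0.1.
    static boolean isLoopback(String host) throws UnknownHostException {
        return InetAddress.getByName(host).isLoopbackAddress();
    }

    public static void main(String[] args) throws Exception {
        // What "localhost" resolves to (normally only loopback addresses).
        System.out.println("localhost -> " + String.join(", ", resolveAll("localhost")));
        // What the machine's own host name resolves to; on some setups this is
        // a LAN address (e.g. 10.x.x.x) rather than 127.0.0.1.
        InetAddress self = InetAddress.getLocalHost();
        System.out.println(self.getHostName() + " -> " + self.getHostAddress()
                + (self.isLoopbackAddress() ? " (loopback)" : " (not loopback)"));
    }
}
```

If the second line prints a non-loopback address, that is at least consistent with one side ending up on 127.0.0.1 and the other on a 10.x address.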

Can you help us debug this by checking the following:

 - Can you set "jobmanager.rpc.address" to 127.0.0.1 and see if that
solves it?
 - Can you set "jobmanager.rpc.address" to the other address
(10.216.177.146 or so) and see if that solves it?
 - Can you run "start-cluster.sh" rather than "start-local.sh" and see
whether the web frontend shows that the TaskManager connects?
 - As a hard-core test: can you bring up the JobManager, check which address
it binds to (10.216.192.98:6123 or so), and see whether that port is reachable?
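For the last check, a plain TCP connect is enough to see whether anything accepts connections on that address and port. A minimal Java sketch; the class name is hypothetical, and the default host and port in `main` are simply the values from this thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Tries a plain TCP connect; true if something accepted it within timeoutMs.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host name not resolvable
        }
    }

    public static void main(String[] args) {
        // Defaults taken from this thread (assumption): the address from the
        // stack trace and the JobManager RPC port 6123.
        String host = args.length > 0 ? args[0] : "10.216.177.146";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

Running it once with 127.0.0.1 and once with the 10.x address shows which of the two actually has a listener. A successful connect only proves that something is listening; it does not prove it is the JobManager's actor system.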

We recently changed how the Akka URLs are built, to work around a
limitation in Akka. It seems that this has not yet fully solved the issue.
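For reference, the JobManager address in the earlier stack trace has the form `akka.tcp://flink@10.216.177.146:6123/user/jobmanager`. The sketch below only illustrates that shape and the difference between putting the configured name into the URL as-is and resolving it to an IP first. It is not Flink's URL-building code; the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class AkkaUrlSketch {

    // Formats an address in the shape seen in the stack trace:
    // akka.tcp://flink@<host>:<port>/user/jobmanager
    // (IPv6 literals would need [brackets]; ignored in this sketch.)
    static String jobManagerUrl(String host, int port) {
        return "akka.tcp://flink@" + host + ":" + port + "/user/jobmanager";
    }

    // Same shape, but with the host name resolved to an IP literal first.
    static String jobManagerUrlResolved(String host, int port) throws UnknownHostException {
        return jobManagerUrl(InetAddress.getByName(host).getHostAddress(), port);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(jobManagerUrl("localhost", 6123));
        System.out.println(jobManagerUrlResolved("localhost", 6123));
    }
}
```

If two processes resolve the same configured name differently, the second form yields two different URLs, which is the kind of mismatch described above.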

Thanks for helping us debug this. It is not the easiest onboarding
experience, but the outcome is probably extremely valuable for the project
:-)

Greetings,
Stephan


On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <[hidden email]>
wrote:

> Hi,
> Sorry for the delay in replying to this issue.
> jobmanager.rpc.address is already set to “localhost” in conf.yaml.
> This can’t be the issue, because the JobManager web interface, which also
> runs on localhost, works fine.
>
> bin/flink run <jar> doesn’t seem to work either. Let me send you my
> command and the result in the terminal.
>
> bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count
>
> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
>         at org.apache.flink.client.program.Client.run(Client.java:327)
>         at org.apache.flink.client.program.Client.run(Client.java:306)
>         at org.apache.flink.client.program.Client.run(Client.java:300)
>         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>         at org.apache.flink.client.program.Client.run(Client.java:250)
>         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
>         at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>         at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>         at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>         at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>         at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>         at org.apache.flink.client.program.Client.run(Client.java:322)
>         ... 15 more
> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
>         at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>         at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>         at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>         at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>         at scala.concurrent.Await$.result(package.scala:107)
>         at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>         ... 20 more
>
> The exception above occurred while trying to run your command.
>
>
> > On Feb 25, 2015, at 1:29 AM, Stephan Ewen <[hidden email]> wrote:
> >
> > BTW: Does it still work if you enter "localhost" for "jobmanager.rpc.address"
> > in your flink-conf.yaml?
> >
> > On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <[hidden email]> wrote:
> >
> >> Hi!
> >>
> >> I think that this is a problem in the current master (probably in there
> >> since a few days ago). I am fixing it...
> >>
> >> Thanks for reporting it!
> >>
> >> Stephan
> >>
> >>
> >> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <[hidden email]> wrote:
> >>
> >>> Hi Dulaj!
> >>>
> >>> The log suggests that the JobManager binds itself to the IP
> >>> address 10.216.192.98 and the WebClient runs at 127.0.0.1.
> >>>
> >>> The 127.0.0.1 actor system cannot connect to 10.216.192.98.
> >>>
> >>> Let me verify whether this is a quirk of your particular setup, or a bug
> >>> recently introduced in the 0.9-SNAPSHOT.
> >>>
> >>> Does the command line work for you? ("bin/flink run <jar>")
> >>>
> >>> taskmanager.numberOfTaskSlots: -1 is also okay; it means that the
> >>> default of '1' is used.
> >>>
> >>> Greetings,
> >>> Stephan
> >>>
> >>>
> >>>
> >>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <[hidden email]>
> >>> wrote:
> >>>
> >>>> Is taskmanager.numberOfTaskSlots: -1 normal?
> >>>>
> >>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <[hidden email]> wrote:
> >>>>>
> >>>>> Hi,
> >>>>> I could not find the log files attached to your mails. I think the
> >>>>> mailing lists are not accepting attachments.
> >>>>> Can you put the logs on gist.github.com?
> >>>>>
> >>>>> The configuration values are documented here:
> >>>>> http://flink.apache.org/docs/0.8/config.html
> >>>>> For the webclient's port, it's called webclient.port.
> >>>>>
> >>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <[hidden email]>
> >>>>> wrote:
> >>>>>
> >>>>>> I tried to kill the JobManager manually in the terminal and start it
> >>>>>> again, but no luck. Also, could you tell me if it’s possible to change
> >>>>>> the webclient’s port (8080)?
> >>>>>>
> >>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
> >>>>>>>
> >>>>>>> Hey Dulaj!
> >>>>>>>
> >>>>>>> As a contributor, I would work against the latest version, which is
> >>>>>>> 0.9-SNAPSHOT.
> >>>>>>>
> >>>>>>> It may be that in your case the JobManager actor is down, but the
> >>>>>>> process still lingers. (BTW: I have a patch pending that makes sure the
> >>>>>>> process disappears when the actor goes down.)
> >>>>>>>
> >>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
> >>>>>>> and see if there are any errors logged?
> >>>>>>>
> >>>>>>> Greetings,
> >>>>>>> Stephan
> >>>>>>> On 24.02.2015 06:29, "Dulaj Viduranga" <[hidden email]> wrote:
> >>>>>>>
> >>>>>>>> The JobManager seems to run fine. I don't know. When I tried to run
> >>>>>>>> start-local.sh again, it showed the PID of the running JobManager, and
> >>>>>>>> :8081 also runs fine. I want to contribute to the project, and I could
> >>>>>>>> get a little boost if I could see the capabilities of Flink. :)
> >>>>>>>> Will it be OK to use 0.8.1 as a developer?
> >>>>>>>>
> >>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[hidden email]> wrote:
> >>>>>>>>
> >>>>>>>> Hi Dulaj,
> >>>>>>>>
> >>>>>>>> That error message indicates that the JobManager is not running. Are you
> >>>>>>>> sure that the JobManager runs properly? Anything in the JobManager logs?
> >>>>>>>>
> >>>>>>>> BTW: The 0.9 branch is under heavy development / changes. That is why it
> >>>>>>>> may behave a bit differently on different days right now. I would
> >>>>>>>> recommend using the 0.8.1 release for a stable experience.
> >>>>>>>>
> >>>>>>>> Greetings,
> >>>>>>>> Stephan
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <[hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> Thank you for the quick reply.
> >>>>>>>>
> >>>>>>>> The log you've sent is from the webclient. Can you also send the log of
> >>>>>>>> the JobManager?
> >>>>>>>>
> >>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <[hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried for two
> >>>>>>>>> days, but different errors occur. It seems the web client can’t connect
> >>>>>>>>> to the JobManager although it is running.
> >>>>>>>>> Right now, I can’t even get the webclient to run. ./bin/start-webclient.sh
> >>>>>>>>> executes fine, but I cannot connect to localhost:8080 (even with telnet
> >>>>>>>>> or curl).
> >>>>>>>>> Here is the log for the JobManager:
> >>>>>>>>>
> >>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
> >>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
> >>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
> >>>>>>>>> 23:22:32,625 INFO Remoting - Starting remoting
> >>>>>>>>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
> >>>>>>>>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
> >>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>>     at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
> >>>>>>>>>     at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
> >>>>>>>>>     at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
> >>>>>>>>>
> >>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[hidden email]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> you said in the other email thread that the error only occurs for
> >>>>>>>>>> WordCount, not for KMeans.
> >>>>>>>>>> Can you copy me the commands for both examples?
> >>>>>>>>>> I cannot really believe that there is a difference between the two jobs.
> >>>>>>>>>>
> >>>>>>>>>> Can you also send us the contents of the jobmanager log file?
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Robert
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[hidden email]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I tried
> >>>>>>>>>>> to run the WordCount example. Can anyone help?
> >>>>>>>>>>>
> >>>>>>>>>>> Dulaj
>

Re: Could not build up connection to JobManager

Stephan Ewen
Addition: To check whether a port is reachable, I think the easiest thing
is to try to connect with a telnet client and see whether the connection is
refused.
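The same check can be scripted without telnet. The sketch below (hypothetical class `PortCheck`, not a Flink tool) attempts a plain TCP connect with a timeout. It distinguishes a refused connection (typically the host answered, but nothing accepted on that port) from a timeout (no answer at all, which would match the "Operation timed out: /10.218.98.169:6123" line in the web client log). Port 6123 is used as the default only because it appears in the logs in this thread.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Sketch (hypothetical helper, not part of Flink): a scripted version of
// "telnet <host> <port>" that reports how a TCP connect attempt ends.
public class PortCheck {

    enum Result { OPEN, REFUSED, TIMED_OUT, ERROR }

    // Tries a plain TCP connect to host:port, giving up after timeoutMillis.
    static Result probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return Result.OPEN;
        } catch (ConnectException e) {
            // Usually "Connection refused": the host answered, but no listener.
            return Result.REFUSED;
        } catch (SocketTimeoutException e) {
            // No answer within the timeout: the address may be unreachable from here.
            return Result.TIMED_OUT;
        } catch (IOException e) {
            // Anything else, e.g. an unresolvable host or no route to host.
            return Result.ERROR;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 2000));
    }
}
```

For example, running `java PortCheck 10.216.177.146 6123` and `java PortCheck 127.0.0.1 6123` side by side would show whether the JobManager port answers on one address but not on the other.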

On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <[hidden email]> wrote:

> Okay, the problem seems to be that even though both the client and the
> jobmanager use "localhost" as the host name, they resolve it to different
> IP addresses: in one case to 127.0.0.1, in the other to 10.216.177.146.
>
> Also, apparently the 127.0.0.1 address cannot communicate with 10.216.177.146.

Re: Could not build up connection to JobManager

Dulaj Viduranga
Hi,
        It’s great to help out. :)

        Setting 127.0.0.1 instead of “localhost” in jobmanager.rpc.address helped to establish the connection to the JobManager. Apparently, “localhost” resolves differently in the webclient and in the JobManager. I think it would be good to set "jobmanager.rpc.address: 127.0.0.1" in future builds.
        But now I get the following error when I try to run the examples. I don’t know whether I should move this issue to another thread. If so, please tell me.

bin/flink run /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt $FLINK_DIRECTORY/count


20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader                       - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
02/26/2015 20:46:23 Job execution switched to status RUNNING.
02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED
02/26/2015 20:46:23 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING
02/26/2015 20:48:03 CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:745)

02/26/2015 20:48:03 Job execution switched to status FAILING.
02/26/2015 20:48:03 Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED
02/26/2015 20:48:03 DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter:  ))(1/1) switched to CANCELED
02/26/2015 20:48:03 Job execution switched to status FAILED.
org.apache.flink.client.program.ProgramInvocationException: The program execution failed.
        at org.apache.flink.client.program.Client.run(Client.java:344)
        at org.apache.flink.client.program.Client.run(Client.java:306)
        at org.apache.flink.client.program.Client.run(Client.java:300)
        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
        at org.apache.flink.client.program.Client.run(Client.java:250)
        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
        at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
        at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
        at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
        at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
        at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
        at akka.dispatch.Mailbox.run(Mailbox.scala:221)
        at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
        at java.lang.Thread.run(Thread.java:745)

The exception above occurred while trying to run your command.


> On Feb 26, 2015, at 12:46 AM, Stephan Ewen <[hidden email]> wrote:
>
> Addition: To check whether a port is reachable, I think the easiest thing
> is to try and connect with a telnet client and see if the connection is
> refused.
>
> On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <[hidden email]> wrote:
>
>> Okay, the problem seems to be that even though both the client and the
>> JobManager use "localhost" as the host name, they resolve it to different
>> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
>>
>> Also, apparently the 127.0.0.1 address cannot communicate with
>> 10.216.177.146.
>>
>> Can you help us debug this by checking the following:
>>
>> - Can you try setting "jobmanager.rpc.address" to 127.0.0.1 and see if
>> that solves it?
>> - Can you try setting "jobmanager.rpc.address" to the other address
>> (10.216.177.146 or so) and see if that solves it?
>> - Can you run "start-cluster.sh" rather than "start-local.sh" and see
>> whether the web frontend shows that the TaskManager connects?
>> - As a hard-core test: can you bring up the JobManager, check which
>> address it binds to (10.216.192.98:6123 or so), and see whether that port
>> is reachable?
>>
>> We have recently updated how the Akka URLs are built, to work around a
>> limitation in Akka. It seems that did not fully solve the issue yet.
>>
>> Thanks for helping us debug this. It is not the easiest onboarding
>> experience, but the outcome is probably extremely valuable for the project
>> :-)
>>
>> Greetings,
>> Stephan
>>
>>
>> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <[hidden email]>
>> wrote:
>>
>>> Hi,
>>> Sorry for the delay in replying to this issue.
>>> The jobmanager.rpc.address is already set to “localhost” in flink-conf.yaml.
>>> This can’t be the issue, because the JobManager web interface, which also
>>> runs on localhost, works fine.
>>>
>>> bin/flink run <jar> doesn’t seem to work either. Here are my command and
>>> the result in the terminal.
>>>
>>> bin/flink run
>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
>>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
>>> $FLINK_DIRECTORY/count
>>>
>>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
>>>        at org.apache.flink.client.program.Client.run(Client.java:327)
>>>        at org.apache.flink.client.program.Client.run(Client.java:306)
>>>        at org.apache.flink.client.program.Client.run(Client.java:300)
>>>        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>>>        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>        at java.lang.reflect.Method.invoke(Method.java:483)
>>>        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>>>        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>>>        at org.apache.flink.client.program.Client.run(Client.java:250)
>>>        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>>>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>>>        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>>>        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
>>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
>>>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
>>>        at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
>>>        at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
>>>        at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
>>>        at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
>>>        at org.apache.flink.client.program.Client.run(Client.java:322)
>>>        ... 15 more
>>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
>>>        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
>>>        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
>>>        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
>>>        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>>>        at scala.concurrent.Await$.result(package.scala:107)
>>>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
>>>        ... 20 more
>>>
>>> The exception above occurred while trying to run your command.
>>>
>>>
>>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <[hidden email]> wrote:
>>>>
>>>> BTW: Does it still work if you enter "localhost" for
>>>> "jobmanager.rpc.address" in your flink-conf.yaml?
>>>>
>>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <[hidden email]> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I think that this is a problem in the current master (probably introduced
>>>>> a few days ago). I am fixing it...
>>>>>
>>>>> Thanks for reporting it!
>>>>>
>>>>> Stephan
>>>>>
>>>>>
>>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <[hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Hi Dulaj!
>>>>>>
>>>>>> The log suggests that the JobManager binds itself to the IP
>>>>>> address 10.216.192.98 and the WebClient runs at 127.0.0.1.
>>>>>>
>>>>>> The 127.0.0.1 actor system cannot connect to the one at 10.216.192.98.
>>>>>>
>>>>>> Let me verify whether this is a quirk of your particular setup or a bug
>>>>>> recently introduced in the 0.9-SNAPSHOT.
>>>>>>
>>>>>> Does the command line work for you? ("bin/flink run <jar>")
>>>>>>
>>>>>> taskmanager.numberOfTaskSlots: -1 is also okay; it means that the
>>>>>> default of '1' is used.
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <[hidden email]>
>>>>>> wrote:
>>>>>>
>>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
>>>>>>>
>>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I could not find the log files attached to your mails. I think the
>>>>>>>> mailing lists do not accept attachments.
>>>>>>>> Can you put the logs on gist.github.com?
>>>>>>>>
>>>>>>>> The configuration values are documented here:
>>>>>>>> http://flink.apache.org/docs/0.8/config.html
>>>>>>>> For the webclient's port, it's called webclient.port.
>>>>>>>>
>>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I tried to kill the JobManager manually in the terminal and start it
>>>>>>>>> again, but no luck. Also, could you tell me whether it’s possible to
>>>>>>>>> change the webclient’s port (8080)?
>>>>>>>>>
>>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>> Hey Dulaj!
>>>>>>>>>>
>>>>>>>>>> As a contributor, I would work against the latest version, which is
>>>>>>>>>> 0.9-SNAPSHOT.
>>>>>>>>>>
>>>>>>>>>> It may be that in your case the JobManager actor is down, but the
>>>>>>>>>> process still lingers. (BTW: I have a patch pending that makes sure the
>>>>>>>>>> process disappears when the actor goes down.)
>>>>>>>>>>
>>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log"
>>>>>>>>>> and see if there are any errors logged?
>>>>>>>>>>
>>>>>>>>>> Greetings,
>>>>>>>>>> Stephan
>>>>>>>>>> On 24.02.2015 06:29, "Dulaj Viduranga" <[hidden email]> wrote:
>>>>>>>>>>
>>>>>>>>>>> The JobManager seems to run fine. I don't know. When I tried to run
>>>>>>>>>>> start-local.sh again, it showed the PID of the running JobManager, and
>>>>>>>>>>> :8081 also runs fine. I want to contribute to the project, and it would
>>>>>>>>>>> give me a little boost to see the capabilities of Flink. :)
>>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
>>>>>>>>>>>
>>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[hidden email]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Dulaj,
>>>>>>>>>>>
>>>>>>>>>>> That error message indicates that the JobManager is not running. Are you
>>>>>>>>>>> sure that the JobManager runs properly? Anything in the JobManager logs?
>>>>>>>>>>>
>>>>>>>>>>> BTW: The 0.9 branch is under heavy development and changing a lot. That
>>>>>>>>>>> is why it may behave a bit differently on different days right now. I
>>>>>>>>>>> would recommend using the 0.8.1 release for a stable experience.
>>>>>>>>>>>
>>>>>>>>>>> Greetings,
>>>>>>>>>>> Stephan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <[hidden email]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Thank you for the quick reply.
>>>>>>>>>>>
>>>>>>>>>>> The log you've sent is from the webclient. Can you also send the log
>>>>>>>>>>> of the JobManager?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <[hidden email]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried for two
>>>>>>>>>>>> days, but different errors occur. It seems the web client can’t
>>>>>>>>>>>> connect to the JobManager, although it is running.
>>>>>>>>>>>> Right now, I can’t even get the webclient to run.
>>>>>>>>>>>> ./bin/start-webclient.sh executes fine, but I cannot connect to
>>>>>>>>>>>> localhost:8080 (even with telnet or curl).
>>>>>>>>>>>> Here is the log for the JobManager:
>>>>>>>>>>>>
>>>>>>>>>>>> 23:22:31,933 INFO org.apache.flink.client.web.WebInterfaceServer - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
>>>>>>>>>>>> 23:22:31,934 INFO org.apache.flink.client.web.WebInterfaceServer - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
>>>>>>>>>>>> 23:22:32,580 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started
>>>>>>>>>>>> 23:22:32,625 INFO Remoting - Starting remoting
>>>>>>>>>>>> 23:22:32,838 INFO Remoting - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
>>>>>>>>>>>> 23:23:48,119 WARN Remoting - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
>>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
>>>>>>>>>>>>         at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
>>>>>>>>>>>>         at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
>>>>>>>>>>>>         at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
>>>>>>>>>>>>
>>>>>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[hidden email]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> you said in the other email thread that the error only occurs for
>>>>>>>>>>>>> WordCount, not for KMeans.
>>>>>>>>>>>>> Can you copy me the commands for both examples?
>>>>>>>>>>>>> I cannot really believe that there is a difference between the two
>>>>>>>>>>>>> jobs.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you also send us the contents of the JobManager log file?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Robert
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[hidden email]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I
>>>>>>>>>>>>>> tried to run the WordCount example. Can anyone help?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dulaj
>>
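Stephan's two checks quoted above (what "localhost" resolves to on each side, and whether the JobManager port accepts connections, as a telnet test would show) can be combined into one small diagnostic. The following is a minimal Java sketch using only the JDK, not anything shipped with Flink: the class name `JobManagerProbe`, the 2-second timeout and the default port 6123 are assumptions taken from the logs in this thread, and `jobManagerUrl` merely mimics the `akka.tcp://flink@<host>:<port>/user/jobmanager` strings in the error messages without contacting Akka at all.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

public class JobManagerProbe {

    // Mimics the address format seen in the logs, e.g.
    // akka.tcp://flink@10.216.177.146:6123/user/jobmanager
    // (IPv6 literals are wrapped in brackets, as in any URL authority).
    static String jobManagerUrl(InetAddress addr, int port) {
        String host = addr.getHostAddress();
        if (addr instanceof Inet6Address) {
            host = "[" + host + "]";
        }
        return "akka.tcp://flink@" + host + ":" + port + "/user/jobmanager";
    }

    // A plain TCP connect with a timeout: roughly what "telnet <host> <port>" checks.
    static String probe(InetAddress addr, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(addr, port), timeoutMs);
            return "open";
        } catch (ConnectException e) {
            return "refused";   // nothing is listening on that address and port
        } catch (SocketTimeoutException e) {
            return "timed out"; // no answer within the timeout (e.g. firewall, unreachable interface)
        } catch (IOException e) {
            return "error: " + e.getMessage();
        }
    }

    private static void report(String name, InetAddress addr, int port) {
        System.out.println(name + " -> " + jobManagerUrl(addr, port) + " : " + probe(addr, port, 2000));
    }

    public static void main(String[] args) {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 6123;
        try {
            // The literal name "localhost" (normally 127.0.0.1 or ::1).
            report("localhost", InetAddress.getByName("localhost"), port);
            // The machine's own host name; on some networks this maps to an
            // external interface address instead of the loopback address.
            InetAddress self = InetAddress.getLocalHost();
            report(self.getHostName(), self, port);
        } catch (UnknownHostException e) {
            System.out.println("could not resolve: " + e.getMessage());
        }
    }
}
```

Run it as `java JobManagerProbe 6123` while the JobManager is up. If the two lines show different IP addresses and only one of them reports `open`, that would be consistent with the "localhost" mismatch discussed here; a `timed out` result would resemble the "Operation timed out" warning in the webclient log.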


Re: Could not build up connection to JobManager

Stephan Ewen
Hi Dulaj!

Thanks for helping to debug.

My guess is that you are now seeing the same thing between the JobManager and
the TaskManager as you saw before between the JobManager and the JobClient. I
have a patch pending that should help with the issue (see
https://issues.apache.org/jira/browse/FLINK-1608); let's see whether that
solves it.

What does not seem right is that the JobManager initially accepted the
TaskManager and the communication only failed later. Can you paste the
TaskManager log as well?

Also: There must be something fairly unique about your network
configuration, as it works on all other setups that we use (locally, cloud,
test servers, YARN, ...). Can you paste your ipconfig / ifconfig by any
chance?

Greetings,
Stephan
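As an alternative to pasting ipconfig / ifconfig output, the addresses that the JVM itself sees can be listed with the JDK's `java.net.NetworkInterface`. This is a minimal sketch (the class name `ListInterfaces` is hypothetical) that prints every address of every interface that is up and marks loopback addresses, so the output can be compared with the 10.216.x.x addresses in the logs earlier in this thread.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ListInterfaces {

    // One line per address of every interface that is currently up,
    // e.g. "lo0 127.0.0.1 (loopback)".
    static List<String> describeInterfaces() throws SocketException {
        List<String> lines = new ArrayList<>();
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return lines; // older JDKs may return null when no interfaces are found
        }
        for (NetworkInterface nif : Collections.list(all)) {
            if (!nif.isUp()) {
                continue; // skip interfaces that are down
            }
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                String tag = addr.isLoopbackAddress() ? " (loopback)" : "";
                lines.add(nif.getName() + " " + addr.getHostAddress() + tag);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : describeInterfaces()) {
            System.out.println(line);
        }
    }
}
```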



On Thu, Feb 26, 2015 at 4:33 PM, Dulaj Viduranga <[hidden email]>
wrote:

> Hi,
>         It’s great to help out. :)
>
>         Setting jobmanager.rpc.address to 127.0.0.1 instead of “localhost”
> made the connection to the JobManager work. Apparently "localhost" resolves
> differently in the webclient and in the JobManager. I think it would be good
> to set "jobmanager.rpc.address: 127.0.0.1" in future builds.
>         But then I got this error when I tried to run the examples. I don’t
> know whether I should move this issue to another thread. If so, please tell me.
>
> bin/flink run
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
> $FLINK_DIRECTORY/count
>
>
> 20:46:21,998 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 02/26/2015 20:46:23     Job execution switched to status RUNNING.
> 02/26/2015 20:46:23     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to SCHEDULED
> 02/26/2015 20:46:23     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to DEPLOYING
> 02/26/2015 20:48:03     CHAIN DataSource (at getTextDataSet(WordCount.java:141) (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at main(WordCount.java:69)) -> Combine(SUM(1), at main(WordCount.java:72)(1/1) switched to FAILED
> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>         at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>         at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>         at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>         at java.lang.Thread.run(Thread.java:745)
>
> 02/26/2015 20:48:03     Job execution switched to status FAILING.
> 02/26/2015 20:48:03     Reduce (SUM(1), at main(WordCount.java:72)(1/1) switched to CANCELED
> 02/26/2015 20:48:03     DataSink(CsvOutputFormat (path: /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/count, delimiter:  ))(1/1) switched to CANCELED
> 02/26/2015 20:48:03     Job execution switched to status FAILED.
> org.apache.flink.client.program.ProgramInvocationException: The program execution failed.
>         at org.apache.flink.client.program.Client.run(Client.java:344)
>         at org.apache.flink.client.program.Client.run(Client.java:306)
>         at org.apache.flink.client.program.Client.run(Client.java:300)
>         at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
>         at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:483)
>         at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
>         at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
>         at org.apache.flink.client.program.Client.run(Client.java:250)
>         at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
>         at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>         at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:284)
>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>         at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
>         at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>         at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
>         at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:88)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:221)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/taskmanager#-1628133761]] after [100000 ms]
>         at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
>         at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>         at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
>         at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
>         at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
>         at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
>         at java.lang.Thread.run(Thread.java:745)
>
> The exception above occurred while trying to run your command.
>
>
> > On Feb 26, 2015, at 12:46 AM, Stephan Ewen <[hidden email]> wrote:
> >
> > Addition: To check whether a port is reachable, I think the easiest thing
> > is to try and connect with a telnet client and see if the connection is
> > refused.
> >
> > On Wed, Feb 25, 2015 at 8:15 PM, Stephan Ewen <[hidden email]> wrote:
> >
> >> Okay, the problem seems to be that even though both the client and the
> >> JobManager use "localhost" as the host name, they resolve it to different
> >> IP addresses: in one case 127.0.0.1, in the other case 10.216.177.146.
> >>
> >> Also, apparently the 127.0.0.1 address cannot communicate with
> >> 10.216.177.146.
> >>
> >> Can you help us debug this by checking the following:
> >>
> >> - Can you try setting "jobmanager.rpc.address" to 127.0.0.1 and see if
> >> that solves it?
> >> - Can you try setting "jobmanager.rpc.address" to the other address
> >> (10.216.177.146 or so) and see if that solves it?
> >> - Can you run "start-cluster.sh" rather than "start-local.sh" and see
> >> whether the web frontend shows that the TaskManager connects?
> >> - As a hard-core test: can you bring up the JobManager, check which
> >> address it binds to (10.216.192.98:6123 or so), and see whether that port
> >> is reachable?
> >>
> >> We have recently updated how the Akka URLs are built, to work around a
> >> limitation in Akka. It seems that did not fully solve the issue yet.
> >>
> >> Thanks for helping us debug this. It is not the easiest onboarding
> >> experience, but the outcome is probably extremely valuable for the project
> >> :-)
> >>
> >> Greetings,
> >> Stephan
> >>
> >>
> >> On Wed, Feb 25, 2015 at 4:03 PM, Dulaj Viduranga <[hidden email]>
> >> wrote:
> >>
> >>> Hi,
> >>> Sorry for the delay in replying to this issue.
> >>> The jobmanager.rpc.address is already set to “localhost” in flink-conf.yaml.
> >>> This can’t be the issue, because the JobManager web interface, which also
> >>> runs on localhost, works fine.
> >>>
> >>> bin/flink run <jar> doesn’t seem to work either. Here are my command and
> >>> the result in the terminal.
> >>>
> >>> bin/flink run
> >>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar
> >>> /Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/hamlet.txt
> >>> $FLINK_DIRECTORY/count
> >>>
> >>> 20:32:16,442 WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> >>> org.apache.flink.client.program.ProgramInvocationException: Could not build up connection to JobManager.
> >>>        at org.apache.flink.client.program.Client.run(Client.java:327)
> >>>        at org.apache.flink.client.program.Client.run(Client.java:306)
> >>>        at org.apache.flink.client.program.Client.run(Client.java:300)
> >>>        at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
> >>>        at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:82)
> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >>>        at java.lang.reflect.Method.invoke(Method.java:483)
> >>>        at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
> >>>        at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
> >>>        at org.apache.flink.client.program.Client.run(Client.java:250)
> >>>        at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:371)
> >>>        at org.apache.flink.client.CliFrontend.run(CliFrontend.java:344)
> >>>        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1087)
> >>>        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
> >>> Caused by: java.io.IOException: JobManager at akka.tcp://flink@10.216.177.146:6123/user/jobmanager not reachable. Please make sure that the JobManager is running and its port is reachable.
> >>>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:897)
> >>>        at org.apache.flink.runtime.client.JobClient$.createJobClient(JobClient.scala:151)
> >>>        at org.apache.flink.runtime.client.JobClient$.createJobClientFromConfig(JobClient.scala:142)
> >>>        at org.apache.flink.runtime.client.JobClient$.startActorSystemAndActor(JobClient.scala:125)
> >>>        at org.apache.flink.runtime.client.JobClient.startActorSystemAndActor(JobClient.scala)
> >>>        at org.apache.flink.client.program.Client.run(Client.java:322)
> >>>        ... 15 more
> >>> Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
> >>>        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
> >>>        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> >>>        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
> >>>        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> >>>        at scala.concurrent.Await$.result(package.scala:107)
> >>>        at org.apache.flink.runtime.jobmanager.JobManager$.getJobManagerRemoteReference(JobManager.scala:893)
> >>>        ... 20 more
> >>>
> >>> The exception above occurred while trying to run your command.
> >>>
> >>>
> >>>> On Feb 25, 2015, at 1:29 AM, Stephan Ewen <[hidden email]> wrote:
> >>>>
> >>>> BTW: Does it still work if you enter "localhost" for "jobmanager.rpc.address" in your flink-conf.yaml?
> >>>>
> >>>> On Tue, Feb 24, 2015 at 7:50 PM, Stephan Ewen <[hidden email]> wrote:
> >>>>
> >>>>> Hi!
> >>>>>
> >>>>> I think that this is a problem in the current master (it was probably introduced a few days ago). I am fixing it...
> >>>>>
> >>>>> Thanks for reporting it!
> >>>>>
> >>>>> Stephan
> >>>>>
> >>>>>
> >>>>> On Tue, Feb 24, 2015 at 6:52 PM, Stephan Ewen <[hidden email]> wrote:
> >>>>>
> >>>>>> Hi Dulaj!
> >>>>>>
> >>>>>> The log suggests that the JobManager binds itself to the IP address 10.216.192.98, while the WebClient runs at 127.0.0.1.
> >>>>>>
> >>>>>> The 127.0.0.1 actor system cannot connect to 10.216.192.98.
> >>>>>>
> >>>>>> Let me verify whether this is a quirk of your particular setup or a bug recently introduced in 0.9-SNAPSHOT.
> >>>>>>
> >>>>>> Does the command line work for you? ("bin/flink run <jar>")
> >>>>>>
> >>>>>> taskmanager.numberOfTaskSlots: -1 is also okay; it means that the default of '1' is used.
> >>>>>>
> >>>>>> Greetings,
> >>>>>> Stephan
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Feb 24, 2015 at 5:18 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>>>>>
> >>>>>>> Is taskmanager.numberOfTaskSlots: -1 normal?
> >>>>>>>
> >>>>>>>> On Feb 24, 2015, at 9:44 PM, Robert Metzger <[hidden email]> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> I could not find the log files attached to your mails. I think the mailing lists do not accept attachments.
> >>>>>>>> Can you put the logs on gist.github.com?
> >>>>>>>>
> >>>>>>>> The configuration values are documented here:
> >>>>>>>> http://flink.apache.org/docs/0.8/config.html
> >>>>>>>> For the web client's port, the key is webclient.port.
> >>>>>>>>
> >>>>>>>> On Tue, Feb 24, 2015 at 5:04 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>>>>>>>
> >>>>>>>>> I tried to kill the JobManager manually in the terminal and start it again, but no luck. Also, could you tell me if it’s possible to change the web client’s port (8080)?
> >>>>>>>>>
> >>>>>>>>>> On Feb 24, 2015, at 1:41 PM, Stephan Ewen <[hidden email]> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hey Dulaj!
> >>>>>>>>>>
> >>>>>>>>>> As a contributor, I would work against the latest version, which is 0.9-SNAPSHOT.
> >>>>>>>>>>
> >>>>>>>>>> It may be that in your case the JobManager actor is down, but the process still lingers. (BTW: I have a patch pending that makes sure the process disappears when the actor goes down.)
> >>>>>>>>>>
> >>>>>>>>>> Could you have a look at the log "flink-<user>-jobmanager-<host>-.log" and see if there are any errors logged?
> >>>>>>>>>>
> >>>>>>>>>> Greetings,
> >>>>>>>>>> Stephan
> >>>>>>>>>> On 24.02.2015 06:29, "Dulaj Viduranga" <[hidden email]> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The JobManager seems to run fine, as far as I know. When I run start-local.sh again, it shows the PID of the running JobManager, and :8081 also works fine. I want to contribute to the project, and it would give me a little boost if I could see the capabilities of Flink. :)
> >>>>>>>>>>> Will it be OK to use 0.8.1 as a developer?
> >>>>>>>>>>>
> >>>>>>>>>>> On Feb 24, 2015, at 04:15 AM, Stephan Ewen <[hidden email]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi Dulaj,
> >>>>>>>>>>>
> >>>>>>>>>>> That error message indicates that the JobManager is not running. Are you sure that the JobManager runs properly? Anything in the JobManager logs?
> >>>>>>>>>>>
> >>>>>>>>>>> BTW: The 0.9 branch is under heavy development and changing a lot, which is why it may behave a bit differently from day to day right now. I would recommend using the 0.8.1 release for a stable experience.
> >>>>>>>>>>>
> >>>>>>>>>>> Greetings,
> >>>>>>>>>>> Stephan
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 23, 2015 at 7:39 PM, Robert Metzger <[hidden email]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you for the quick reply.
> >>>>>>>>>>>
> >>>>>>>>>>> The log you've sent is from the web client. Can you also send the log of the JobManager?
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Feb 23, 2015 at 7:28 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Yes. It seems it is not a problem with the arguments. I tried for two days, but different errors occur. It seems the web client can’t connect to the JobManager, although it is running.
> >>>>>>>>>>>> Right now, I can’t even get the web client to run. ./bin/start-webclient.sh executes fine, but I cannot connect to localhost:8080 (even with telnet or curl).
> >>>>>>>>>>>> Here is the log for the JobManager:
> >>>>>>>>>>>>
> >>>>>>>>>>>> 23:22:31,933 INFO  org.apache.flink.client.web.WebInterfaceServer                - Setting up web frontend server, using web-root directory 'jar:file:/Users/Vidura/Documents/Development/flink/flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/lib/flink-clients-0.9-SNAPSHOT.jar!/web-docs'.
> >>>>>>>>>>>> 23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web frontend server will store temporary files in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T', uploaded jobs in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-jobs', plan-json-dumps in '/var/folders/3_/7gzbv7ks7q71lpm5d9hzrw2c0000gn/T/webclient-plans'.
> >>>>>>>>>>>> 23:22:31,934 INFO  org.apache.flink.client.web.WebInterfaceServer                - Web-frontend will submit jobs to nephele job-manager on localhost, port 6123.
> >>>>>>>>>>>> 23:22:32,580 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
> >>>>>>>>>>>> 23:22:32,625 INFO  Remoting                                                      - Starting remoting
> >>>>>>>>>>>> 23:22:32,838 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://JobsInfoServletActorSystem@127.0.0.1:51517]
> >>>>>>>>>>>> 23:23:48,119 WARN  Remoting                                                      - Tried to associate with unreachable remote address [akka.tcp://flink@10.218.98.169:6123]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Operation timed out: /10.218.98.169:6123
> >>>>>>>>>>>> 23:23:48,124 ERROR org.apache.flink.client.WebFrontend                           - Unexpected exception: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>>>>> java.lang.RuntimeException: Could not find job manager at specified address akka.tcp://flink@10.218.98.169:6123/user/jobmanager.
> >>>>>>>>>>>>         at org.apache.flink.client.web.JobsInfoServlet.<init>(JobsInfoServlet.java:82)
> >>>>>>>>>>>>         at org.apache.flink.client.web.WebInterfaceServer.<init>(WebInterfaceServer.java:158)
> >>>>>>>>>>>>         at org.apache.flink.client.WebFrontend.main(WebFrontend.java:74)
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Feb 23, 2015, at 11:46 PM, Robert Metzger <[hidden email]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>> you said in the other email thread that the error only occurs for Wordcount, not for Kmeans.
> >>>>>>>>>>>>> Can you copy me the commands for both examples?
> >>>>>>>>>>>>> I cannot really believe that there is a difference between the two jobs.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you also send us the contents of the jobmanager log file?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Robert
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Mon, Feb 23, 2015 at 6:04 PM, Dulaj Viduranga <[hidden email]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> I’m getting "Could not build up connection to JobManager." when I tried to run the wordCount example. Can anyone help?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Dulaj
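The client error quoted above asks to "make sure that the JobManager is running and its port is reachable". A minimal sketch of checking the second half of that from the client machine is a plain TCP connect to the JobManager's RPC port. It assumes only what the quoted logs show (RPC port 6123, a 10000 ms client timeout); the class `JobManagerPortCheck` and its method are hypothetical helpers, not part of Flink. A successful connect only proves that something is listening on host:port, not that the Akka actor at /user/jobmanager is alive.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical helper (not part of Flink): checks whether a TCP connection to
 * the JobManager's RPC host:port can be opened. It does NOT verify that the
 * Akka actor at /user/jobmanager is alive, only that something accepts connections.
 */
public class JobManagerPortCheck {

    /** Returns true if a TCP connect to host:port succeeds within timeoutMillis. */
    public static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // An unresolvable host yields an unresolved address; connect() then throws UnknownHostException.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, SocketTimeoutException and UnknownHostException all land here.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the quoted logs: RPC port 6123 and a 10000 ms timeout.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        boolean reachable = isReachable(host, port, 10_000);
        System.out.println(host + ":" + port + (reachable ? " is reachable" : " is NOT reachable"));
    }
}
```

Running it once with the address from the error message (for example `java JobManagerPortCheck 10.216.177.146 6123`) and once with `localhost` could help tell apart a JobManager that is not listening at all from the address mismatch Stephan describes, where the JobManager binds to a non-loopback IP while the client side runs at 127.0.0.1.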