Hello,

I'm a bit confused about the performance of Flink. My cluster consists of 4 nodes, each with 8 cores and 16 GB memory (1.5 GB reserved for the OS), using flink-0.6 in standalone-cluster mode. I played around with the config settings a bit, but without much impact on execution time.

flink-conf.yaml:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 14336
taskmanager.memory.size: -1
taskmanager.numberOfTaskSlots: 4
parallelization.degree.default: 16
taskmanager.network.numberOfBuffers: 4096
fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/

I tried two applications, the WordCount and k-means Scala example code. WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB. K-means (10 iterations) needs 86 seconds for 56 MB of input, but 33 minutes for 1.1 GB and nearly 90 minutes for 2.2 GB!

The monitoring tool Ganglia reports low CPU utilization and a lot of wait time for k-means; for WordCount, CPU utilization is nearly 100 percent. Are these ordinary execution times for Flink, or are optimizations in my config necessary? Or is there maybe a bottleneck in the cluster?

I hope somebody can help me :)
Greets, Norman
Hi Norman,

I saw you were running our Scala examples. Unfortunately, those do not run as well as our Java examples right now; the Scala API was a bit of a prototype and has some efficiency issues. For now, you could try running our Java examples instead.

For your cluster, good configuration values would be numberOfTaskSlots = 4 (the number of CPU cores) and parallelization.degree.default = 32 (number of nodes x number of CPU cores).

The Scala API is being rewritten for our next release, so if you really want to check out Scala examples, I could point you to my personal branch on GitHub, where development of the new Scala API is taking place.

Cheers,
Aljoscha
There is probably a little typo in Aljoscha's answer: taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per machine). The parallelization.degree.default of 32 is correct.
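So for this 4-node, 8-core cluster, the relevant lines in flink-conf.yaml would look roughly like this:

taskmanager.numberOfTaskSlots: 8
parallelization.degree.default: 32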
I tried different values for numberOfTaskSlots (1, 2, 4, 8) and the DOP to optimize Flink.

@Aljoscha: it would be great to try out the new Scala API for Flink. I have already written some other apps in Scala, so I wouldn't have to rewrite them.
Ok.

My work is available here:
https://github.com/aljoscha/incubator-flink/tree/scala-rework

Please look at the WordCount and KMeans examples to see how the API has changed; basically, only the way you create data sources has changed.

I'm looking forward to your feedback. :D
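PS: To give you a rough idea, a minimal WordCount in the reworked API looks something like the sketch below. This is from memory, so names on the branch may differ slightly, and the HDFS paths are just placeholders:

import org.apache.flink.api.scala._

object WordCount {
  def main(args: Array[String]) {
    // The execution environment is now the entry point for creating data sources
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Read the input as a DataSet of text lines
    val text = env.readTextFile("hdfs:///path/to/input")

    // Tokenize, pair each word with a count of 1, then sum per word
    val counts = text
      .flatMap { _.toLowerCase.split("\\W+") }
      .filter { _.nonEmpty }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)

    counts.writeAsCsv("hdfs:///path/to/result")
    env.execute("Scala WordCount")
  }
}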
Just to make sure that there is no confusion: with Aljoscha's refactoring, the Scala API will be a thin layer on top of the Java API and should have comparable, if not the same, performance as the Java API (one difference is that Scala tuples are immutable, whereas Java tuples are mutable and instances can be reused). I don't want someone reading this thread in the future to think that the Scala API incurs a large performance hit. ;)
Thank you.

But I already have trouble with the shipped example programs: I can't use files in HDFS, even though the fs.hdfs.hadoopconf option is set. The same config works in 0.6, where the example programs run with HDFS files, too.
Hi,

can you post the exact error message you get when HDFS is not working?
Of course, sorry.

Command:
./bin/flink run ./examples/flink-java-examples-0.7-incubating-SNAPSHOT-WordCount.jar hdfs:/input.xml hdfs:/result

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given HDFS file URI (hdfs:/input.xml) did not describe the HDFS Namenode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default hdfs configuration was registered, or the provided configuration contains no valid hdfs namenode address (fs.default.name or fs.defaultFS) describing the hdfs namenode host and port.
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:279)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:299)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:391)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:52)
    at org.apache.flink.runtime.jobgraph.JobInputVertex.getInputSplits(JobInputVertex.java:101)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.createVertex(ExecutionGraph.java:495)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.constructExecutionGraph(ExecutionGraph.java:281)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.<init>(ExecutionGraph.java:177)
    at org.apache.flink.runtime.jobmanager.JobManager.submitJob(JobManager.java:477)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:422)
    at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:958)

    at org.apache.flink.client.program.Client.run(Client.java:325)
    at org.apache.flink.client.program.Client.run(Client.java:291)
    at org.apache.flink.client.program.Client.run(Client.java:285)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:54)
    at org.apache.flink.example.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:389)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:307)
    at org.apache.flink.client.program.Client.run(Client.java:244)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)
Try to specify the HDFS path as follows:

hdfs://namenode-host:namenode-port/foo/bar/

Best, Fabian
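PS: With Norman's command, that would look something like this (assuming the namenode listens on port 9000; adapt host and port to your setup):

./bin/flink run ./examples/flink-java-examples-0.7-incubating-SNAPSHOT-WordCount.jar hdfs://namenode-host:9000/input.xml hdfs://namenode-host:9000/result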
Hey Norman,

I'm not sure, but have you tried it with three slashes, as in hdfs:///input.xml? Or you can specify the address of the namenode via the config key fs.default.name or fs.defaultFS, i.e.:

<property>
  <name>fs.default.name</name>
  <value>hdfs://...:port</value>
</property>
I guess that in core-site.xml the property "fs.defaultFS" has not been set to the namenode. You basically put the default address there, like "hdfs://master:8037/".
Thanks, but I tried all of those versions before I posted here, and exactly the same conf file works in 0.6.

@fabian: with the full hostname and port I get this error:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given file URI (hdfs://MASTER:9000/input.xml) described the host and port of an HDFS Namenode, but the File System could not be initialized with that address: Server IPC version 9 cannot communicate with client version 4
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:305)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:299)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:391)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:52)
    at org.apache.flink.runtime.jobgraph.JobInputVertex.getInputSplits(JobInputVertex.java:101)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.createVertex(ExecutionGraph.java:495)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.constructExecutionGraph(ExecutionGraph.java:281)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.<init>(ExecutionGraph.java:177)
    at org.apache.flink.runtime.jobmanager.JobManager.submitJob(JobManager.java:477)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:422)
    at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:958)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
    at org.apache.hadoop.ipc.Client.call(Client.java:1113)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
    at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:302)
    ... 15 more

    at org.apache.flink.client.program.Client.run(Client.java:325)
    at org.apache.flink.client.program.Client.run(Client.java:291)
    at org.apache.flink.client.program.Client.run(Client.java:285)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:54)
    at org.apache.flink.example.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:389)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:307)
    at org.apache.flink.client.program.Client.run(Client.java:244)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)
This error message basically says that Flink's HDFS client and the running HDFS are not compatible.

We have different builds for Hadoop 1.0 and Hadoop 2.0. Please check the version of the running HDFS and choose the corresponding build.

Best, Fabian
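PS: If you build from source, the Hadoop 2 build is selected via the Maven profile, roughly like this:

mvn clean package -DskipTests -Dhadoop.profile=2

Otherwise, pick the download that matches the major version of your Hadoop installation.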
Okay, I thought that was only necessary in YARN mode. I did it, but the next error follows:

java.io.IOException: Error opening the Input Split hdfs://MASTER:9000/input.xml [46573551616,134217728]: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:598)
    at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:446)
    at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:49)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:140)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:280)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:302)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:737)
This still looks like the versions aren't in sync. What's the exact version of your HDFS (incl. distribution: Hadoop, CDH, etc.), and which Flink build are you using?
I want to use the above-mentioned scala-rework branch: https://github.com/aljoscha/incubator-flink/tree/scala-rework.
The Apache Hadoop version is 2.4.0, and I build Flink with:

mvn clean package -DskipTests -Dhadoop.profile=2 -Dhadoop.version=2.4.0
Hi,

the Maven call seems to be correct for your Hadoop version. Can you check whether the build contains the Hadoop 1.2.1 jar file in the "lib/" directory? Ideally, all jars that contain the term "hadoop" should have version 2.4.0 in their name.

Robert
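PS: A quick way to check, from the root of the built distribution:

ls lib/ | grep hadoop

All listed jars should carry 2.4.0 in their name; a 1.2.1 jar would mean the wrong Hadoop dependency was bundled.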
Hi Robert,

no, there are no files with hadoop-1.2.1; all files that contain "hadoop" have version 2.4.0 in their name.