Hello,

I'm a bit confused about the performance of Flink. My cluster consists of 4 nodes, each with 8 cores and 16 GB memory (1.5 GB reserved for the OS), using flink-0.6 in standalone-cluster mode. I played around with the config settings a bit, but without much impact on execution time.

flink-conf.yaml:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 14336
taskmanager.memory.size: -1
taskmanager.numberOfTaskSlots: 4
parallelization.degree.default: 16
taskmanager.network.numberOfBuffers: 4096
fs.hdfs.hadoopconf: /opt/yarn/hadoop-2.4.0/etc/hadoop/

I tried two applications, the WordCount and k-means Scala example code. WordCount needs 5 minutes for 25 GB and 13 minutes for 50 GB. K-means (10 iterations) needs 86 seconds for 56 MB of input, but 33 minutes for 1.1 GB and nearly 90 minutes for 2.2 GB!

The monitoring tool Ganglia reports low CPU utilization and a lot of wait time for k-means; for WordCount, CPU utilization is nearly 100 percent. Are these ordinary execution times for Flink, or are optimizations in my config necessary? Or is there maybe a bottleneck in the cluster?

I hope somebody can help me :)
Greets, Norman
Hi Norman,

I saw you were running our Scala examples. Unfortunately, those do not run as well as our Java examples right now; the Scala API was a bit of a prototype and has some efficiency issues. For now, you could try running our Java examples instead.

For your cluster, good configuration values would be numberOfTaskSlots = 4 (the number of CPU cores) and parallelization.degree.default = 32 (number of nodes x number of CPU cores).

The Scala API is being rewritten for our next release, so if you really want to check out Scala examples, I could point you to my personal branch on GitHub, where development of the new Scala API is taking place.

Cheers,
Aljoscha
There is probably a little typo in Aljoscha's answer: taskmanager.numberOfTaskSlots should be 8 (there are 8 cores per machine). The parallelization.degree.default of 32 is correct.
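So for this 4-node, 8-core cluster, the relevant lines in flink-conf.yaml would look roughly like this:

taskmanager.numberOfTaskSlots: 8
parallelization.degree.default: 32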
I tried different values for numberOfTaskSlots (1, 2, 4, 8) and the DOP to optimize Flink.

@Aljoscha: it would be great to try out the new Scala API for Flink. I have already written some other apps in Scala, so I wouldn't have to rewrite them.
Ok.

My work is available here:
https://github.com/aljoscha/incubator-flink/tree/scala-rework

Please look at the WordCount and KMeans examples to see how the API has changed; basically, only the way you create data sources has changed.

I'm looking forward to your feedback. :D
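PS: To give you a rough idea, a minimal WordCount in the reworked API looks something like the sketch below. This is from memory, so names on the branch may differ slightly, and the HDFS paths are just placeholders:

import org.apache.flink.api.scala._

object WordCount {
  def main(args: Array[String]) {
    // The execution environment is now the entry point for creating data sources
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Read the input as a DataSet of text lines
    val text = env.readTextFile("hdfs:///path/to/input")

    // Tokenize, pair each word with a count of 1, then sum per word
    val counts = text
      .flatMap { _.toLowerCase.split("\\W+") }
      .filter { _.nonEmpty }
      .map { (_, 1) }
      .groupBy(0)
      .sum(1)

    counts.writeAsCsv("hdfs:///path/to/result")
    env.execute("Scala WordCount")
  }
}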
Just to make sure that there is no confusion: with Aljoscha's refactoring, the Scala API will be a thin layer on top of the Java API and should have comparable, if not the same, performance as the Java API (one difference is that Scala tuples are immutable, whereas Java tuples are mutable and instances can be reused). I don't want someone reading this thread in the future to think that the Scala API incurs a large performance hit. ;)
Thank you.

But I already have trouble with the shipped example programs: I can't use files in HDFS, even though the fs.hdfs.hadoopconf option is set. The same config works in 0.6, where the example programs run with HDFS files, too.
Hi,

can you post the exact error message you get when HDFS is not working?
Of course, sorry.

Command:
./bin/flink run ./examples/flink-java-examples-0.7-incubating-SNAPSHOT-WordCount.jar hdfs:/input.xml hdfs:/result

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given HDFS file URI (hdfs:/input.xml) did not describe the HDFS Namenode. The attempt to use a default HDFS configuration, as specified in the 'fs.hdfs.hdfsdefault' or 'fs.hdfs.hdfssite' config parameter failed due to the following problem: Either no default hdfs configuration was registered, or the provided configuration contains no valid hdfs namenode address (fs.default.name or fs.defaultFS) describing the hdfs namenode host and port.
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:279)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:299)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:391)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:52)
    at org.apache.flink.runtime.jobgraph.JobInputVertex.getInputSplits(JobInputVertex.java:101)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.createVertex(ExecutionGraph.java:495)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.constructExecutionGraph(ExecutionGraph.java:281)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.<init>(ExecutionGraph.java:177)
    at org.apache.flink.runtime.jobmanager.JobManager.submitJob(JobManager.java:477)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:422)
    at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:958)

    at org.apache.flink.client.program.Client.run(Client.java:325)
    at org.apache.flink.client.program.Client.run(Client.java:291)
    at org.apache.flink.client.program.Client.run(Client.java:285)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:54)
    at org.apache.flink.example.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:389)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:307)
    at org.apache.flink.client.program.Client.run(Client.java:244)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)
Try to specify the HDFS path as follows:

hdfs://namenode-host:namenode-port/foo/bar/

Best, Fabian
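PS: With Norman's command, that would look something like this (assuming the namenode listens on port 9000; adapt host and port to your setup):

./bin/flink run ./examples/flink-java-examples-0.7-incubating-SNAPSHOT-WordCount.jar hdfs://namenode-host:9000/input.xml hdfs://namenode-host:9000/result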
Hey Norman,

I'm not sure, but have you tried it with three slashes, as in hdfs:///input.xml? Or you can specify the address of the namenode via the config key fs.default.name or fs.defaultFS, i.e.:

<property>
  <name>fs.default.name</name>
  <value>hdfs://...:port</value>
</property>
I guess that in core-site.xml the property "fs.defaultFS" has not been set to the namenode. You basically put the default address there, like "hdfs://master:8037/".
Thanks, but I tried all of those versions before I posted here, and exactly the same conf file works in 0.6.

@fabian: with the full hostname and port I get this error:

org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.io.IOException: The given file URI (hdfs://MASTER:9000/input.xml) described the host and port of an HDFS Namenode, but the File System could not be initialized with that address: Server IPC version 9 cannot communicate with client version 4
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:305)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.core.fs.Path.getFileSystem(Path.java:299)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:391)
    at org.apache.flink.api.common.io.FileInputFormat.createInputSplits(FileInputFormat.java:52)
    at org.apache.flink.runtime.jobgraph.JobInputVertex.getInputSplits(JobInputVertex.java:101)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.createVertex(ExecutionGraph.java:495)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.constructExecutionGraph(ExecutionGraph.java:281)
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.<init>(ExecutionGraph.java:177)
    at org.apache.flink.runtime.jobmanager.JobManager.submitJob(JobManager.java:477)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.runtime.ipc.RPC$Server.call(RPC.java:422)
    at org.apache.flink.runtime.ipc.Server$Handler.run(Server.java:958)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
    at org.apache.hadoop.ipc.Client.call(Client.java:1113)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
    at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
    at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:302)
    ... 15 more

    at org.apache.flink.client.program.Client.run(Client.java:325)
    at org.apache.flink.client.program.Client.run(Client.java:291)
    at org.apache.flink.client.program.Client.run(Client.java:285)
    at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:54)
    at org.apache.flink.example.java.wordcount.WordCount.main(WordCount.java:82)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:389)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:307)
    at org.apache.flink.client.program.Client.run(Client.java:244)
    at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:347)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:334)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1001)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1025)
This error message basically says that Flink's HDFS client and the running HDFS are not compatible.

We have different builds for Hadoop 1.0 and Hadoop 2.0. Please check the version of the running HDFS and choose the corresponding build.

Best, Fabian
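PS: If you build from source, the Hadoop 2 build is selected via the Maven profile, roughly like this:

mvn clean package -DskipTests -Dhadoop.profile=2

Otherwise, pick the download that matches the major version of your Hadoop installation.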
Okay, I thought that was only necessary in YARN mode. I did it, but the next error follows:

java.io.IOException: Error opening the Input Split hdfs://MASTER:9000/input.xml [46573551616,134217728]: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.flink.api.common.io.FileInputFormat.open(FileInputFormat.java:598)
    at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:446)
    at org.apache.flink.api.common.io.DelimitedInputFormat.open(DelimitedInputFormat.java:49)
    at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:140)
    at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.ipc.RPC.getProxy(Ljava/lang/Class;JLjava/net/InetSocketAddress;Lorg/apache/hadoop/security/UserGroupInformation;Lorg/apache/hadoop/conf/Configuration;Ljavax/net/SocketFactory;ILorg/apache/hadoop/io/retry/RetryPolicy;Z)Lorg/apache/hadoop/ipc/VersionedProtocol;
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:135)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:280)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
    at org.apache.flink.runtime.fs.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:302)
    at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:248)
    at org.apache.flink.api.common.io.FileInputFormat$InputSplitOpenThread.run(FileInputFormat.java:737)
This still looks like the versions aren't in sync. What's the exact version of your HDFS (incl. distribution: Hadoop, CDH, etc.), and which Flink build are you using?
I want to use the above-mentioned scala-rework branch: https://github.com/aljoscha/incubator-flink/tree/scala-rework.
The Apache Hadoop version is 2.4.0, and I build Flink with:

mvn clean package -DskipTests -Dhadoop.profile=2 -Dhadoop.version=2.4.0
Hi,

the Maven call seems to be correct for your Hadoop version. Can you check whether the build contains the Hadoop 1.2.1 jar file in the "lib/" directory? Ideally, all jars that contain the term "hadoop" should have version 2.4.0 in their name.

Robert
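PS: A quick way to check, from the root of the built distribution:

ls lib/ | grep hadoop

All listed jars should carry 2.4.0 in their name; a 1.2.1 jar would mean the wrong Hadoop dependency was bundled.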
Hi Robert,

no, there are no files with hadoop-1.2.1; all files that contain "hadoop" have version 2.4.0 in their name.