Hello Everyone,
I have followed the steps in the link below to install and run Apache Flink on a multi-node cluster:
http://data-flair.training/blogs/install-run-deploy-flink-multi-node-cluster/
I used flink-1.3.2-bin-hadoop27-scala_2.10.tgz for the installation.

Using the command
"bin/flink run /home/root1/NAI/Tools/BEAM/Flink_Cluster/rama/flink/examples/streaming/WordCount.jar"
I was able to run WordCount, but where can I see which input was used and what output was generated?

And how can I specify the input and output paths?

I'm trying to understand how WordCount works on a multi-node cluster. Any suggestions would help my further understanding.

Thanks & Regards,
Ramanji.
Hi Ramanji,
you can find the source code of the examples here:
https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/wordcount/WordCount.java

A general introduction to how cluster execution works can be found here:
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/programming-model.html#programs-and-dataflows
https://ci.apache.org/projects/flink/flink-docs-release-1.4/concepts/runtime.html

It might also be helpful to have a look at the web interface, which can show you a nice graph of the job.

I hope this helps. Feel free to ask further questions.

Regards,
Timo
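For quick reference, the core of the linked example looks roughly like the sketch below. This is a condensed, self-contained approximation rather than the exact source: the real example keeps the tokenizer in a separate class and falls back to a built-in Shakespeare text when --input is not given, while this sketch uses an inline tokenizer and a one-line fallback.

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;

    public class WordCountSketch {
        public static void main(String[] args) throws Exception {
            // --input and --output are parsed from the command-line arguments
            final ParameterTool params = ParameterTool.fromArgs(args);
            final StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // without --input, the job runs on built-in sample text
            DataStream<String> text = params.has("input")
                    ? env.readTextFile(params.get("input"))
                    : env.fromElements("to be or not to be that is the question");

            DataStream<Tuple2<String, Integer>> counts = text
                    .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                        @Override
                        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                            // split each line into lowercase words and emit (word, 1) pairs
                            for (String word : line.toLowerCase().split("\\W+")) {
                                if (!word.isEmpty()) {
                                    out.collect(new Tuple2<>(word, 1));
                                }
                            }
                        }
                    })
                    .keyBy(0)   // group by the word
                    .sum(1);    // sum the counts per word

            // without --output, results go to the TaskManagers' stdout (see their logs)
            if (params.has("output")) {
                counts.writeAsText(params.get("output"));
            } else {
                counts.print();
            }

            env.execute("Streaming WordCount sketch");
        }
    }

So when the jar is run without --input and --output, the input is the built-in sample text and the output ends up in the TaskManager logs; passing the two options lets you control both paths explicitly.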
Thank you Timo.
root1@root1-HP-EliteBook-840-G2:~/NAI/Tools/BEAM/Flink_Cluster/rama/flink$ ./bin/flink run ./examples/streaming/WordCount.jar --input file:///home/root1/hamlet.txt --output file:///home/root1/wordcount_out

Execution of the WordCount jar gives this error:

08/07/2017 18:03:16     Source: Custom File Source(1/1) switched to FAILED
java.io.FileNotFoundException: The provided file path file:/home/root1/hamlet.txt does not exist.
        at org.apache.flink.streaming.api.functions.source.ContinuousFileMonitoringFunction.run(ContinuousFileMonitoringFunction.java:192)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
        at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
        at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
        at java.lang.Thread.run(Thread.java:748)
Make sure that the file exists and is accessible from all Flink task managers.
Hi Timo,
The problem is resolved after copying the input file to all task managers.

Where should the output file be generated? On the JobManager or on a TaskManager?

Where can I see the execution logs to understand how the word count was done on each task manager?

By the way, is there any option to overwrite the output?

08/07/2017 19:27:00     Keyed Aggregation -> Sink: Unnamed(1/1) switched to FAILED
java.io.IOException: File or directory already exists. Existing files and directories are not overwritten in NO_OVERWRITE mode. Use OVERWRITE mode to overwrite existing files and directories.
        at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:763)
        at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.initOutPathLocalFS(SafetyNetWrapperFileSystem.java:135)
        at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:231)
        at org.apache.flink.api.java.io.TextOutputFormat.open(TextOutputFormat.java:78)
        at org.apache.flink.streaming.api.functions.sink.OutputFormatSinkFunction.open(OutputFormatSinkFunction.java:61)
        at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
        at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:111)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:253)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
        at java.lang.Thread.run(Thread.java:745)
Flink is distributed software for clusters. You need something like a distributed file system, so that the input and output files can be accessed from all nodes.

Each TM has a log directory where the execution logs are stored.

You can set additional properties on the output format by importing the example code into your IDE and adapting it.
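For the overwrite question in particular: if you import the example into your IDE and build your own copy, the text sink accepts a write mode. A minimal, hedged sketch of that change follows; the input data and the output path are placeholders standing in for the real WordCount pipeline.

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.core.fs.FileSystem;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class OverwriteSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // stand-in for the (word, count) stream that WordCount produces
            DataStream<Tuple2<String, Integer>> counts =
                    env.fromElements(Tuple2.of("hello", 1), Tuple2.of("world", 1));

            // OVERWRITE replaces the default NO_OVERWRITE behaviour that caused the
            // IOException above, so re-running the job replaces the previous output
            counts.writeAsText("file:///home/root1/wordcount_out",
                    FileSystem.WriteMode.OVERWRITE)
                  .setParallelism(1);   // optional: produce a single output file

            env.execute("overwrite sketch");
        }
    }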
Hi Timo,
How can I make the files accessible across the TMs?

Thanks & Regards,
Ramanji.
Hi,
like Timo said, you need a distributed file system, e.g. HDFS.

Best regards,
Felix
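Once the files live in a distributed file system, the example code needs no changes, because Flink picks the file system implementation from the URI scheme of the path. The sketch below assumes a namenode reachable at namenode-host:9000; the host, port, and paths are placeholders and must match your cluster's fs.defaultFS.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class HdfsPathsSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // fully qualified HDFS URIs: every TaskManager resolves them against the
            // same namenode, so no per-node copies of the file are needed
            DataStream<String> text =
                    env.readTextFile("hdfs://namenode-host:9000/user/wordcount_input.txt");

            text.writeAsText("hdfs://namenode-host:9000/user/wordcount_output.txt");

            env.execute("HDFS paths sketch");
        }
    }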
Yes Felix,
I have created the input and output files in HDFS:
http://localhost:50070/explorer.html#/user

But how can I access them?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt
Basically, I have added the input and output files to HDFS. But how do I specify the input file on the command line to run the WordCount program?

bin/flink run ./examples/streaming/WordCount.jar --input hdfs:///user/wordcount_input.txt --output hdfs:///user/wordcount_output.txt