[GitHub] incubator-flink pull request: ISSUE #827 fix

zentol
GitHub user tobwiens opened a pull request:

    https://github.com/apache/incubator-flink/pull/45

    ISSUE #827 fix

    Adding WordCountPLOJO example. It demonstrates how to use KeySelectors
    in a word count example.
    Removing comments and adding file support so that this example is
    executable without additional data, while still providing the option of
    specifying input and output files.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tobwiens/incubator-flink FLINK-827-FIX

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-flink/pull/45.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #45
   
----
commit 04ce6ef0ab9fe34ddee7eabe3c592e9f20b02977
Author: TobiasWiens <[hidden email]>
Date:   2014-06-25T16:03:44Z

    ISSUE 827 fix
   
    Adding WordCountPLOJO example. It demonstrates how to use KeySelectors
    in a word count example.
    Removing comments and adding file support so that this example is
    executable without additional data, while still providing the option of
    specifying input and output files.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [hidden email] or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-flink pull request: ISSUE #827 fix

zentol
GitHub user tobwiens commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47122697
 
    FLINK-827



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47134091
 
    Hey, I think you got the name POJO wrong when I explained the task to you: https://en.wikipedia.org/wiki/Plain_Old_Java_Object




[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/45#discussion_r14200697
 
    --- Diff: stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCountPLOJO.java ---
    @@ -0,0 +1,183 @@
    +/***********************************************************************************************************************
    + *
    + * Copyright (C) 2010-2013 by the Stratosphere project (http://stratosphere.eu)
    + *
    + * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    + * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    + * specific language governing permissions and limitations under the License.
    + *
    + **********************************************************************************************************************/
    +package eu.stratosphere.example.java.wordcount;
    +
    +import eu.stratosphere.api.java.DataSet;
    +import eu.stratosphere.api.java.ExecutionEnvironment;
    +import eu.stratosphere.api.java.functions.FlatMapFunction;
    +import eu.stratosphere.api.java.functions.KeySelector;
    +import eu.stratosphere.api.java.functions.ReduceFunction;
    +import eu.stratosphere.util.Collector;
    +
    +
    +
    +/**
    + * Implements a "WordCount" program that computes a simple word occurrence histogram
    + * over hard coded examples or text files. This example demonstrates how to use KeySelectors, ReduceFunction and FlatMapFunction.
    + */
    +@SuppressWarnings("serial")
    +public class WordCountPLOJO {
    +
    + /**
    + * Runs the WordCount program.
    + *
    + * @param args Input and output file.
    + */
    + public static void main(String[] args) throws Exception {
    + // Check whether arguments are given and tell user how to use this example with files.
    + if (args.length < 2) {
    + System.out.println("You can specify: WordCountPLOJO <input path> <result path>, in order to work with files.");
    + }
    +
    + // Input and output path [optional]
    + String inputPath = null;
    + String outputPath = null;
    +
    + // Get the environment as starting point
    + final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    +
    + // Read the text file from given input path or hard coded
    + DataSet<String> text = null;
    + try {
    + inputPath = args[0];
    + env.readTextFile(inputPath);
    + }
    + catch(Exception e) {
    --- End diff --
   
    In general, exceptions should be used for error handling, not for application logic (in my understanding).
    Have a look at the WordCount example, which solves this issue differently:
    https://github.com/apache/incubator-flink/blob/master/stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCount.java
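
One way to avoid using exceptions for control flow here is to test `args.length` before touching the array. The following is only a minimal sketch of that idea in plain Java; it is not the code of the linked WordCount example, and `ArgsCheckSketch`, `hasFileArgs`, `describeSource`, and `SAMPLE` are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: decides between file input and built-in data
// with an explicit length check instead of catching an exception.
class ArgsCheckSketch {

    // Hypothetical stand-in for the hard-coded example text.
    static final List<String> SAMPLE = Arrays.asList("To be", "or not to be");

    // True only when both the input path and the result path were supplied.
    static boolean hasFileArgs(String[] args) {
        return args.length >= 2;
    }

    // Describes where the program would read from, without throwing
    // when no arguments are given.
    static String describeSource(String[] args) {
        if (hasFileArgs(args)) {
            return "file:" + args[0];
        }
        return "built-in sample (" + SAMPLE.size() + " lines)";
    }

    public static void main(String[] args) {
        System.out.println("Reading from " + describeSource(args));
    }
}
```

With this shape, a missing argument is an ordinary branch, and a `catch` block remains available for genuine failures such as an unreadable file.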



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/45#discussion_r14200705
 
    --- Diff: stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCountPLOJO.java ---
    @@ -0,0 +1,183 @@
    + // Read the text file from given input path or hard coded
    + DataSet<String> text = null;
    + try {
    + inputPath = args[0];
    + env.readTextFile(inputPath);
    + }
    + catch(Exception e) {
    + System.out.println("No input file specified. Using hard coded example.");
    + text = env.fromElements("To be", "or not to be", "or to be still", "and certainly not to be not at all", "is that the question?");
    + }
    +
    + // Split up the lines in pairs (2-tuples) containing: (word,1)
    + DataSet<CustomizedWord> words = text.flatMap(new Tokenizer());
    +
    + // Create KeySelector to be able to group CustomizedWord
    + CustomizedWordKeySelector keySelector = new CustomizedWordKeySelector();
    +
    + // Instantiate customized reduce function
    + CustomizedWordReducer reducer = new CustomizedWordReducer();
    +
    + // Group by the tuple field "0" and sum up tuple field "1"
    + DataSet<CustomizedWord> result = words.groupBy(keySelector).reduce(reducer);
    +
    + // Print result
    + try {
    + outputPath = args[1];
    + // write out the result
    + result.writeAsText(outputPath);
    + }
    + catch(Exception e) {
    --- End diff --
   
    same here



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user rmetzger commented on a diff in the pull request:

    https://github.com/apache/incubator-flink/pull/45#discussion_r14200819
 
    --- Diff: stratosphere-examples/stratosphere-java-examples/src/main/java/eu/stratosphere/example/java/wordcount/WordCountPLOJO.java ---
    @@ -0,0 +1,183 @@
    + // Split up the lines in pairs (2-tuples) containing: (word,1)
    + DataSet<CustomizedWord> words = text.flatMap(new Tokenizer());
    +
    + // Create KeySelector to be able to group CustomizedWord
    + CustomizedWordKeySelector keySelector = new CustomizedWordKeySelector();
    +
    + // Instantiate customized reduce function
    + CustomizedWordReducer reducer = new CustomizedWordReducer();
    +
    + // Group by the tuple field "0" and sum up tuple field "1"
    + DataSet<CustomizedWord> result = words.groupBy(keySelector).reduce(reducer);
    --- End diff --
   
    I think it's nicer to do this "inline", similar to the other WordCount example:
    ```java
    words.groupBy(new CustomizedWordKeySelector())
      .reduce(new CustomizedWordReducer());
    ```
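
For readers without the Stratosphere API at hand, the idea behind `groupBy(keySelector).reduce(reducer)` can be sketched with plain Java collections: a key selector extracts the grouping key from each object, and a reducer combines two objects that share a key. This is only an analogy under stated assumptions. `KeySelectorSketch`, `Word`, `groupAndReduce`, and `countWords` are hypothetical names, and nothing below is the Stratosphere API or the code from the pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Plain-Java analogy only; none of these names come from the Stratosphere API.
class KeySelectorSketch {

    // Hypothetical POJO standing in for the PR's CustomizedWord.
    static final class Word {
        final String word;
        final int count;

        Word(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    // Groups items by the extracted key and folds each group with the reducer.
    static <T, K> Map<K, T> groupAndReduce(List<T> items,
                                           Function<T, K> keySelector,
                                           BinaryOperator<T> reducer) {
        Map<K, T> reduced = new LinkedHashMap<>();
        for (T item : items) {
            // merge() stores the first item per key and reduces later ones into it.
            reduced.merge(keySelector.apply(item), item, reducer);
        }
        return reduced;
    }

    // Tokenizes the lines into (word, 1) objects, then groups and sums them.
    static Map<String, Word> countWords(List<String> lines) {
        List<Word> words = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    words.add(new Word(token, 1));
                }
            }
        }
        // Selector and reducer are passed inline, mirroring the suggestion above.
        return groupAndReduce(
                words,
                (Word w) -> w.word,
                (Word a, Word b) -> new Word(a.word, a.count + b.count));
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("To be");
        lines.add("or not to be");
        for (Word w : countWords(lines).values()) {
            System.out.println(w.word + " " + w.count);
        }
    }
}
```

Passing the selector and reducer directly as arguments keeps the grouping logic in one expression, which is the readability point of instantiating them inline.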



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user tobwiens commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47193828
 
    Thank you, I was sure I had understood PLOJO ;).
   
    The classes are instantiated inline now.
   
    I will squash the commits if everything is correct.



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user rmetzger commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47209731
 
    Can you rename the example to WordCountKeyExtractor?



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user twalthr commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47210845
 
    I think WordCountKeySelector would be more consistent.



[GitHub] incubator-flink pull request: ISSUE #827 fix FLINK-827

zentol
GitHub user tobwiens commented on the pull request:

    https://github.com/apache/incubator-flink/pull/45#issuecomment-47318605
 
    I think WordCountKeySelector is good because the example focuses on the usage of the KeySelector.

