Hi,
I have a dataset of StringIDs and I want to map them to Longs using a hash function. I will use the LongIDs in a series of iterative computations and then map back to the StringIDs. Currently I have a map operation that creates tuples with the string and the long, and another mapper that strips out the strings.

Is there a way to do an operation that allows for more than one output set (basically, split a set into two sets)? This would reduce the complexity of the code a lot.

Also, how does the optimizer deal with this case? Does it merge both map operations and actually run them as if they were a split?

cheers
Martin
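For illustration, here is a minimal plain-Java sketch of the two-map pipeline described above. It is not Flink code: lists stand in for DataSets, the hypothetical `IdPair` class stands in for the (String, Long) tuple, and the 64-bit FNV-1a hash is an assumed choice, not one taken from this thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class HashIdSketch {
    // Hypothetical stand-in for a (StringID, LongID) tuple; not a Flink type.
    public static final class IdPair {
        public final String stringId;
        public final long longId;

        public IdPair(String stringId, long longId) {
            this.stringId = stringId;
            this.longId = longId;
        }
    }

    // Assumed hash function: 64-bit FNV-1a over the UTF-8 bytes of the ID.
    public static long hashToLong(String s) {
        long h = 0xcbf29ce484222325L;            // FNV-1a 64-bit offset basis
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);                     // mix in one unsigned byte
            h *= 0x100000001b3L;                 // FNV 64-bit prime, wraps mod 2^64
        }
        return h;
    }

    // First map: StringID -> (StringID, LongID).
    public static List<IdPair> toPairs(List<String> stringIds) {
        List<IdPair> out = new ArrayList<>();
        for (String s : stringIds) {
            out.add(new IdPair(s, hashToLong(s)));
        }
        return out;
    }

    // Second map: drop the strings and keep only the LongIDs.
    public static List<Long> toLongIds(List<IdPair> pairs) {
        List<Long> out = new ArrayList<>();
        for (IdPair p : pairs) {
            out.add(p.longId);
        }
        return out;
    }

    public static void main(String[] args) {
        List<IdPair> pairs = toPairs(List.of("alice", "bob", "carol"));
        System.out.println(toLongIds(pairs));
    }
}
```

Because any 64-bit hash can map two different StringIDs to the same LongID, mapping back is only unambiguous if such collisions are ruled out or checked for.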
Hey Martin,
On 27 Jul 2014, at 12:56, Martin Neumann <[hidden email]> wrote:
> Is there a way to do an operation that allows for more than one output set
> (basically split a set into 2 sets)? This would reduce the complexity of
> the code a lot.

What exactly do you mean by split?

I am not sure if this is what you want, but you can just apply two transformations to the same input data set:

    DataSet<String> input = ...;
    DataSet<String> firstSet = input.map(...);
    DataSet<String> secondSet = input.map(...);

Does this help?
I think this is what Martin is currently doing:
    StringIDs --map-> (StringIDs, LongIDs) --map-> LongIDs

and he wants to use both the second and the third set. He is asking for a way to replace the second map operation (since it seems unnecessary to create an extra map just for that). I believe the appropriate way would be to use a projection instead of a map operation, something like:

    DataSet<Tuple2<String, Long>> mapped = stringIDs.map(...);
    DataSet<Tuple1<Long>> longIDs = mapped.project(1).types(Long.class);

You would end up with a Tuple1 set, though.

On 27.7.2014 13:21, Ufuk Celebi wrote:
> I am not sure if this is what you want, but you can just apply two
> transformations on the same input data set.
> [...]
Hi!
"Splitting", in the sense that one function returns two different data sets, is currently not supported.

I guess you have to go with Ufuk's suggestion. In your case, I guess it would look somewhat like this:

    DataSet<Tuple2<Long, String>> mapped = originalStrings.map(new HashIdMapper());
    // strip the strings, keeping only the Long IDs
    DataSet<Long> ids = mapped.map(new ProjectTo2());
    // placeholder for your iterative graph computation
    DataSet<Long> result = ids.runTheGraphAlgorithm(...);
    result.join(mapped).where(...).equalTo(...).with(new MapBackToStrings());

Greetings,
Stephan
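The final join in the snippet above maps the Long IDs back to their strings. As a rough plain-Java analogue (not the Flink join API), the sketch below builds a hash table from the (Long, String) pairs and probes it with each ID in the result. `IdEntry` and `mapBackToStrings` are hypothetical names, and the collision check is an added assumption rather than part of the suggested code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapBackSketch {
    // Hypothetical stand-in for a Tuple2<Long, String> (LongID, StringID).
    public static final class IdEntry {
        public final long longId;
        public final String stringId;

        public IdEntry(long longId, String stringId) {
            this.longId = longId;
            this.stringId = stringId;
        }
    }

    // Hash join on the LongID: build a table from `mapped`, then probe it with `result`.
    public static List<String> mapBackToStrings(List<Long> result, List<IdEntry> mapped) {
        Map<Long, String> byLongId = new HashMap<>();
        for (IdEntry e : mapped) {
            String previous = byLongId.putIfAbsent(e.longId, e.stringId);
            if (previous != null && !previous.equals(e.stringId)) {
                // Two different StringIDs hashed to the same LongID: mapping back is ambiguous.
                throw new IllegalStateException("hash collision on " + e.longId);
            }
        }
        List<String> out = new ArrayList<>();
        for (long id : result) {
            String s = byLongId.get(id);
            if (s != null) {              // inner-join semantics: IDs without a match are dropped
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<IdEntry> mapped = List.of(new IdEntry(1L, "alice"), new IdEntry(2L, "bob"));
        List<Long> result = List.of(2L, 1L, 3L);   // e.g. IDs produced by the iterative computation
        System.out.println(mapBackToStrings(result, mapped));   // prints [bob, alice]
    }
}
```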
Hey!
A similar issue has arisen in a different context. We should solve both problems in the same way. Could you participate in the discussion here? https://issues.apache.org/jira/browse/FLINK-87

Greetings,
Stephan

On Mon, Jul 28, 2014 at 3:42 PM, Stephan Ewen <[hidden email]> wrote:
> Hi!
>
> "Splitting", in the sense that one function returns two different data
> sets, is currently not supported.
> [...]