forwarding to the new mailing list ...
---------- Forwarded message ----------
From: Fabian Hueske <[hidden email]>
Date: Wed, Jun 18, 2014 at 11:20 AM
Subject: Re: [stratosphere-users] FlatJoin implementation
To: "[hidden email]" <[hidden email]>

Hi Asterios,

this mailing list is no longer used. All mails (user + dev) should go to the [hidden email] mailing list.

To the topic: This feature has been requested by quite a few people. So I think it makes sense to provide this interface (plus joinFilter). The same applies to Cross, which is less often used though...

A less confusing workaround could be to use join.project() and flatMap.

Cheers,
Fabian

2014-06-18 11:10 GMT+02:00 Asterios Katsifodimos <[hidden email]>:

> Hi,
>
> I've noticed that the join implementation supports a collector in the
> sense that it can become a "flat" join.
>
> The result is that we have to implement something like this, which is kind
> of ugly:
>
> public static final class VertexComponentIDProjectorWithFilter extends
>         JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
>
>     @Override
>     public void join(Tuple2<Long, Long> first, Tuple2<Long, Long> second,
>             Collector<Tuple2<Long, Long>> out) throws Exception {
>         if (first.f1 < second.f1) {
>             out.collect(new Tuple2<Long, Long>(first.f0, first.f1));
>         } else {
>             out.collect(second);
>         }
>     }
>
>     @Override
>     public Tuple2<Long, Long> join(Tuple2<Long, Long> first,
>             Tuple2<Long, Long> second) throws Exception {
>         return null;
>     }
> }
>
> A first comment on the above code is that the developer has to provide a
> null-returning, non-collector default join function. This makes the code
> ugly and introduces confusion: which of the two is actually going to be
> executed? Shouldn't there be a "FlatJoin" operator that would be
> "semantically correct"? Or would it complicate developers' lives?
>
> Cheers,
> Asterios
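The ambiguity Asterios describes comes from one class carrying two join methods. A dedicated flat join interface with a single, collector-taking method removes the question of which one runs. The following is a minimal, self-contained sketch of that idea in plain Java, not the Stratosphere API: `Collector`, `Pair`, `FlatJoiner`, and `FlatJoinSketch` are hypothetical stand-ins for Stratosphere's collector, `Tuple2<Long, Long>`, and the proposed FlatJoin interface, and the function body is Asterios's logic ported over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Stratosphere's Collector: receives zero, one,
// or many results from a single function call.
interface Collector<T> {
    void collect(T record);
}

// Hypothetical stand-in for Tuple2<Long, Long>, with value equality.
final class Pair {
    final long f0;
    final long f1;

    Pair(long f0, long f1) {
        this.f0 = f0;
        this.f1 = f1;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) {
            return false;
        }
        Pair other = (Pair) o;
        return f0 == other.f0 && f1 == other.f1;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(f0) + Long.hashCode(f1);
    }

    @Override
    public String toString() {
        return "(" + f0 + "," + f1 + ")";
    }
}

// Hypothetical flat join interface: exactly one method, which takes a collector.
@FunctionalInterface
interface FlatJoiner<IN1, IN2, OUT> {
    void join(IN1 first, IN2 second, Collector<OUT> out) throws Exception;
}

class FlatJoinSketch {
    // Asterios's function ported to the single-method interface: emit
    // whichever input record carries the smaller component ID (field f1).
    static final FlatJoiner<Pair, Pair, Pair> MIN_COMPONENT_ID = (first, second, out) -> {
        if (first.f1 < second.f1) {
            out.collect(new Pair(first.f0, first.f1));
        } else {
            out.collect(second);
        }
    };

    // Invokes the function for one matching pair and gathers what it emitted.
    static List<Pair> joinPair(Pair first, Pair second) {
        List<Pair> results = new ArrayList<>();
        try {
            MIN_COMPONENT_ID.join(first, second, results::add);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(joinPair(new Pair(1, 5), new Pair(1, 3)));
    }
}
```

With only one method to implement, there is no dummy `return null;` variant and no doubt about which method the runtime calls.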
On 18 Jun 2014, at 13:30, Robert Metzger <[hidden email]> wrote:

> ---------- Forwarded message ----------
> From: Fabian Hueske <[hidden email]>
> Date: Wed, Jun 18, 2014 at 11:20 AM
> Subject: Re: [stratosphere-users] FlatJoin implementation
> To: "[hidden email]" <[hidden email]>
>
> To the topic: This feature has been requested by quite a few people. So I
> think it makes sense to provide this interface (plus joinFilter). The same
> applies to Cross, which is less often used though...

+1

> A less confusing workaround could be to use join.project() and flatMap.

Do you mean join.project() and flatMap as a workaround for a flatJoin()? That will not work, will it?
Why not?
You do

    data1.join(data2).where(0).equalTo(0)
         .projectFirst(0,1).projectSecond(1)
         .types(Long.class, Long.class, Long.class)
         .flatMap(new MyFM())

The MyFM flatMap function then works on a Tuple3<Long, Long, Long> instead of a Tuple2<Tuple2<Long, Long>, Tuple2<Long, Long>>.

2014-06-18 13:34 GMT+02:00 Ufuk Celebi <[hidden email]>:

> Do you mean join.project() and flatMap as a workaround for a flatJoin()?
> That will not work, will it?
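Fabian's workaround splits the flat join into two steps: a projection join that produces exactly one flat record per matching pair, followed by a flatMap that decides how many records to emit. The sketch below simulates both steps on in-memory lists of `long[]` records. It is an illustration, not Stratosphere code: `ProjectThenFlatMap`, its methods, and the nested-loop join are hypothetical, and it assumes both inputs are joined on field 0 as in Fabian's example, so that the right record's key equals the left record's key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class ProjectThenFlatMap {

    // Step 1 (projection join): equi-join on field 0 of both inputs and emit
    // exactly one flat record (first.f0, first.f1, second.f1) per matching
    // pair, mirroring projectFirst(0,1).projectSecond(1).
    static List<long[]> projectJoin(List<long[]> first, List<long[]> second) {
        List<long[]> joined = new ArrayList<>();
        for (long[] a : first) {
            for (long[] b : second) {
                if (a[0] == b[0]) {
                    joined.add(new long[] {a[0], a[1], b[1]});
                }
            }
        }
        return joined;
    }

    // Step 2 (flatMap): may call out.accept() zero, one, or many times per
    // record. This one keeps the key and the smaller of the two component IDs,
    // which matches Asterios's function when both inputs are joined on field 0.
    static void minComponentId(long[] triple, Consumer<List<Long>> out) {
        out.accept(Arrays.asList(triple[0], Math.min(triple[1], triple[2])));
    }

    // Runs both steps and collects the flatMap output.
    static List<List<Long>> run(List<long[]> first, List<long[]> second) {
        List<List<Long>> result = new ArrayList<>();
        for (long[] triple : projectJoin(first, second)) {
            minComponentId(triple, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> vertices = Arrays.asList(new long[] {1, 5}, new long[] {2, 2});
        List<long[]> candidates = Arrays.asList(new long[] {1, 3}, new long[] {2, 7});
        System.out.println(run(vertices, candidates)); // prints [[1, 3], [2, 2]]
    }
}
```

Because the flatMap only sees the projected record, any field it needs has to be listed in `projectFirst`/`projectSecond`; that extra bookkeeping is part of why the two-step version is less convenient than a FlatJoin.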
On 18 Jun 2014, at 14:31, Fabian Hueske <[hidden email]> wrote:

> Why not?
> You do
>
> data1.join(data2).where(0).equalTo(0).projectFirst(0,1).projectSecond(1).types(Long.class,
> Long.class, Long.class).flatMap(new MyFM())

What if I want the join to collect the left side, the right side, and the union? But I'm not sure how realistic this use case is, or whether it is what Asterios had in mind.
What do you mean by "collect the left side" and "union"? If you want to collect all elements of the left/right side grouped by key, you should use CoGroup. For a union, you use the Union transformation. If you want to "union" the fields of both inputs, you use the projectJoin.

The question is about the output of a join function. Right now, it must return exactly one element. A FlatJoin function would allow it to return 0, 1, or n elements. A ProjectJoin or DefaultJoin followed by a FlatMap realizes the same functionality, but is not as nice to use as a FlatJoin.

2014-06-18 14:38 GMT+02:00 Ufuk Celebi <[hidden email]>:

> What if I want the join to collect the left side, right side and union?
> But I'm not sure how realistic this use case is and if it is what Asterios
> had in mind?
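To make the cardinality difference concrete: a regular join function is called once per matching pair and returns one result, while a flat join function receives a collector and may emit none, one, or several. The sketch below contrasts the two on in-memory data; `JoinCardinality`, `FlatJoin`, and the example functions are hypothetical names for this illustration, and `KEEP_IF_SMALLER` plays the role of the joinFilter mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

class JoinCardinality {

    // Hypothetical flat join function: gets one matching pair plus a
    // collector and may emit zero, one, or many results.
    interface FlatJoin<L, R, O> {
        void join(L left, R right, Consumer<O> out);
    }

    // Regular join on field 0: exactly one result per matching pair.
    static <O> List<O> join(List<long[]> left, List<long[]> right,
                            BiFunction<long[], long[], O> fn) {
        List<O> results = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : right) {
                if (l[0] == r[0]) {
                    results.add(fn.apply(l, r));
                }
            }
        }
        return results;
    }

    // Flat join on field 0: the function decides how many results each pair yields.
    static <O> List<O> flatJoin(List<long[]> left, List<long[]> right,
                                FlatJoin<long[], long[], O> fn) {
        List<O> results = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : right) {
                if (l[0] == r[0]) {
                    fn.join(l, r, results::add);
                }
            }
        }
        return results;
    }

    // One result per pair: sum of the two value fields.
    static final BiFunction<long[], long[], Long> SUM = (l, r) -> l[1] + r[1];

    // Zero or one result per pair (a join filter): keep the left value
    // only if it is smaller than the right one.
    static final FlatJoin<long[], long[], Long> KEEP_IF_SMALLER = (l, r, out) -> {
        if (l[1] < r[1]) {
            out.accept(l[1]);
        }
    };

    // Two results per pair: emit both value fields.
    static final FlatJoin<long[], long[], Long> EMIT_BOTH = (l, r, out) -> {
        out.accept(l[1]);
        out.accept(r[1]);
    };

    public static void main(String[] args) {
        List<long[]> left = Arrays.asList(new long[] {1, 5}, new long[] {2, 2});
        List<long[]> right = Arrays.asList(new long[] {1, 3}, new long[] {2, 7});
        System.out.println(join(left, right, SUM));                 // prints [8, 9]
        System.out.println(flatJoin(left, right, KEEP_IF_SMALLER)); // prints [2]
        System.out.println(flatJoin(left, right, EMIT_BOTH));       // prints [5, 3, 2, 7]
    }
}
```

A regular join is the special case of a flat join whose function calls `accept` exactly once per pair.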