Fwd: [stratosphere-users] FlatJoin implementation


Robert Metzger
forwarding to the new mailing list ...

---------- Forwarded message ----------
From: Fabian Hueske <[hidden email]>
Date: Wed, Jun 18, 2014 at 11:20 AM
Subject: Re: [stratosphere-users] FlatJoin implementation
To: "[hidden email]" <[hidden email]>


Hi Asterios,

this mailing list is no longer used.
All mails (user + dev) should go to the [hidden email]
mailing list.

To the topic: This feature has been requested by quite a few people. So I
think it makes sense to provide this interface (plus joinFilter). The same
applies to Cross which is less often used though...
A less confusing workaround could be to use join.project() and flatMap.

Cheers, Fabian


2014-06-18 11:10 GMT+02:00 Asterios Katsifodimos <[hidden email]>:

> Hi,
>
> I've noticed that the join implementation supports a collector in the
> sense that it can become a "flat" join.
>
> The result is that we have to implement something like this that is kind
> of ugly:
> public static final class VertexComponentIDProjectorWithFilter extends
>         JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>> {
>
>     @Override
>     public void join(Tuple2<Long, Long> first, Tuple2<Long, Long> second,
>             Collector<Tuple2<Long, Long>> out) throws Exception {
>         if (first.f1 < second.f1) {
>             out.collect(new Tuple2<Long, Long>(first.f0, first.f1));
>         } else {
>             out.collect(second);
>         }
>     }
>
>     @Override
>     public Tuple2<Long, Long> join(Tuple2<Long, Long> first,
>             Tuple2<Long, Long> second) throws Exception {
>         return null;
>     }
> }
>
>
> A first comment on the above code is that the developer has to provide a
> null-returning, non-collector default join function. This makes code ugly
> and introduces a confusion: which of the two is going to be actually
> executed? Shouldn't there be a "Flatjoin" operator that would be
> "semantically correct"? Or would it complicate developer's life?
>
> Cheers,
> Asterios
>
> --
> You received this message because you are subscribed to the Google Groups
> "stratosphere-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [hidden email].
> Visit this group at http://groups.google.com/group/stratosphere-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/stratosphere-users/7b905c5a-7ec6-4e9d-a284-5b6ae3b0b977%40googlegroups.com
> <https://groups.google.com/d/msgid/stratosphere-users/7b905c5a-7ec6-4e9d-a284-5b6ae3b0b977%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>

Re: [stratosphere-users] FlatJoin implementation

Ufuk Celebi

On 18 Jun 2014, at 13:30, Robert Metzger <[hidden email]> wrote:

> ---------- Forwarded message ----------
> From: Fabian Hueske <[hidden email]>
> Date: Wed, Jun 18, 2014 at 11:20 AM
> Subject: Re: [stratosphere-users] FlatJoin implementation
> To: "[hidden email]" <[hidden email]>
>
> To the topic: This feature has been requested by quite a few people. So I
> think it makes sense to provide this interface (plus joinFilter). The same
> applies to Cross which is less often used though...

+1

> A less confusing workaround could be to use join.project() and flatMap.

Do you mean join.project() and flatMap as a workaround for a flatJoin()? That will not work, will it?

Re: [stratosphere-users] FlatJoin implementation

Fabian Hueske
Why not?
You do

data1.join(data2).where(0).equalTo(0)
     .projectFirst(0,1).projectSecond(1)
     .types(Long.class, Long.class, Long.class)
     .flatMap(new MyFM())

The flatMap function MyFM works on a Tuple3<Long, Long, Long> and not on a
Tuple2<Tuple2<Long,Long>, Tuple2<Long,Long>>.
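
To make the shape of this workaround concrete, here is a minimal plain-Java simulation. It does not use the Stratosphere API: the class ProjectThenFlatMap, the helper joinAndProject, and the filter logic inside myFlatMap are all hypothetical stand-ins chosen for illustration. It only shows that a flat-map applied to the projected (key, leftValue, rightValue) record can emit zero or more outputs per joined pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java simulation; none of these names are Stratosphere API.
class ProjectThenFlatMap {

    // Equi-join on field 0, projected to a flat (key, leftValue, rightValue) record.
    // Stands in for join(...).where(0).equalTo(0).projectFirst(0,1).projectSecond(1).
    static List<long[]> joinAndProject(List<long[]> left, List<long[]> right) {
        List<long[]> joined = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : right) {
                if (l[0] == r[0]) {
                    joined.add(new long[] {l[0], l[1], r[1]});
                }
            }
        }
        return joined;
    }

    // Stands in for MyFM: receives one flat 3-field record and may emit 0..n outputs.
    // The filter condition is an assumption made for this example.
    static void myFlatMap(long[] t, Consumer<long[]> out) {
        if (t[1] < t[2]) {
            out.accept(new long[] {t[0], t[1]});
        }
        // Otherwise emit nothing, which a single-result join function cannot express.
    }

    // Runs the simulated join and then applies the flat-map to every joined record.
    static List<long[]> run(List<long[]> left, List<long[]> right) {
        List<long[]> result = new ArrayList<>();
        for (long[] t : joinAndProject(left, right)) {
            myFlatMap(t, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> left = Arrays.asList(new long[] {1, 10}, new long[] {2, 50});
        List<long[]> right = Arrays.asList(
                new long[] {1, 20}, new long[] {2, 30}, new long[] {3, 5});
        for (long[] t : run(left, right)) {
            System.out.println(Arrays.toString(t)); // prints [1, 10]
        }
    }
}
```

Key 2 joins but is dropped by the flat-map, and key 3 never joins, so only one record survives.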



Re: [stratosphere-users] FlatJoin implementation

Ufuk Celebi

On 18 Jun 2014, at 14:31, Fabian Hueske <[hidden email]> wrote:

> Why not?
> You do
>
> data1.join(data2).where(0).equalTo(0).projectFirst(0,1).projectSecond(1).types(Long.class,
> Long.class, Long.class).flatMap(new MyFM())

What if I want the join to collect the left side, right side and union? But I'm not sure how realistic this use case is and if it is what Asterios had in mind?

Re: [stratosphere-users] FlatJoin implementation

Fabian Hueske
What do you mean by "collect the left side" and "union"? If you want to
collect all elements of the left/right side grouped by key, you should use
CoGroup. For union you use the Union transformation. If you want to "union"
the fields of the inputs you use the projectJoin.

The question is about the output of a join function. Right now, it must
return exactly one element. A FlatJoin function would allow returning 0, 1,
or n elements. A ProjectJoin or DefaultJoin followed by a FlatMap
provides the same functionality, but is not as nice to use as a FlatJoin.
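
As a rough illustration of the 0, 1, or n semantics described above, the following self-contained Java sketch defines a hypothetical FlatJoin interface and applies it to two small in-memory inputs. Nothing here is an existing Stratosphere type: the interface name, the apply helper, and the emit rule in demo are assumptions made purely for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java sketch; FlatJoin here is a hypothetical interface, not a Stratosphere type.
class FlatJoinSketch {

    // A join function that hands results to a collector instead of returning one value.
    interface FlatJoin<L, R, O> {
        void join(L first, R second, Consumer<O> out);
    }

    // Nested-loop equi-join on field 0 that calls fn once per matching pair.
    static <O> List<O> apply(List<long[]> left, List<long[]> right,
                             FlatJoin<long[], long[], O> fn) {
        List<O> result = new ArrayList<>();
        for (long[] l : left) {
            for (long[] r : right) {
                if (l[0] == r[0]) {
                    fn.join(l, r, result::add);
                }
            }
        }
        return result;
    }

    // Illustrative rule: equal values emit nothing, a smaller left value emits
    // the left record, and a larger left value emits both records.
    static List<long[]> demo(List<long[]> left, List<long[]> right) {
        return apply(left, right, (l, r, out) -> {
            if (l[1] < r[1]) {
                out.accept(l);          // exactly one element
            } else if (l[1] > r[1]) {
                out.accept(l);          // two elements
                out.accept(r);
            }                           // equal: zero elements
        });
    }

    public static void main(String[] args) {
        List<long[]> left = Arrays.asList(
                new long[] {1, 10}, new long[] {2, 50}, new long[] {3, 7});
        List<long[]> right = Arrays.asList(
                new long[] {1, 20}, new long[] {2, 30}, new long[] {3, 7});
        for (long[] t : demo(left, right)) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

With a single collector-based method there is no second, null-returning join to implement, which is the source of the confusion raised in the original question.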

