
Re: [stratosphere-dev] Grouping by a tuple

Posted by Robert Metzger on Jun 12, 2014; 7:53am
URL: http://deprecated-apache-flink-mailing-list-archive.368.s1.nabble.com/Fwd-stratosphere-dev-Grouping-by-a-tuple-tp40p55.html

+1 for opening a ticket.


On Thu, Jun 12, 2014 at 12:46 AM, Fabian Hueske <[hidden email]> wrote:

> I think the issue is grouping a DataSet of custom types on multiple
> fields, rather than grouping a Tuple DataSet.
> In that case you need to use a KeySelector and would want it to return a
> Tuple containing all the fields you want to group on.
> But as Slava said, the returned type must be comparable (which Tuples are
> not).
>
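
A minimal plain-Java sketch of the workaround described above: a key-extracting function returns a small comparable key object covering all grouping fields. It does not use the Stratosphere API. `Purchase`, `PurchaseKey` and the `groupBy` helper are hypothetical names made up for illustration, and the `TreeMap` only stands in for whatever the runtime does with comparable keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Function;

public class CompositeKeyGrouping {

    // Hypothetical custom type (POJO) standing in for the records of a DataSet.
    static final class Purchase {
        final long userId;
        final long productId;
        final int year;
        final double amount;

        Purchase(long userId, long productId, int year, double amount) {
            this.userId = userId;
            this.productId = productId;
            this.year = year;
            this.amount = amount;
        }
    }

    // Hypothetical composite key; it is comparable because each field is.
    static final class PurchaseKey implements Comparable<PurchaseKey> {
        final long userId;
        final long productId;
        final int year;

        PurchaseKey(long userId, long productId, int year) {
            this.userId = userId;
            this.productId = productId;
            this.year = year;
        }

        @Override
        public int compareTo(PurchaseKey other) {
            // Lexicographic order: userId, then productId, then year.
            int c = Long.compare(userId, other.userId);
            if (c != 0) return c;
            c = Long.compare(productId, other.productId);
            if (c != 0) return c;
            return Integer.compare(year, other.year);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof PurchaseKey && compareTo((PurchaseKey) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId, productId, year);
        }
    }

    // Plain-Java stand-in for grouping via a key-extracting function.
    static <T, K extends Comparable<K>> Map<K, List<T>> groupBy(
            List<T> items, Function<T, K> keyOf) {
        Map<K, List<T>> groups = new TreeMap<>();
        for (T item : items) {
            groups.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Purchase> purchases = Arrays.asList(
                new Purchase(1L, 10L, 2014, 5.0),
                new Purchase(1L, 10L, 2014, 7.5),
                new Purchase(2L, 10L, 2013, 1.0));

        Map<PurchaseKey, List<Purchase>> groups =
                groupBy(purchases, p -> new PurchaseKey(p.userId, p.productId, p.year));

        for (Map.Entry<PurchaseKey, List<Purchase>> e : groups.entrySet()) {
            PurchaseKey k = e.getKey();
            System.out.println(k.userId + "/" + k.productId + "/" + k.year
                    + " -> " + e.getValue().size() + " record(s)");
        }
    }
}
```

The drawback is the one raised in the original question: every combination of grouping fields needs its own key class.
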
> I think it should be possible to check at optimization time whether all
> fields of a tuple are comparable and, if so, allow such tuples to be used
> as a grouping key.
>
> Would be good to open a JIRA for this in any case. This is a common problem
> when working with POJOs.
>
>
> 2014-06-12 0:25 GMT+02:00 Robert Metzger <[hidden email]>:
>
> > Hi Slava,
> >
> > I'm forwarding your message to our new mailing list at Apache:
> > [hidden email]
> > You can subscribe to the list by sending an (empty) email to:
> > [hidden email].
> > We are planning to shut down the stratosphere-dev@googlegroups soon.
> >
> > Regarding your question: When using Tuples, you don't need to specify a
> > KeySelector. It is sufficient to specify the position(s) of the key
> > fields:
> >
> > http://stratosphere-javadocs.github.io/eu/stratosphere/api/java/DataSet.html#groupBy(int...)
> > So you should be able to do a ".groupBy(0,3,4)"
> >
> > Robert
> >
> > ---------- Forwarded message ----------
> > From: Vyacheslav Zholudev <[hidden email]>
> > Date: Thu, Jun 12, 2014 at 12:17 AM
> > Subject: [stratosphere-dev] Grouping by a tuple
> > To: [hidden email]
> >
> >
> > Hi,
> >
> > Being used to the Hive grouping like "GROUP BY userId, productId, year" I'm
> > wondering what's the best way to do it in Stratosphere? The groupBy's
> > KeySelector implies that a Comparable object is returned, however, the
> > obvious choice like TupleN is not comparable. In primitive cases I would
> > prefer to avoid introducing comparable extra entities for grouping tuples
> > of "primitive" types. Would it make sense to introduce "ComparableTupleN<T1
> > extends Comparable<? extends T1>, ..., Tn extends Comparable<? extends
> > Tn>>"?
> >
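
For illustration, a two-field version of the proposed ComparableTupleN could look roughly like the plain-Java sketch below. `ComparableTuple2` is a hypothetical class, not part of Stratosphere. The sketch also assumes the bound `Comparable<? super T1>` instead of the `? extends` form quoted above, since with `? extends` the call `f0.compareTo(other.f0)` would not type-check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class ComparableTupleSketch {

    // Hypothetical tuple whose fields must themselves be comparable.
    static final class ComparableTuple2<
            T1 extends Comparable<? super T1>,
            T2 extends Comparable<? super T2>>
            implements Comparable<ComparableTuple2<T1, T2>> {

        final T1 f0;
        final T2 f1;

        ComparableTuple2(T1 f0, T2 f1) {
            this.f0 = f0;
            this.f1 = f1;
        }

        @Override
        public int compareTo(ComparableTuple2<T1, T2> other) {
            // Compare field by field; the first difference decides.
            int c = f0.compareTo(other.f0);
            return c != 0 ? c : f1.compareTo(other.f1);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ComparableTuple2)) return false;
            ComparableTuple2<?, ?> t = (ComparableTuple2<?, ?>) o;
            return f0.equals(t.f0) && f1.equals(t.f1);
        }

        @Override
        public int hashCode() {
            return Objects.hash(f0, f1);
        }

        @Override
        public String toString() {
            return "(" + f0 + ", " + f1 + ")";
        }
    }

    public static void main(String[] args) {
        List<ComparableTuple2<String, Integer>> keys = new ArrayList<>();
        keys.add(new ComparableTuple2<>("b", 1));
        keys.add(new ComparableTuple2<>("a", 2));
        keys.add(new ComparableTuple2<>("a", 1));

        // Sorting works because the tuple type is Comparable.
        Collections.sort(keys);
        System.out.println(keys);
    }
}
```
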
> > Or am I missing the obvious Stratosphere way?
> >
> > Thanks,
> > Vyacheslav
> >
>