Key expressions vs case class fields

Greg Hogan
Hi,

Looking at the documentation for "Transformations on Grouped DataSet" [1],
what differentiates a key expression from case class fields? Is there a
special Scala capability or are we still just passing strings?

[1]
https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#transformations-on-grouped-dataset

Thanks,
Greg

Re: Key expressions vs case class fields

Fabian Hueske-2
Hi Greg,

From a user's point of view, expression keys (dataSet.groupBy("_1")) and
selector function keys (dataSet.groupBy(_._1)) look very similar in a Scala
DataSet or DataStream program. This is due to Scala's shortcut syntax for
defining lambda functions.

However, the two key types are handled differently when the program is
executed. The expression key "_1" identifies the logical position of the key
within the type of the data set, and the key fields are accessed by a properly
configured TypeComparator. The lambda function _._1 is shorthand for x =>
x._1 and is treated as a regular key selector function, i.e., during plan
translation we inject a MapFunction that evaluates the selector function and
extracts the key.
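
To make the two styles concrete, here is a minimal sketch that groups the same data once by an expression key and once by a selector function. It assumes the Flink Scala DataSet API (flink-scala) is on the classpath; the object name KeyStyles, the sample data, and the sumCounts helper are made up for illustration and are not taken from the documentation.

```scala
import org.apache.flink.api.scala._

object KeyStyles {
  // Hypothetical sample data: (word, count) pairs.
  val sample: Seq[(String, Int)] = Seq(("a", 1), ("b", 2), ("a", 3))

  // Hypothetical helper: adds the counts of two records that share a key.
  val sumCounts: ((String, Int), (String, Int)) => (String, Int) =
    (x, y) => (x._1, x._2 + y._2)

  def run(): (Map[String, Int], Map[String, Int]) = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val data: DataSet[(String, Int)] = env.fromCollection(sample)

    // Expression key: a plain String that names the first tuple field.
    val byExpression = data.groupBy("_1").reduce(sumCounts).collect().toMap

    // Selector function key: _._1 is Scala shorthand for x => x._1.
    val bySelector = data.groupBy(_._1).reduce(sumCounts).collect().toMap

    (byExpression, bySelector)
  }

  def main(args: Array[String]): Unit = {
    val (byExpression, bySelector) = run()
    println(byExpression)
    println(bySelector)
  }
}
```

Both variants should yield the same grouped result; the difference lies only in how the key is extracted at runtime, as described above.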

Does this answer your question?

Best, Fabian

2016-02-22 19:18 GMT+01:00 Greg Hogan <[hidden email]>:

> Hi,
>
> Looking at the documentation for "Transformations on Grouped DataSet" [1],
> what differentiates a key expression from case class fields? Is there a
> special Scala capability or are we still just passing strings?
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/dataset_transformations.html#transformations-on-grouped-dataset
>
> Thanks,
> Greg
>