Hi Greg,
From a user's point of view, expression keys (dataSet.groupBy("_1")) and
selector function keys (dataSet.groupBy(_._1)) look very similar in a Scala
DataSet or DataStream program. This is due to Scala's shorthand for defining
lambda functions.
However, the two key types are handled differently when the program is
executed. The expression key "_1" defines the logical position of the key
in the type of the data set, and the key fields are accessed by a properly
configured TypeComparator. The lambda function _._1 is shorthand for
x => x._1 and is treated as a regular key selector function, i.e., during
plan translation we inject a MapFunction that evaluates the selector
function and extracts the key.
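
To make the two styles concrete, here is a minimal sketch using the Scala DataSet API. The object name KeyStyles, the helper run(), and the sample data are made up for illustration, and the snippet assumes the Flink Scala API is on the classpath. Both groupings should produce the same groups; only the way the key is extracted at runtime differs.

```scala
import org.apache.flink.api.scala._

// KeyStyles and run() are hypothetical names used only for this sketch.
object KeyStyles {

  def run(): (Seq[(String, Int)], Seq[(String, Int)]) = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Made-up sample data: (word, count) pairs.
    val dataSet: DataSet[(String, Int)] =
      env.fromElements(("a", 1), ("b", 2), ("a", 3))

    // Expression key: "_1" names the first tuple field by its logical position.
    val byExpression = dataSet
      .groupBy("_1")
      .reduce((l, r) => (l._1, l._2 + r._2))
      .collect()

    // Selector function key: _._1 is shorthand for x => x._1.
    val bySelector = dataSet
      .groupBy(_._1)
      .reduce((l, r) => (l._1, l._2 + r._2))
      .collect()

    (byExpression, bySelector)
  }

  def main(args: Array[String]): Unit = {
    val (byExpression, bySelector) = run()
    // Sort before printing because the order of groups is not guaranteed.
    println(byExpression.sorted)
    println(bySelector.sorted)
  }
}
```

I use reduce here rather than a built-in aggregation so that the same follow-up operation can be applied to both groupings.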
Does this answer your question?
Best, Fabian
2016-02-22 19:18 GMT+01:00 Greg Hogan <[hidden email]>: