Duplicate sort keys

Greg Hogan
Is it correct to expect that Flink should remove duplicate sort keys? I'm
working on instrumenting the FixedLengthRecordSorter (FLINK-4705) and the
following test case from TypeHintITCase:200 is having an unexpected effect
due to the keyPositions = {0, 0} being passed to TupleComparator.

DataSet<Integer> resultDs = ds
      .groupBy(0)
      .sortGroup(0, Order.ASCENDING)
      .reduceGroup(new GroupReducer<Tuple3<Integer, Long, String>, Integer>())
      .returns(BasicTypeInfo.INT_TYPE_INFO);

The sortGroup will have no effect, since only one key is presented to the
UDF at a time. Flink also makes no guarantee about the order in which keys
are presented to the UDF; they are sorted only within each partition. I would
also expect repeated keys in groupBy to be ignored.
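
For illustration, here is a minimal sketch of the kind of deduplication being asked about. It assumes that keeping the first occurrence of each key position is sufficient. The names SortKeyDedup and removeDuplicateKeys are hypothetical and not part of Flink's API, and where such a step would belong in Flink is not something this sketch tries to answer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not a Flink class.
class SortKeyDedup {

    // Returns the key positions with later duplicates dropped,
    // preserving the order in which each position first appears.
    static int[] removeDuplicateKeys(int[] keyPositions) {
        // LinkedHashSet keeps insertion order and ignores repeated adds.
        Set<Integer> seen = new LinkedHashSet<>();
        for (int position : keyPositions) {
            seen.add(position);
        }
        int[] result = new int[seen.size()];
        int i = 0;
        for (int position : seen) {
            result[i++] = position;
        }
        return result;
    }

    public static void main(String[] args) {
        // groupBy(0) followed by sortGroup(0, ...) yields positions {0, 0}.
        int[] deduplicated = removeDuplicateKeys(new int[] {0, 0});
        System.out.println(Arrays.toString(deduplicated)); // prints [0]
    }
}
```

If the same position appeared twice with different sort orders, this sketch would keep the first one, which matches the point above that the later sort on an already-grouped key cannot change the result.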

Greg

Re: Duplicate sort keys

Fabian Hueske-2
Hi Greg,

IMO you are right. We should remove duplicate sort keys.

Best, Fabian

2016-10-03 16:04 GMT+02:00 Greg Hogan <[hidden email]>:

> Is it correct to expect that Flink should remove duplicate sort keys? I'm
> working on instrumenting the FixedLengthRecordSorter (FLINK-4705) and the
> following test case from TypeHintITCase:200 is having an unexpected effect
> due to the keyPositions = {0, 0} being passed to TupleComparator.
>
> DataSet<Integer> resultDs = ds
>       .groupBy(0)
>       .sortGroup(0, Order.ASCENDING)
>       .reduceGroup(new GroupReducer<Tuple3<Integer, Long, String>, Integer>())
>       .returns(BasicTypeInfo.INT_TYPE_INFO);
>
> The sortGroup will have no effect, since only one key is presented to the
> UDF at a time. Flink also makes no guarantee about the order in which keys
> are presented to the UDF; they are sorted only within each partition. I would
> also expect repeated keys in groupBy to be ignored.
>
> Greg
>