(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Make FieldAccessor logic consistent with remaining API

Classic

List

Threaded

5 messages Options

Fabian Hueske-2

[DISCUSS] Make FieldAccessor logic consistent with remaining API

Hi everybody,

while reviewing PR #2094 [1] I noticed that the field reference syntax for
FieldAccessors is not compatible with the syntax supported for key
definitions (ExpressionKeys) used in groupBy(), keyBy(),
join().where().equalTo(), etc.

FieldAccessors are only used for build-in aggregations in the DataStream
API (sum(), min(), max(), ...).

In particular I identified the following inconsistencies:

- FieldAccessors allow to address array cells. ExpressionKeys treat arrays
as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it is
not possible to address array cells.
- ExpressionKeys do only support Integer keys for tuples. An atomic type
can only be addressed with "*". FieldAccessors allow to address AtomicTypes
with 0 in addtion to "*".
- ExpressionKeys support to address fields of Java tuples with "f2" and
Scala tuple fields with "_3". FieldAccessors do not support the "f" or "_"
prefix.

I would like to propose to adapt the syntax of both mechanisms (ideally,
both should use the same code for validation). IMO, the ExpressionKey
syntax much more widely used and is well designed. Therefore, I would adopt
it for FieldAccessors as well. However, that would mean to restrict the
syntax of the FieldAccessors and might break existing code.

What do others think?

[1] https://github.com/apache/flink/pull/2094

Gábor Gévay

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

Hello,

Thanks Fabian for starting this discussion.

I would just like to add a few thougths about why does the
FieldAccessors even exist. One could say that we should instead just
re-use ExpressionKeys for the aggregations, and we are done. As far as
I can see, the way ExpressionKeys is often used is roughly to call
computeLogicalKeyPositions on it, then call
CompositeType.createComparator with these key positions, and then we
can call extractKeys on the comparator to get the values of the fields
of a record.

However, the problem with this is that in the aggregations we also
need to write fields and not just read them. So with the above
procedure, we would need an "intractKey" method on the comparators
that would be the opposite of extractKeys, i.e. which would set the
key fields.

Another difference between the FieldAccessor world and the
ExpressionKeys world is that in the aggregations we need to deal with
only one field. This means that enhancing the comparators with the
exact opposite of extractKeys would be overkill, because we don't need
to set multiple fields at a time.

It certainly seems to be a good idea to unify these at least
externally (from the side of the users), but I'm not sure what would
be a neat way to also internally unify them.

Best,
Gábor

2016-10-26 20:54 GMT+02:00 Fabian Hueske <[hidden email]>:

> Hi everybody,
>
> while reviewing PR #2094 [1] I noticed that the field reference syntax for
> FieldAccessors is not compatible with the syntax supported for key
> definitions (ExpressionKeys) used in groupBy(), keyBy(),
> join().where().equalTo(), etc.
>
> FieldAccessors are only used for build-in aggregations in the DataStream
> API (sum(), min(), max(), ...).
>
> In particular I identified the following inconsistencies:
>
> - FieldAccessors allow to address array cells. ExpressionKeys treat arrays
> as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it is
> not possible to address array cells.
> - ExpressionKeys do only support Integer keys for tuples. An atomic type
> can only be addressed with "*". FieldAccessors allow to address AtomicTypes
> with 0 in addtion to "*".
> - ExpressionKeys support to address fields of Java tuples with "f2" and
> Scala tuple fields with "_3". FieldAccessors do not support the "f" or "_"
> prefix.
>
> I would like to propose to adapt the syntax of both mechanisms (ideally,
> both should use the same code for validation). IMO, the ExpressionKey
> syntax much more widely used and is well designed. Therefore, I would adopt
> it for FieldAccessors as well. However, that would mean to restrict the
> syntax of the FieldAccessors and might break existing code.
>
> What do others think?
>
> [1] https://github.com/apache/flink/pull/2094

Fabian Hueske-2

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

IMO, FieldAccessors are certainly valuable and should exist.
I only want to make sure that there is a common syntax for field references.

Maybe we can refactor the parsing code for field references in
ExpressionKeys and expose it in static public methods that the
FieldAccessors can call.

Best,
Fabian

2016-10-27 14:52 GMT+02:00 Gábor Gévay <[hidden email]>:

> Hello,
>
> Thanks Fabian for starting this discussion.
>
> I would just like to add a few thougths about why does the
> FieldAccessors even exist. One could say that we should instead just
> re-use ExpressionKeys for the aggregations, and we are done. As far as
> I can see, the way ExpressionKeys is often used is roughly to call
> computeLogicalKeyPositions on it, then call
> CompositeType.createComparator with these key positions, and then we
> can call extractKeys on the comparator to get the values of the fields
> of a record.
>
> However, the problem with this is that in the aggregations we also
> need to write fields and not just read them. So with the above
> procedure, we would need an "intractKey" method on the comparators
> that would be the opposite of extractKeys, i.e. which would set the
> key fields.
>
> Another difference between the FieldAccessor world and the
> ExpressionKeys world is that in the aggregations we need to deal with
> only one field. This means that enhancing the comparators with the
> exact opposite of extractKeys would be overkill, because we don't need
> to set multiple fields at a time.
>
> It certainly seems to be a good idea to unify these at least
> externally (from the side of the users), but I'm not sure what would
> be a neat way to also internally unify them.
>
> Best,
> Gábor
>
>
> 2016-10-26 20:54 GMT+02:00 Fabian Hueske <[hidden email]>:
> > Hi everybody,
> >
> > while reviewing PR #2094 [1] I noticed that the field reference syntax
> for
> > FieldAccessors is not compatible with the syntax supported for key
> > definitions (ExpressionKeys) used in groupBy(), keyBy(),
> > join().where().equalTo(), etc.
> >
> > FieldAccessors are only used for build-in aggregations in the DataStream
> > API (sum(), min(), max(), ...).
> >
> > In particular I identified the following inconsistencies:
> >
> > - FieldAccessors allow to address array cells. ExpressionKeys treat
> arrays
> > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it
> is
> > not possible to address array cells.
> > - ExpressionKeys do only support Integer keys for tuples. An atomic type
> > can only be addressed with "*". FieldAccessors allow to address
> AtomicTypes
> > with 0 in addtion to "*".
> > - ExpressionKeys support to address fields of Java tuples with "f2" and
> > Scala tuple fields with "_3". FieldAccessors do not support the "f" or
> "_"
> > prefix.
> >
> > I would like to propose to adapt the syntax of both mechanisms (ideally,
> > both should use the same code for validation). IMO, the ExpressionKey
> > syntax much more widely used and is well designed. Therefore, I would
> adopt
> > it for FieldAccessors as well. However, that would mean to restrict the
> > syntax of the FieldAccessors and might break existing code.
> >
> > What do others think?
> >
> > [1] https://github.com/apache/flink/pull/2094
>

Márton Balassi

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

Thanks for bringing up the issue, Fabian.

I am also for unifying the access patterns between the FieldAccessor and
ExpressionKeys logic.

In terms of Fabian's suggestion to refactor the ExpressionKeys parsing
logic and rely on it in the FieldAccessors I think that is nice first step.
It would be great if we could provide even tighter integration though.

On Fri, Oct 28, 2016 at 6:17 PM, Fabian Hueske <[hidden email]> wrote:

> IMO, FieldAccessors are certainly valuable and should exist.
> I only want to make sure that there is a common syntax for field
> references.
>
> Maybe we can refactor the parsing code for field references in
> ExpressionKeys and expose it in static public methods that the
> FieldAccessors can call.
>
> Best,
> Fabian
>
>
>
> 2016-10-27 14:52 GMT+02:00 Gábor Gévay <[hidden email]>:
>
> > Hello,
> >
> > Thanks Fabian for starting this discussion.
> >
> > I would just like to add a few thougths about why does the
> > FieldAccessors even exist. One could say that we should instead just
> > re-use ExpressionKeys for the aggregations, and we are done. As far as
> > I can see, the way ExpressionKeys is often used is roughly to call
> > computeLogicalKeyPositions on it, then call
> > CompositeType.createComparator with these key positions, and then we
> > can call extractKeys on the comparator to get the values of the fields
> > of a record.
> >
> > However, the problem with this is that in the aggregations we also
> > need to write fields and not just read them. So with the above
> > procedure, we would need an "intractKey" method on the comparators
> > that would be the opposite of extractKeys, i.e. which would set the
> > key fields.
> >
> > Another difference between the FieldAccessor world and the
> > ExpressionKeys world is that in the aggregations we need to deal with
> > only one field. This means that enhancing the comparators with the
> > exact opposite of extractKeys would be overkill, because we don't need
> > to set multiple fields at a time.
> >
> > It certainly seems to be a good idea to unify these at least
> > externally (from the side of the users), but I'm not sure what would
> > be a neat way to also internally unify them.
> >
> > Best,
> > Gábor
> >
> >
> > 2016-10-26 20:54 GMT+02:00 Fabian Hueske <[hidden email]>:
> > > Hi everybody,
> > >
> > > while reviewing PR #2094 [1] I noticed that the field reference syntax
> > for
> > > FieldAccessors is not compatible with the syntax supported for key
> > > definitions (ExpressionKeys) used in groupBy(), keyBy(),
> > > join().where().equalTo(), etc.
> > >
> > > FieldAccessors are only used for build-in aggregations in the
> DataStream
> > > API (sum(), min(), max(), ...).
> > >
> > > In particular I identified the following inconsistencies:
> > >
> > > - FieldAccessors allow to address array cells. ExpressionKeys treat
> > arrays
> > > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it
> > is
> > > not possible to address array cells.
> > > - ExpressionKeys do only support Integer keys for tuples. An atomic
> type
> > > can only be addressed with "*". FieldAccessors allow to address
> > AtomicTypes
> > > with 0 in addtion to "*".
> > > - ExpressionKeys support to address fields of Java tuples with "f2" and
> > > Scala tuple fields with "_3". FieldAccessors do not support the "f" or
> > "_"
> > > prefix.
> > >
> > > I would like to propose to adapt the syntax of both mechanisms
> (ideally,
> > > both should use the same code for validation). IMO, the ExpressionKey
> > > syntax much more widely used and is well designed. Therefore, I would
> > adopt
> > > it for FieldAccessors as well. However, that would mean to restrict the
> > > syntax of the FieldAccessors and might break existing code.
> > >
> > > What do others think?
> > >
> > > [1] https://github.com/apache/flink/pull/2094
> >
>

Márton Balassi

Re: [DISCUSS] Make FieldAccessor logic consistent with remaining API

Just a friendly reminder that PR 2094 resolved some of the issue mentioned
here, but a bit remains in terms of fully consolidating the semantics.
Merging the PR as soon as Travis comes in green.

The reminder is documented in [1].

[1] https://issues.apache.org/jira/browse/FLINK-5156

On Thu, Nov 3, 2016 at 4:42 PM, Márton Balassi <[hidden email]>
wrote:

> Thanks for bringing up the issue, Fabian.
>
> I am also for unifying the access patterns between the FieldAccessor and
> ExpressionKeys logic.
>
> In terms of Fabian's suggestion to refactor the ExpressionKeys parsing
> logic and rely on it in the FieldAccessors I think that is nice first step.
> It would be great if we could provide even tighter integration though.
>
> On Fri, Oct 28, 2016 at 6:17 PM, Fabian Hueske <[hidden email]> wrote:
>
>> IMO, FieldAccessors are certainly valuable and should exist.
>> I only want to make sure that there is a common syntax for field
>> references.
>>
>> Maybe we can refactor the parsing code for field references in
>> ExpressionKeys and expose it in static public methods that the
>> FieldAccessors can call.
>>
>> Best,
>> Fabian
>>
>>
>>
>> 2016-10-27 14:52 GMT+02:00 Gábor Gévay <[hidden email]>:
>>
>> > Hello,
>> >
>> > Thanks Fabian for starting this discussion.
>> >
>> > I would just like to add a few thougths about why does the
>> > FieldAccessors even exist. One could say that we should instead just
>> > re-use ExpressionKeys for the aggregations, and we are done. As far as
>> > I can see, the way ExpressionKeys is often used is roughly to call
>> > computeLogicalKeyPositions on it, then call
>> > CompositeType.createComparator with these key positions, and then we
>> > can call extractKeys on the comparator to get the values of the fields
>> > of a record.
>> >
>> > However, the problem with this is that in the aggregations we also
>> > need to write fields and not just read them. So with the above
>> > procedure, we would need an "intractKey" method on the comparators
>> > that would be the opposite of extractKeys, i.e. which would set the
>> > key fields.
>> >
>> > Another difference between the FieldAccessor world and the
>> > ExpressionKeys world is that in the aggregations we need to deal with
>> > only one field. This means that enhancing the comparators with the
>> > exact opposite of extractKeys would be overkill, because we don't need
>> > to set multiple fields at a time.
>> >
>> > It certainly seems to be a good idea to unify these at least
>> > externally (from the side of the users), but I'm not sure what would
>> > be a neat way to also internally unify them.
>> >
>> > Best,
>> > Gábor
>> >
>> >
>> > 2016-10-26 20:54 GMT+02:00 Fabian Hueske <[hidden email]>:
>> > > Hi everybody,
>> > >
>> > > while reviewing PR #2094 [1] I noticed that the field reference syntax
>> > for
>> > > FieldAccessors is not compatible with the syntax supported for key
>> > > definitions (ExpressionKeys) used in groupBy(), keyBy(),
>> > > join().where().equalTo(), etc.
>> > >
>> > > FieldAccessors are only used for build-in aggregations in the
>> DataStream
>> > > API (sum(), min(), max(), ...).
>> > >
>> > > In particular I identified the following inconsistencies:
>> > >
>> > > - FieldAccessors allow to address array cells. ExpressionKeys treat
>> > arrays
>> > > as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence,
>> it
>> > is
>> > > not possible to address array cells.
>> > > - ExpressionKeys do only support Integer keys for tuples. An atomic
>> type
>> > > can only be addressed with "*". FieldAccessors allow to address
>> > AtomicTypes
>> > > with 0 in addtion to "*".
>> > > - ExpressionKeys support to address fields of Java tuples with "f2"
>> and
>> > > Scala tuple fields with "_3". FieldAccessors do not support the "f" or
>> > "_"
>> > > prefix.
>> > >
>> > > I would like to propose to adapt the syntax of both mechanisms
>> (ideally,
>> > > both should use the same code for validation). IMO, the ExpressionKey
>> > > syntax much more widely used and is well designed. Therefore, I would
>> > adopt
>> > > it for FieldAccessors as well. However, that would mean to restrict
>> the
>> > > syntax of the FieldAccessors and might break existing code.
>> > >
>> > > What do others think?
>> > >
>> > > [1] https://github.com/apache/flink/pull/2094
>> >
>>
>
>