I saw that a recent commit introduced the notion of sort key types, which
have a well-defined order. What is the difference from a regular key? Regular keys are also sortable, so why the distinction?

Regular keys differ from sort keys in that they can be (somehow) sorted,
but their order is not necessarily "intuitive". So regular keys are sufficient for sort-based grouping, but not for explicit sorting (groupSort, partitionSort, outputSort). Right now, this difference is only relevant for POJOs. Since the order of POJO fields is not (well) defined and POJOs are ordered based on all their fields, the resulting order is not well defined either. We can add support for sorting POJOs if they implement Comparable or otherwise define the order of their fields (as proposed in FLINK-1665).

2015-04-06 14:34 GMT+02:00 Stephan Ewen:
> ...
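A minimal sketch of why ordering a POJO "based on all its fields" is ambiguous. It assumes a hypothetical `Person` POJO and uses plain `java.util.Comparator`s rather than Flink's own comparators: the same "compare every field" rule produces different orders depending on which field happens to be compared first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PojoFieldOrder {
    // Hypothetical POJO; Java reflection does not guarantee any particular field order.
    public static class Person {
        public String name;
        public int age;

        public Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public String toString() {
            return name + "/" + age;
        }
    }

    // "Compare on all fields" if the fields happen to be visited as (name, age).
    static final Comparator<Person> NAME_THEN_AGE =
            Comparator.comparing((Person p) -> p.name).thenComparingInt(p -> p.age);

    // The same rule if the fields happen to be visited as (age, name).
    static final Comparator<Person> AGE_THEN_NAME =
            Comparator.comparingInt((Person p) -> p.age).thenComparing(p -> p.name);

    // Sorts a copy of the input and returns the elements in their string form.
    static List<String> sorted(List<Person> input, Comparator<Person> order) {
        List<Person> copy = new ArrayList<>(input);
        copy.sort(order);
        List<String> out = new ArrayList<>();
        for (Person p : copy) {
            out.add(p.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> people = List.of(new Person("bob", 20), new Person("alice", 30));
        System.out.println(sorted(people, NAME_THEN_AGE));
        System.out.println(sorted(people, AGE_THEN_NAME));
    }
}
```

Running `main` prints `[alice/30, bob/20]` followed by `[bob/20, alice/30]`: both are "sorted on all fields", which is why such an order is fine for grouping but not for an explicit, user-visible sort.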

I am wondering if it is necessary to add this extra distinction and
complexity to the code. One simple way to get around this would be to simply require that user-requested sorts specify all atomic fields directly. Wouldn't that be a fair restriction? I am saying this because I see the API classes getting increasingly complex, and I would really like to keep newly introduced API concepts to a minimum; otherwise, the API will soon become unmaintainable.

On Tue, Apr 7, 2015 at 10:01 AM, Fabian Hueske wrote:
> ...
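The restriction proposed above could be enforced with a check roughly like the following. This is a hypothetical sketch: the flat `Map` schema (field expression to "is atomic") and the `validateSortFields` helper are invented for illustration and are not part of Flink's API.

```java
import java.util.List;
import java.util.Map;

public class AtomicSortKeyCheck {

    // Hypothetical check: a user-requested sort is accepted only if every
    // sort field expression names an atomic (non-composite) field.
    // The schema maps field expressions to true if the field is atomic;
    // a real system would derive this from its type information instead.
    static void validateSortFields(Map<String, Boolean> schema, List<String> sortFields) {
        for (String field : sortFields) {
            Boolean atomic = schema.get(field);
            if (atomic == null) {
                throw new IllegalArgumentException("Unknown field: " + field);
            }
            if (!atomic) {
                throw new IllegalArgumentException(
                        "Field '" + field + "' is composite; list its atomic fields explicitly");
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> schema = Map.of(
                "user", false,        // a nested POJO, i.e. a composite field
                "user.name", true,
                "user.age", true,
                "timestamp", true);

        // Accepted: every sort field is atomic.
        validateSortFields(schema, List.of("user.name", "user.age", "timestamp"));

        // Rejected: "user" is composite and would need to be spelled out.
        try {
            validateSortFields(schema, List.of("user"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The cost of this design is on the user side: sorting on a composite field means listing its atomic sub-fields, in exactly the order the user wants.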

Limiting sorting to atomic fields would also prohibit sorting on Tuples and
case classes. I guess that is not too uncommon. As a user, I would also expect to be able to sort POJOs that implement Comparable. Also, the default implementation of isSortKey() returns the result of isKeyType(), so users don't need to implement the method if the type is both a key and a sort key.

2015-04-07 10:26 GMT+02:00 Stephan Ewen:
> ...
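A sketch of the default-method design described above. The interface and class names (`TypeInfoSketch`, `IntTypeInfoSketch`, `PojoTypeInfoSketch`) are hypothetical and only modeled on the discussion, not copied from Flink's `TypeInformation`; the POJO variant assumes the "implements Comparable" option mentioned in the thread.

```java
// Hypothetical, simplified stand-in for a type-information interface.
interface TypeInfoSketch {
    boolean isKeyType();

    // Default as described in the thread: a key type is also a sort key type,
    // so most implementations never need to override this method.
    default boolean isSortKeyType() {
        return isKeyType();
    }
}

// Atomic type: a key and a sort key; relies on the default.
class IntTypeInfoSketch implements TypeInfoSketch {
    public boolean isKeyType() {
        return true;
    }
}

// POJO type: usable as a grouping key, but only sortable as a whole if the
// class defines its own order by implementing Comparable.
class PojoTypeInfoSketch implements TypeInfoSketch {
    private final Class<?> clazz;

    PojoTypeInfoSketch(Class<?> clazz) {
        this.clazz = clazz;
    }

    public boolean isKeyType() {
        return true;
    }

    @Override
    public boolean isSortKeyType() {
        return Comparable.class.isAssignableFrom(clazz);
    }
}

public class SortKeyDefaults {
    // Hypothetical POJO without a defined order.
    static class PlainPojo {
        public int a;
        public String b;
    }

    // Hypothetical POJO that defines its order explicitly.
    static class ComparablePojo implements Comparable<ComparablePojo> {
        public int a;

        public int compareTo(ComparablePojo o) {
            return Integer.compare(a, o.a);
        }
    }

    public static void main(String[] args) {
        System.out.println(new IntTypeInfoSketch().isSortKeyType());
        System.out.println(new PojoTypeInfoSketch(PlainPojo.class).isSortKeyType());
        System.out.println(new PojoTypeInfoSketch(ComparablePojo.class).isSortKeyType());
    }
}
```

Running `main` prints `true`, `false`, `true`. Only the POJO type info overrides `isSortKeyType()`, which matches the argument that the distinction stays invisible to anyone not writing their own type information.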

I think the point is that understanding the concepts becomes increasingly
difficult if we keep introducing more concepts all the time with little consideration. Nothing prohibits sorting on Tuples and case classes; it only requires a string or two more in the program code. I think it is very worthwhile to consider that, if it helps keep the concepts of the API simpler...

On Tue, Apr 7, 2015 at 10:36 AM, Fabian Hueske wrote:
> ...
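To illustrate the point that sorting on a whole tuple can be expressed by listing its fields: for a fixed-arity tuple, lexicographic order over all fields is the same as sorting on field 0 and then on field 1. A minimal sketch, assuming `int[]` rows as stand-ins for 2-field tuples (not Flink's `Tuple2`); `Arrays.compare(int[], int[])` requires Java 9 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TupleSortEquivalence {
    // Sorting on the full tuple: lexicographic order over all fields.
    static final Comparator<int[]> WHOLE_TUPLE = (a, b) -> Arrays.compare(a, b);

    // Sorting on explicitly listed atomic fields, in the order (f0, f1).
    static final Comparator<int[]> EXPLICIT_FIELDS =
            Comparator.comparingInt((int[] t) -> t[0]).thenComparingInt(t -> t[1]);

    // Returns a sorted copy, leaving the input untouched.
    static List<int[]> sortCopy(List<int[]> rows, Comparator<int[]> order) {
        List<int[]> copy = new ArrayList<>(rows);
        copy.sort(order);
        return copy;
    }

    // Element-wise comparison of two lists of rows.
    static boolean sameOrder(List<int[]> a, List<int[]> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!Arrays.equals(a.get(i), b.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<int[]> rows = List.of(
                new int[] {2, 1}, new int[] {1, 5}, new int[] {2, 0}, new int[] {1, 3});
        List<int[]> byTuple = sortCopy(rows, WHOLE_TUPLE);
        List<int[]> byFields = sortCopy(rows, EXPLICIT_FIELDS);
        System.out.println(sameOrder(byTuple, byFields));
    }
}
```

Both comparators order the sample rows as `{1,3}, {1,5}, {2,0}, {2,1}`, so `main` prints `true`. The equivalence holds for tuples and case classes because their field order is fixed; it is exactly what is missing for POJOs.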

Sure, simple API concepts are important.
But this concept is quite hidden and will only be visible to people who are already quite involved with the system. Only users who define their own TypeInformations that require the distinction between sort keys and regular keys need to worry about it. So it is only relevant in very few corner cases, but it makes the API more powerful because it allows sorting on full composite types.

2015-04-07 10:41 GMT+02:00 Stephan Ewen:
> ...