Hi all,
I have a pending pull request (#311) to fix and enable semantic information for functions with nested and Pojo types.
Semantic information is used to tell the optimizer about the behavior of user-defined functions.
The optimizer can use this information to generate more efficient execution plans.

Assume, for example, a data set which is partitioned on the first field of a tuple and which is given to a Map function. If the optimizer knows that the Map function does not modify the first field, it can infer that the data is still partitioned after the Map function was applied.

There are two ways to give semantic information for user-defined functions:

1) Class annotations:
    @ConstantFields("0; 1->2")
    public class MyMapper extends MapFunction<...> { }

2) Inline data flow:
    data.map(new MapFunction<...>() {...}).withConstantSet("0; 1->2");

In both cases the semantic annotation indicates that the first field (0) is preserved and the second field of the input (1) is forwarded to the third field of the output (2).

The question is: how should we name this feature?
Right now it is inconsistently called "ConstantField" and "ConstantSet".

I would prefer the name ForwardedFields because it indicates that fields are "forwarded" through the function and possibly also moved to another location. It would, however, change the API (although I don't think this feature is used often, because it was not advertised a lot).

Any other suggestions or opinions on this?

Cheers, Fabian
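To make the annotation syntax concrete, here is a minimal sketch of how a string such as "0; 1->2" could be read as a mapping from input field positions to output field positions. The class name ForwardedFieldsSketch and the method parse are hypothetical and exist only for illustration; this is not the parser used in Flink or in the pull request, and it only handles flat tuple positions, not nested or Pojo fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Flink.
class ForwardedFieldsSketch {

    // Returns a map from input field position to output field position.
    static Map<Integer, Integer> parse(String spec) {
        Map<Integer, Integer> forwarded = new LinkedHashMap<>();
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            String[] parts = trimmed.split("->");
            if (parts.length > 2) {
                throw new IllegalArgumentException("Malformed entry: " + trimmed);
            }
            int source = Integer.parseInt(parts[0].trim());
            // "0" alone means the field keeps its position; "1->2" moves it.
            int target = parts.length == 2
                    ? Integer.parseInt(parts[1].trim())
                    : source;
            forwarded.put(source, target);
        }
        return forwarded;
    }

    public static void main(String[] args) {
        System.out.println(parse("0; 1->2")); // prints {0=0, 1=2}
    }
}
```

Read this way, field 0 stays in place and field 1 ends up at position 2, which matches the description of the annotation given above.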
Hi,
+1 for ForwardedFields. I like it much more than ConstantFields. I think it makes it clear what the feature does.

It's a very cool feature and indeed not advertised a lot. I use it when I remember, but most of the time I forget it exists ;)

-V.

On 23 January 2015 at 22:12, Fabian Hueske <[hidden email]> wrote:
> The question is how should we name this feature?
+1 ForwardedFields
On 23.01.2015 22:38, Vasiliki Kalavri wrote:
> +1 for ForwardedFields. I like it much more than ConstantFields.
I agree with ForwardedFields as well.
I vaguely remember that Joe Harjung (when working on the first Scala API version) called it the CopySet. I would assume that ForwardedFields is more intuitive to most people.

I only mention this because Joe was one of the few native English speakers on the team. It would be nice to have a comment from another native English speaker ;-)

On Fri, Jan 23, 2015 at 1:51 PM, Chesnay Schepler <[hidden email]> wrote:
> +1 ForwardedFields