(DEPRECATED) Apache Flink Mailing List archive.

Naming of semantic annotations

Fabian Hueske-2

Naming of semantic annotations

Hi all,

I have a pending pull request (#311) to fix and enable semantic information
for functions with nested and POJO types.
Semantic information is used to tell the optimizer about the behavior of
user-defined functions.
The optimizer can use this information to generate more efficient execution
plans.

Assume, for example, a data set that is partitioned on the first field of a
tuple and is passed to a Map function. If the optimizer knows that the Map
function does not modify the first field, it can infer that the data is
still partitioned after the Map function has been applied.
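
To make the partitioning argument concrete, here is a minimal plain-Java sketch. It does not use Flink at all: the class and every helper name in it are hypothetical, and records are modelled as simple Object[] tuples that are hash-partitioned on field 0. It only illustrates why a mapper that leaves field 0 untouched keeps the partitioning valid, while a mapper that overwrites field 0 does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names are Flink APIs.
final class PartitionPreservationSketch {

    /** Partition index that a record belongs to, based on the hash of field 0. */
    static int targetPartition(Object[] record, int numPartitions) {
        return Math.floorMod(record[0].hashCode(), numPartitions);
    }

    /** Hash-partitions the records on field 0. */
    static List<List<Object[]>> partitionOnFirstField(List<Object[]> records, int numPartitions) {
        List<List<Object[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Object[] record : records) {
            partitions.get(targetPartition(record, numPartitions)).add(record);
        }
        return partitions;
    }

    /** Applies the mapper inside each partition; records are never moved between partitions. */
    static List<List<Object[]>> mapWithoutShuffle(List<List<Object[]>> partitions,
                                                  UnaryOperator<Object[]> mapper) {
        List<List<Object[]>> result = new ArrayList<>();
        for (List<Object[]> partition : partitions) {
            List<Object[]> mapped = new ArrayList<>();
            for (Object[] record : partition) {
                mapped.add(mapper.apply(record));
            }
            result.add(mapped);
        }
        return result;
    }

    /** True if every record sits in the partition that its field 0 hashes to. */
    static boolean isPartitionedOnFirstField(List<List<Object[]>> partitions) {
        int numPartitions = partitions.size();
        for (int p = 0; p < numPartitions; p++) {
            for (Object[] record : partitions.get(p)) {
                if (targetPartition(record, numPartitions) != p) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Object[]> records = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            records.add(new Object[] {i, "v" + i});
        }
        List<List<Object[]>> partitions = partitionOnFirstField(records, 4);

        // Mapper that only changes field 1: the partitioning on field 0 survives.
        List<List<Object[]>> kept =
                mapWithoutShuffle(partitions, r -> new Object[] {r[0], r[1] + "!"});
        // Mapper that overwrites field 0: the old partitioning is no longer valid.
        List<List<Object[]>> broken =
                mapWithoutShuffle(partitions, r -> new Object[] {(Integer) r[0] + 1, r[1]});

        System.out.println(isPartitionedOnFirstField(kept));
        System.out.println(isPartitionedOnFirstField(broken));
    }
}
```

An optimizer cannot inspect the mapper the way this check inspects the data, which is why it needs the user to declare which fields pass through unchanged.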

There are two ways to give semantic information for user-defined functions:
1) Class annotations:
@ConstantFields("0; 1->2")
public class MyMapper implements MapFunction<...> { }

2) Inline data flow:
data.map(new MapFunction<...>() {...}).withConstantSet("0; 1->2");

In both cases the semantic annotation indicates that the first field (0) is
preserved and the second field of the input (1) is forwarded to the third
field of the output (2).
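
As a rough illustration of how such an expression can be read, the following plain-Java sketch turns "0; 1->2" into a map from input position to output position. This is not Flink's parser: the class and method names are hypothetical, and the sketch assumes only numeric tuple positions separated by ";", with an optional "->" target. Nested or POJO field names are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this is not how Flink parses the annotation.
final class FieldForwardingSketch {

    /** Parses e.g. "0; 1->2" into {0=0, 1=2} (input position -> output position). */
    static Map<Integer, Integer> parse(String expression) {
        Map<Integer, Integer> forwarded = new LinkedHashMap<>();
        for (String part : expression.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] sides = trimmed.split("->");
            int source = Integer.parseInt(sides[0].trim());
            // Without "->", the field stays at the same position.
            int target = sides.length == 2 ? Integer.parseInt(sides[1].trim()) : source;
            forwarded.put(source, target);
        }
        return forwarded;
    }

    /** Output position that carries the given input field, or -1 if it is not declared as forwarded. */
    static int outputPositionOf(Map<Integer, Integer> forwarded, int inputField) {
        return forwarded.getOrDefault(inputField, -1);
    }

    public static void main(String[] args) {
        Map<Integer, Integer> forwarded = parse("0; 1->2");
        System.out.println(forwarded);                      // prints {0=0, 1=2}
        System.out.println(outputPositionOf(forwarded, 1)); // prints 2
        System.out.println(outputPositionOf(forwarded, 2)); // prints -1
    }
}
```

With such a mapping, a property on input field 0 (for example, a partitioning) can be carried over to output field 0, and a property on input field 1 can be carried over to output field 2.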

The question is: how should we name this feature?
Right now it is inconsistently called "ConstantFields" and "ConstantSet".

I would prefer the name ForwardedFields because this indicates that fields
are "forwarded" through the function and possibly also moved to another
location. It would, however, change the API (although I don't think this
feature is used often, because it has not been advertised much).

Any other suggestions or opinions on this?

Cheers, Fabian

Vasiliki Kalavri

Re: Naming of semantic annotations

Hi,

+1 for ForwardedFields. I like it much more than ConstantFields.
I think it makes it clear what the feature does.

It's a very cool feature and indeed not advertised a lot. I use it when I
remember, but most of the time I forget it exists ;)

-V.

On 23 January 2015 at 22:12, Fabian Hueske <[hidden email]> wrote:

> [...]

Chesnay Schepler

Re: Naming of semantic annotations

+1 ForwardedFields

On 23.01.2015 22:38, Vasiliki Kalavri wrote:

> [...]

Stephan Ewen

Re: Naming of semantic annotations

I agree with ForwardedFields as well.

I vaguely remember that Joe Harjung (when working on the first Scala API
version) called it the CopySet. I would assume that ForwardedFields is more
intuitive to most people.

I only mention this because Joe was one of the few native English speakers
on the team. It would be nice to have a comment from another native English
speaker ;-)

On Fri, Jan 23, 2015 at 1:51 PM, Chesnay Schepler <[hidden email]> wrote:

> [...]