[DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

[DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Wei Zhong-2
Hi everyone,

I would like to start a discussion on how to support general Python user-defined aggregate functions in the Table API.

FLIP-58 [1] introduced stateless Python UDFs, which have been supported in previous releases. However, stateful Python UDFs, i.e. user-defined aggregate functions, are not yet supported in PyFlink. We want to introduce general Python user-defined aggregate functions for the PyFlink Table API.
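
To illustrate what "stateful" means here, the following is a minimal plain-Python sketch of an aggregate function that carries an accumulator between rows. It is not taken from the design doc: the class `WeightedAvg`, its accumulator, and the method names (which mirror the Java `AggregateFunction` contract) are assumptions for illustration only, and the loop at the end stands in for the runtime.

```python
from dataclasses import dataclass


@dataclass
class WeightedAvgAccumulator:
    """State carried between rows; a stateless UDF has no equivalent."""
    total: int = 0
    weight: int = 0


class WeightedAvg:
    """Hypothetical general Python UDAF (method names are an assumption)."""

    def create_accumulator(self):
        return WeightedAvgAccumulator()

    def accumulate(self, acc, value, weight):
        # Fold one input row into the accumulator.
        acc.total += value * weight
        acc.weight += weight

    def retract(self, acc, value, weight):
        # Undo a previously accumulated row (needed for updating inputs).
        acc.total -= value * weight
        acc.weight -= weight

    def merge(self, acc, others):
        # Combine partial accumulators, e.g. from different partitions.
        for other in others:
            acc.total += other.total
            acc.weight += other.weight

    def get_value(self, acc):
        # Return None for an empty group instead of dividing by zero.
        return acc.total / acc.weight if acc.weight else None


# Toy driver standing in for the runtime: one accumulator per group.
agg = WeightedAvg()
acc = agg.create_accumulator()
for value, weight in [(10, 1), (20, 3)]:
    agg.accumulate(acc, value, weight)
result = agg.get_value(acc)
```

The point of the sketch is only that the result depends on state built up across many rows, which is what the stateless UDFs from FLIP-58 cannot express.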

Here is the design doc:

https://cwiki.apache.org/confluence/display/FLINK/FLIP-139%3A+General+Python+User-Defined+Aggregate+Function+Support+on+Table+API

Looking forward to your feedback!

Best,
Wei

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

jincheng sun
Hi Wei,
Thanks for the discussion! Overall, +1 for this FLIP.

One question: can the @udaf added by FLIP-137 also be used for general
Python UDAFs?
It would be great if we could consider this design in combination with
FLIP-137.

What do you think?

Best,
Jincheng

On Wed, Aug 26, 2020 at 11:28 AM, Wei Zhong <[hidden email]> wrote:

> Hi everyone,
>
> I would like to start a discussion on how to support general Python
> user-defined aggregate functions in the Table API.
>
> FLIP-58 [1] introduced stateless Python UDFs, which have been supported
> in previous releases. However, stateful Python UDFs, i.e. user-defined
> aggregate functions, are not yet supported in PyFlink. We want to
> introduce general Python user-defined aggregate functions for the
> PyFlink Table API.
>
> Here is the design doc:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-139%3A+General+Python+User-Defined+Aggregate+Function+Support+on+Table+API
>
> Looking forward to your feedback!
>
> Best,
> Wei
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Xingbo Huang
Hi Wei,

Thanks a lot for starting the discussion, and thanks to Jincheng for
suggesting that FLIP-137 and FLIP-139 be discussed together.

One question is whether we can use @udaf, which is introduced in
FLIP-137 [1], to describe both pandas UDAFs and general Python UDAFs.
Looking at Python user-defined functions as a whole, we would then use
@udf to describe general Python UDFs and pandas UDFs, @udtf to describe
Python UDTFs, and @udaf to describe general Python UDAFs and pandas
UDAFs, which is more unified.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink#FLIP137:SupportPandasUDAFinPyFlink-Interfaces
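
As a rough sketch of the unified idea, a single @udaf decorator could cover both flavours by taking a switch. The decorator below is a hypothetical plain-Python stand-in, not PyFlink's implementation: the parameter name `func_type`, the string type names, and the example functions are assumptions for illustration.

```python
def udaf(f=None, *, result_type=None, func_type="general"):
    """Hypothetical stand-in for a unified @udaf decorator."""
    if func_type not in ("general", "pandas"):
        raise ValueError(f"unsupported func_type: {func_type!r}")

    def wrap(target):
        # Record the declared metadata on the decorated function or class.
        target.result_type = result_type
        target.func_type = func_type
        return target

    # Support both bare @udaf and @udaf(...) usage.
    return wrap if f is None else wrap(f)


@udaf(result_type="FLOAT", func_type="pandas")
def mean_pandas(values):
    # A real pandas UDAF would receive a pandas.Series; a list is used
    # here so the sketch runs without extra dependencies.
    return sum(values) / len(values)


@udaf(result_type="BIGINT")
class Count:
    """General (row-at-a-time) aggregate declared with the same decorator."""

    def create_accumulator(self):
        return [0]

    def accumulate(self, acc, *args):
        acc[0] += 1

    def get_value(self, acc):
        return acc[0]
```

Both declarations go through the same entry point, so users would only need to learn one decorator per function kind.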

Best,
Xingbo

On Mon, Aug 31, 2020 at 11:11 AM, jincheng sun <[hidden email]> wrote:

> Hi Wei,
> Thanks for the discussion! Overall, +1 for this FLIP.
>
> One question: can the @udaf added by FLIP-137 also be used for general
> Python UDAFs?
> It would be great if we could consider this design in combination with
> FLIP-137.
>
> What do you think?
>
> Best,
> Jincheng

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Wei Zhong-2
Hi Jincheng & Xingbo,

Thanks for your suggestions.

I agree that we should keep the user interface uniform. I'll adjust the design to allow users to specify the result type and accumulator type via @udaf.
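
A minimal sketch of what declaring both types via @udaf could look like is shown below. This is a plain-Python stand-in rather than the adjusted design: the keyword names `result_type` and `accumulator_type`, the use of Python types as type descriptors, and the `run` driver are all assumptions for illustration.

```python
def udaf(*, result_type, accumulator_type):
    """Hypothetical decorator: attaches declared types to an aggregate class."""
    def wrap(cls):
        cls.result_type = result_type
        cls.accumulator_type = accumulator_type
        return cls
    return wrap


@udaf(result_type=int, accumulator_type=list)
class SumAgg:
    def create_accumulator(self):
        return [0]

    def accumulate(self, acc, value):
        acc[0] += value

    def get_value(self, acc):
        return acc[0]


def run(agg_cls, rows):
    """Toy driver that checks the declared types while aggregating."""
    agg = agg_cls()
    acc = agg.create_accumulator()
    if not isinstance(acc, agg_cls.accumulator_type):
        raise TypeError("accumulator does not match the declared type")
    for row in rows:
        agg.accumulate(acc, row)
    value = agg.get_value(acc)
    if not isinstance(value, agg_cls.result_type):
        raise TypeError("result does not match the declared type")
    return value


total = run(SumAgg, [1, 2, 3])
```

Declaring the types in the decorator keeps them out of the class body, which matches how the result type is specified for the other Python function kinds.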

Best,
Wei


> On Aug 31, 2020, at 18:06, Xingbo Huang <[hidden email]> wrote:
>
> Hi Wei,
>
> Thanks a lot for starting the discussion, and thanks to Jincheng for
> suggesting that FLIP-137 and FLIP-139 be discussed together.
>
> One question is whether we can use @udaf, which is introduced in
> FLIP-137 [1], to describe both pandas UDAFs and general Python UDAFs.
> Looking at Python user-defined functions as a whole, we would then use
> @udf to describe general Python UDFs and pandas UDFs, @udtf to describe
> Python UDTFs, and @udaf to describe general Python UDAFs and pandas
> UDAFs, which is more unified.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink#FLIP137:SupportPandasUDAFinPyFlink-Interfaces
>
> Best,
> Xingbo

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Timo Walther-2
Hi Wei,

Is `reset_accumulator` still necessary? We recently dropped it from the
Java API because the planner no longer used it.

Regards,
Timo

On 31.08.20 15:00, Wei Zhong wrote:

> Hi Jincheng & Xingbo,
>
> Thanks for your suggestions.
>
> I agree that we should keep the user interface uniform. I'll adjust the design to allow users to specify the result type and accumulator type via @udaf.
>
> Best,
> Wei

Re: [DISCUSS] FLIP-139: General Python User-Defined Aggregate Function on Table API

Wei Zhong-2
Hi everyone,

Are there any more comments on this FLIP? If not, I would like to start the VOTE.

Best,
Wei

> On Sep 1, 2020, at 11:15, Wei Zhong <[hidden email]> wrote:
>
> Hi Timo,
>
> Thanks for pointing this out. I'll remove `reset_accumulator` from the design doc.
>
> Best,
> Wei
>
>> On Aug 31, 2020, at 21:11, Timo Walther <[hidden email]> wrote:
>>
>> Hi Wei,
>>
>> Is `reset_accumulator` still necessary? We recently dropped it from the Java API because the planner no longer used it.
>>
>> Regards,
>> Timo