Hi everyone,
I would like to start a discussion thread on "Support Pandas UDAF in PyFlink" Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the high serialization/deserialization overhead in Python UDF and makes it convenient to leverage the popular Python libraries such as Pandas, Numpy, etc. Since Pandas UDF has so many advantages, we want to support Pandas UDAF to extend usage of Pandas UDF. Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It includes the following items: - Support Pandas UDAF in Batch Group Aggregation - Support Pandas UDAF in Batch Group Window Aggregation - Support Pandas UDAF in Batch Over Window Aggregation - Support Pandas UDAF in Stream Group Window Aggregation - Support Pandas UDAF in Stream Bounded Over Window Aggregation Looking forward to your feedback! Best, Xingbo [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink |
Hi Xingbo,
Thanks for the discussion! Overall, + 1 for this FLIP. I have two points to add: - We also need to consider how pandas UDAF supports metrics, and whether we need a custom interface for pandas UDAF? - We have added @udaf(), so whether to use ordinary Python UDAF? If not, the addition of @udaf is not appropriate. We need to discuss it further. We can consider it combination with FLIP-139 for design. What do you think? Best, Jincheng Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > Hi everyone, > > I would like to start a discussion thread on "Support Pandas UDAF in > PyFlink" > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the > high serialization/deserialization overhead in Python UDF and makes it > convenient to leverage the popular Python libraries such as Pandas, Numpy, > etc. Since Pandas UDF has so many advantages, we want to support Pandas > UDAF to extend usage of Pandas UDF. > > Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It > includes the following items: > - Support Pandas UDAF in Batch Group Aggregation > - Support Pandas UDAF in Batch Group Window Aggregation > - Support Pandas UDAF in Batch Over Window Aggregation > - Support Pandas UDAF in Stream Group Window Aggregation > - Support Pandas UDAF in Stream Bounded Over Window Aggregation > > > Looking forward to your feedback! > > Best, > Xingbo > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > [2] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > |
Hi Jincheng,
Thanks a lot for joining the discussion and the suggestion of discussing FLIP-137 and FLIP-139 together. >> 1. We also need to consider how pandas UDAF supports metrics, and whether we need a custom interface for pandas UDAF? Yes. We need to add an interface so that users can add some logic in the `open` or `close` method such as creating metrics. I have added the definition of the interface and the corresponding example in the doc. >> 2. We have added @udaf(), so whether to use ordinary Python UDAF? Yes. From the overall view of Python User Defined Function, we use @udf to describe general python udf and pandas udf, @udtf to describe python udtf, and @udaf to describe general python udaf and pandas udaf, which is more unified. I will discuss it in FLIP-139 later. Best, Xingbo jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: > Hi Xingbo, > > Thanks for the discussion! Overall, + 1 for this FLIP. > I have two points to add: > > - We also need to consider how pandas UDAF supports metrics, and whether > we need a custom interface for pandas UDAF? > - We have added @udaf(), so whether to use ordinary Python UDAF? If not, > the addition of @udaf is not appropriate. We need to discuss it further. > > We can consider it combination with FLIP-139 for design. What do you think? > > Best, > Jincheng > > > Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > > > Hi everyone, > > > > I would like to start a discussion thread on "Support Pandas UDAF in > > PyFlink" > > > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the > > high serialization/deserialization overhead in Python UDF and makes it > > convenient to leverage the popular Python libraries such as Pandas, > Numpy, > > etc. Since Pandas UDF has so many advantages, we want to support Pandas > > UDAF to extend usage of Pandas UDF. > > > > Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. It > > includes the following items: > > - Support Pandas UDAF in Batch Group Aggregation > > - Support Pandas UDAF in Batch Group Window Aggregation > > - Support Pandas UDAF in Batch Over Window Aggregation > > - Support Pandas UDAF in Stream Group Window Aggregation > > - Support Pandas UDAF in Stream Bounded Over Window Aggregation > > > > > > Looking forward to your feedback! > > > > Best, > > Xingbo > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > > [2] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > > > |
Thanks for the update Xingbo!
Pandas UDAF can reuse the `class aggregate function (user defined function)` interface in FLIP-139, and the core logic of Pandas UDAF users is written in the `accumulate` method. In this way, we can unify the interface semantics of all UDAF. What do you think? Best, Jincheng Xingbo Huang <[hidden email]> 于2020年8月31日周一 下午6:06写道: > Hi Jincheng, > > Thanks a lot for joining the discussion and the suggestion of discussing > FLIP-137 and FLIP-139 together. > > >> 1. We also need to consider how pandas UDAF supports metrics, and > whether > we need a custom interface for pandas UDAF? > > Yes. We need to add an interface so that users can add some logic in the > `open` or `close` method such as creating metrics. I have added the > definition of the interface and the corresponding example in the doc. > > >> 2. We have added @udaf(), so whether to use ordinary Python UDAF? > > Yes. From the overall view of Python User Defined Function, we use @udf to > describe general python udf and pandas udf, @udtf to describe python udtf, > and @udaf to describe general python udaf and pandas udaf, which is more > unified. I will discuss it in FLIP-139 later. > > Best, > Xingbo > > jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: > > > Hi Xingbo, > > > > Thanks for the discussion! Overall, + 1 for this FLIP. > > I have two points to add: > > > > - We also need to consider how pandas UDAF supports metrics, and whether > > we need a custom interface for pandas UDAF? > > - We have added @udaf(), so whether to use ordinary Python UDAF? If not, > > the addition of @udaf is not appropriate. We need to discuss it further. > > > > We can consider it combination with FLIP-139 for design. What do you > think? > > > > Best, > > Jincheng > > > > > > Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > > > > > Hi everyone, > > > > > > I would like to start a discussion thread on "Support Pandas UDAF in > > > PyFlink" > > > > > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves the > > > high serialization/deserialization overhead in Python UDF and makes it > > > convenient to leverage the popular Python libraries such as Pandas, > > Numpy, > > > etc. Since Pandas UDF has so many advantages, we want to support Pandas > > > UDAF to extend usage of Pandas UDF. > > > > > > Dian Fu and I have discussed offline and have drafted the FLIP-137[2]. > It > > > includes the following items: > > > - Support Pandas UDAF in Batch Group Aggregation > > > - Support Pandas UDAF in Batch Group Window Aggregation > > > - Support Pandas UDAF in Batch Over Window Aggregation > > > - Support Pandas UDAF in Stream Group Window Aggregation > > > - Support Pandas UDAF in Stream Bounded Over Window Aggregation > > > > > > > > > Looking forward to your feedback! > > > > > > Best, > > > Xingbo > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > > > [2] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > > > > > > |
Hi Jincheng,
Yes, I agree that users can extend the class `AggregateFunction` if they want to define a Pandas UDAF by the way of custom classes. I have updated the part of the FLIP. Best, Xingbo jincheng sun <[hidden email]> 于2020年9月3日周四 下午1:48写道: > Thanks for the update Xingbo! > > Pandas UDAF can reuse the `class aggregate function (user defined > function)` interface in FLIP-139, and the core logic of Pandas UDAF users > is written in the `accumulate` method. In this way, we can unify the > interface semantics of all UDAF. > > What do you think? > > Best, > Jincheng > > > > Xingbo Huang <[hidden email]> 于2020年8月31日周一 下午6:06写道: > > > Hi Jincheng, > > > > Thanks a lot for joining the discussion and the suggestion of discussing > > FLIP-137 and FLIP-139 together. > > > > >> 1. We also need to consider how pandas UDAF supports metrics, and > > whether > > we need a custom interface for pandas UDAF? > > > > Yes. We need to add an interface so that users can add some logic in the > > `open` or `close` method such as creating metrics. I have added the > > definition of the interface and the corresponding example in the doc. > > > > >> 2. We have added @udaf(), so whether to use ordinary Python UDAF? > > > > Yes. From the overall view of Python User Defined Function, we use @udf > to > > describe general python udf and pandas udf, @udtf to describe python > udtf, > > and @udaf to describe general python udaf and pandas udaf, which is more > > unified. I will discuss it in FLIP-139 later. > > > > Best, > > Xingbo > > > > jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: > > > > > Hi Xingbo, > > > > > > Thanks for the discussion! Overall, + 1 for this FLIP. > > > I have two points to add: > > > > > > - We also need to consider how pandas UDAF supports metrics, and > whether > > > we need a custom interface for pandas UDAF? > > > - We have added @udaf(), so whether to use ordinary Python UDAF? If > not, > > > the addition of @udaf is not appropriate. We need to discuss it > further. > > > > > > We can consider it combination with FLIP-139 for design. What do you > > think? > > > > > > Best, > > > Jincheng > > > > > > > > > Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > > > > > > > Hi everyone, > > > > > > > > I would like to start a discussion thread on "Support Pandas UDAF in > > > > PyFlink" > > > > > > > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves > the > > > > high serialization/deserialization overhead in Python UDF and makes > it > > > > convenient to leverage the popular Python libraries such as Pandas, > > > Numpy, > > > > etc. Since Pandas UDF has so many advantages, we want to support > Pandas > > > > UDAF to extend usage of Pandas UDF. > > > > > > > > Dian Fu and I have discussed offline and have drafted the > FLIP-137[2]. > > It > > > > includes the following items: > > > > - Support Pandas UDAF in Batch Group Aggregation > > > > - Support Pandas UDAF in Batch Group Window Aggregation > > > > - Support Pandas UDAF in Batch Over Window Aggregation > > > > - Support Pandas UDAF in Stream Group Window Aggregation > > > > - Support Pandas UDAF in Stream Bounded Over Window Aggregation > > > > > > > > > > > > Looking forward to your feedback! > > > > > > > > Best, > > > > Xingbo > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > > > > [2] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > > > > > > > > > > |
Thank you! looking forward to the voting :)
Best, Jincheng Xingbo Huang <[hidden email]> 于2020年9月3日周四 下午2:39写道: > Hi Jincheng, > > Yes, I agree that users can extend the class `AggregateFunction` if they > want to define a Pandas UDAF by the way of custom classes. I have updated > the part of the FLIP. > > Best, > Xingbo > > jincheng sun <[hidden email]> 于2020年9月3日周四 下午1:48写道: > > > Thanks for the update Xingbo! > > > > Pandas UDAF can reuse the `class aggregate function (user defined > > function)` interface in FLIP-139, and the core logic of Pandas UDAF users > > is written in the `accumulate` method. In this way, we can unify the > > interface semantics of all UDAF. > > > > What do you think? > > > > Best, > > Jincheng > > > > > > > > Xingbo Huang <[hidden email]> 于2020年8月31日周一 下午6:06写道: > > > > > Hi Jincheng, > > > > > > Thanks a lot for joining the discussion and the suggestion of > discussing > > > FLIP-137 and FLIP-139 together. > > > > > > >> 1. We also need to consider how pandas UDAF supports metrics, and > > > whether > > > we need a custom interface for pandas UDAF? > > > > > > Yes. We need to add an interface so that users can add some logic in > the > > > `open` or `close` method such as creating metrics. I have added the > > > definition of the interface and the corresponding example in the doc. > > > > > > >> 2. We have added @udaf(), so whether to use ordinary Python UDAF? > > > > > > Yes. From the overall view of Python User Defined Function, we use @udf > > to > > > describe general python udf and pandas udf, @udtf to describe python > > udtf, > > > and @udaf to describe general python udaf and pandas udaf, which is > more > > > unified. I will discuss it in FLIP-139 later. > > > > > > Best, > > > Xingbo > > > > > > jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: > > > > > > > Hi Xingbo, > > > > > > > > Thanks for the discussion! Overall, + 1 for this FLIP. > > > > I have two points to add: > > > > > > > > - We also need to consider how pandas UDAF supports metrics, and > > whether > > > > we need a custom interface for pandas UDAF? > > > > - We have added @udaf(), so whether to use ordinary Python UDAF? If > > not, > > > > the addition of @udaf is not appropriate. We need to discuss it > > further. > > > > > > > > We can consider it combination with FLIP-139 for design. What do you > > > think? > > > > > > > > Best, > > > > Jincheng > > > > > > > > > > > > Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > > > > > > > > > Hi everyone, > > > > > > > > > > I would like to start a discussion thread on "Support Pandas UDAF > in > > > > > PyFlink" > > > > > > > > > > Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves > > the > > > > > high serialization/deserialization overhead in Python UDF and makes > > it > > > > > convenient to leverage the popular Python libraries such as Pandas, > > > > Numpy, > > > > > etc. Since Pandas UDF has so many advantages, we want to support > > Pandas > > > > > UDAF to extend usage of Pandas UDF. > > > > > > > > > > Dian Fu and I have discussed offline and have drafted the > > FLIP-137[2]. > > > It > > > > > includes the following items: > > > > > - Support Pandas UDAF in Batch Group Aggregation > > > > > - Support Pandas UDAF in Batch Group Window Aggregation > > > > > - Support Pandas UDAF in Batch Over Window Aggregation > > > > > - Support Pandas UDAF in Stream Group Window Aggregation > > > > > - Support Pandas UDAF in Stream Bounded Over Window Aggregation > > > > > > > > > > > > > > > Looking forward to your feedback! > > > > > > > > > > Best, > > > > > Xingbo > > > > > > > > > > [1] > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > > > > > [2] > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > > > > > > > > > > > > > > > |
Thanks for preparing the FLIP, xingbo!
LGTM overall and looking forward to the voting! Regards, Dian > 在 2020年9月3日,下午5:22,jincheng sun <[hidden email]> 写道: > > Thank you! looking forward to the voting :) > > Best, > Jincheng > > > Xingbo Huang <[hidden email]> 于2020年9月3日周四 下午2:39写道: > >> Hi Jincheng, >> >> Yes, I agree that users can extend the class `AggregateFunction` if they >> want to define a Pandas UDAF by the way of custom classes. I have updated >> the part of the FLIP. >> >> Best, >> Xingbo >> >> jincheng sun <[hidden email]> 于2020年9月3日周四 下午1:48写道: >> >>> Thanks for the update Xingbo! >>> >>> Pandas UDAF can reuse the `class aggregate function (user defined >>> function)` interface in FLIP-139, and the core logic of Pandas UDAF users >>> is written in the `accumulate` method. In this way, we can unify the >>> interface semantics of all UDAF. >>> >>> What do you think? >>> >>> Best, >>> Jincheng >>> >>> >>> >>> Xingbo Huang <[hidden email]> 于2020年8月31日周一 下午6:06写道: >>> >>>> Hi Jincheng, >>>> >>>> Thanks a lot for joining the discussion and the suggestion of >> discussing >>>> FLIP-137 and FLIP-139 together. >>>> >>>>>> 1. We also need to consider how pandas UDAF supports metrics, and >>>> whether >>>> we need a custom interface for pandas UDAF? >>>> >>>> Yes. We need to add an interface so that users can add some logic in >> the >>>> `open` or `close` method such as creating metrics. I have added the >>>> definition of the interface and the corresponding example in the doc. >>>> >>>>>> 2. We have added @udaf(), so whether to use ordinary Python UDAF? >>>> >>>> Yes. From the overall view of Python User Defined Function, we use @udf >>> to >>>> describe general python udf and pandas udf, @udtf to describe python >>> udtf, >>>> and @udaf to describe general python udaf and pandas udaf, which is >> more >>>> unified. I will discuss it in FLIP-139 later. >>>> >>>> Best, >>>> Xingbo >>>> >>>> jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: >>>> >>>>> Hi Xingbo, >>>>> >>>>> Thanks for the discussion! Overall, + 1 for this FLIP. >>>>> I have two points to add: >>>>> >>>>> - We also need to consider how pandas UDAF supports metrics, and >>> whether >>>>> we need a custom interface for pandas UDAF? >>>>> - We have added @udaf(), so whether to use ordinary Python UDAF? If >>> not, >>>>> the addition of @udaf is not appropriate. We need to discuss it >>> further. >>>>> >>>>> We can consider it combination with FLIP-139 for design. What do you >>>> think? >>>>> >>>>> Best, >>>>> Jincheng >>>>> >>>>> >>>>> Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I would like to start a discussion thread on "Support Pandas UDAF >> in >>>>>> PyFlink" >>>>>> >>>>>> Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves >>> the >>>>>> high serialization/deserialization overhead in Python UDF and makes >>> it >>>>>> convenient to leverage the popular Python libraries such as Pandas, >>>>> Numpy, >>>>>> etc. Since Pandas UDF has so many advantages, we want to support >>> Pandas >>>>>> UDAF to extend usage of Pandas UDF. >>>>>> >>>>>> Dian Fu and I have discussed offline and have drafted the >>> FLIP-137[2]. >>>> It >>>>>> includes the following items: >>>>>> - Support Pandas UDAF in Batch Group Aggregation >>>>>> - Support Pandas UDAF in Batch Group Window Aggregation >>>>>> - Support Pandas UDAF in Batch Over Window Aggregation >>>>>> - Support Pandas UDAF in Stream Group Window Aggregation >>>>>> - Support Pandas UDAF in Stream Bounded Over Window Aggregation >>>>>> >>>>>> >>>>>> Looking forward to your feedback! >>>>>> >>>>>> Best, >>>>>> Xingbo >>>>>> >>>>>> [1] >>>>>> >>>>>> >>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink >>>>>> [2] >>>>>> >>>>>> >>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink >>>>>> >>>>> >>>> >>> >> |
Hi everyone,
Thanks all of you for the discussion. If there are no objections, I would like to start a vote thread tomorrow. Best, Xingbo Dian Fu <[hidden email]> 于2020年9月3日周四 下午5:45写道: > Thanks for preparing the FLIP, xingbo! > > LGTM overall and looking forward to the voting! > > Regards, > Dian > > > 在 2020年9月3日,下午5:22,jincheng sun <[hidden email]> 写道: > > > > Thank you! looking forward to the voting :) > > > > Best, > > Jincheng > > > > > > Xingbo Huang <[hidden email]> 于2020年9月3日周四 下午2:39写道: > > > >> Hi Jincheng, > >> > >> Yes, I agree that users can extend the class `AggregateFunction` if they > >> want to define a Pandas UDAF by the way of custom classes. I have > updated > >> the part of the FLIP. > >> > >> Best, > >> Xingbo > >> > >> jincheng sun <[hidden email]> 于2020年9月3日周四 下午1:48写道: > >> > >>> Thanks for the update Xingbo! > >>> > >>> Pandas UDAF can reuse the `class aggregate function (user defined > >>> function)` interface in FLIP-139, and the core logic of Pandas UDAF > users > >>> is written in the `accumulate` method. In this way, we can unify the > >>> interface semantics of all UDAF. > >>> > >>> What do you think? > >>> > >>> Best, > >>> Jincheng > >>> > >>> > >>> > >>> Xingbo Huang <[hidden email]> 于2020年8月31日周一 下午6:06写道: > >>> > >>>> Hi Jincheng, > >>>> > >>>> Thanks a lot for joining the discussion and the suggestion of > >> discussing > >>>> FLIP-137 and FLIP-139 together. > >>>> > >>>>>> 1. We also need to consider how pandas UDAF supports metrics, and > >>>> whether > >>>> we need a custom interface for pandas UDAF? > >>>> > >>>> Yes. We need to add an interface so that users can add some logic in > >> the > >>>> `open` or `close` method such as creating metrics. I have added the > >>>> definition of the interface and the corresponding example in the doc. > >>>> > >>>>>> 2. We have added @udaf(), so whether to use ordinary Python UDAF? > >>>> > >>>> Yes. From the overall view of Python User Defined Function, we use > @udf > >>> to > >>>> describe general python udf and pandas udf, @udtf to describe python > >>> udtf, > >>>> and @udaf to describe general python udaf and pandas udaf, which is > >> more > >>>> unified. I will discuss it in FLIP-139 later. > >>>> > >>>> Best, > >>>> Xingbo > >>>> > >>>> jincheng sun <[hidden email]> 于2020年8月31日周一 上午11:05写道: > >>>> > >>>>> Hi Xingbo, > >>>>> > >>>>> Thanks for the discussion! Overall, + 1 for this FLIP. > >>>>> I have two points to add: > >>>>> > >>>>> - We also need to consider how pandas UDAF supports metrics, and > >>> whether > >>>>> we need a custom interface for pandas UDAF? > >>>>> - We have added @udaf(), so whether to use ordinary Python UDAF? If > >>> not, > >>>>> the addition of @udaf is not appropriate. We need to discuss it > >>> further. > >>>>> > >>>>> We can consider it combination with FLIP-139 for design. What do you > >>>> think? > >>>>> > >>>>> Best, > >>>>> Jincheng > >>>>> > >>>>> > >>>>> Xingbo Huang <[hidden email]> 于2020年8月24日周一 下午2:25写道: > >>>>> > >>>>>> Hi everyone, > >>>>>> > >>>>>> I would like to start a discussion thread on "Support Pandas UDAF > >> in > >>>>>> PyFlink" > >>>>>> > >>>>>> Pandas UDF has been supported in FLINK 1.11 (FLIP-97[1]). It solves > >>> the > >>>>>> high serialization/deserialization overhead in Python UDF and makes > >>> it > >>>>>> convenient to leverage the popular Python libraries such as Pandas, > >>>>> Numpy, > >>>>>> etc. Since Pandas UDF has so many advantages, we want to support > >>> Pandas > >>>>>> UDAF to extend usage of Pandas UDF. > >>>>>> > >>>>>> Dian Fu and I have discussed offline and have drafted the > >>> FLIP-137[2]. > >>>> It > >>>>>> includes the following items: > >>>>>> - Support Pandas UDAF in Batch Group Aggregation > >>>>>> - Support Pandas UDAF in Batch Group Window Aggregation > >>>>>> - Support Pandas UDAF in Batch Over Window Aggregation > >>>>>> - Support Pandas UDAF in Stream Group Window Aggregation > >>>>>> - Support Pandas UDAF in Stream Bounded Over Window Aggregation > >>>>>> > >>>>>> > >>>>>> Looking forward to your feedback! > >>>>>> > >>>>>> Best, > >>>>>> Xingbo > >>>>>> > >>>>>> [1] > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-97%3A+Support+Scalar+Vectorized+Python+UDF+in+PyFlink > >>>>>> [2] > >>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-137%3A+Support+Pandas+UDAF+in+PyFlink > >>>>>> > >>>>> > >>>> > >>> > >> > > |
Free forum by Nabble | Edit this page |