[DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Xingbo Huang
Hi everyone,

I would like to start a discussion thread on "Support Cython Optimizing
Python User Defined Function".

Scalar Python UDFs (FLIP-58[1]) are already supported in release 1.10, and
Python UDTFs will be supported in the upcoming 1.11 release. In release
1.10, we focused on supporting UDF features and made few performance
optimizations. Although we have since made many optimizations in
master[2], Cython can improve the performance of Python UDFs considerably
further.

Robert Metzger, Jincheng Sun, and I discussed this offline and drafted
FLIP-121[3]. It covers the following items:

- A Cython implementation of the coders and operations

- Documentation changes for building sdist and wheel packages from source

- Solutions for building the packages


Looking forward to your feedback!

Best,

Xingbo

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table

[2] https://issues.apache.org/jira/browse/FLINK-16747

[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-121%3A+Support+Cython+Optimizing+Python+User+Defined+Function

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Robert Metzger
Thank you for posting the FLIP.

The proposed integration with Azure Pipelines looks good to me.

On Tue, Mar 31, 2020 at 1:23 PM Xingbo Huang <[hidden email]> wrote:

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Dian Fu
Hi Xingbo,

Thanks a lot for the great work. Big +1 to this feature. The performance improvement is impressive.

Regards,
Dian

> On Apr 7, 2020, at 12:38 PM, Robert Metzger <[hidden email]> wrote:

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

jincheng sun
Hi Xingbo,

Thanks for bringing up this discussion!

I agree with Robert: +1 for the integration with Azure.

Best,
Jincheng

On Tue, Apr 7, 2020 at 2:21 PM Dian Fu <[hidden email]> wrote:

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Xingbo Huang
Hi everyone,

Thank you all for the discussion.
If there are no objections, I would like to start a vote thread tomorrow.

Best,
Xingbo

On Tue, Apr 7, 2020 at 6:22 PM jincheng sun <[hidden email]> wrote:

Re: [DISCUSS] FLIP-121: Support Cython Optimizing Python User Defined Function

Hequn Cheng
Hi,

+1 on integrating with Azure; it is consistent with the long-term goal, as
we are also going to switch from Travis to Azure.
The performance improvement is very impressive. Looking forward to the vote.

Best, Hequn

On Tue, Apr 7, 2020 at 9:10 PM Xingbo Huang <[hidden email]> wrote:
