[DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
Hi All,
As Xianda brought up in the previous email, a large number of data
analysis users want Flink to support Python. At the API level, Flink
offers the DataStream API, the DataSet API, and the Table API & SQL, and
the Table API will become the first-class citizen. The Table API is
declarative and can be optimized automatically, as mentioned in the Flink
mid-term roadmap by Stephan. We are therefore first considering supporting
Python at the Table API level, to cater to the current large number of
analytics users. To further promote Python support at the Table API level,
Dian, Wei and I discussed offline a bit and came up with the following
initial feature outline:

- Python Table API Interface
  Introduce a set of Python Table API interfaces, including interface
  definitions such as Table, TableEnvironment, TableConfig, etc.

- Implementation Architecture
  We will offer two alternative architecture options: one for pure Python
  language support and one for an extended multi-language design.

- Job Submission
  Provide a way to submit Python Table API jobs (local/remote).

- Python Shell
  The Python Shell provides an interactive way for users to write and
  execute Flink Python Table API jobs.


The design document for FLIP-38 can be found here:

https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing

I am looking forward to your comments and feedback.

Best,
Jincheng

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Jeff Zhang
Thanks, Jincheng, for driving this. Overall I agree with the approach; I
just left a few comments on details.



--
Best Regards

Jeff Zhang

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Shuyi Chen
In reply to this post by jincheng sun
Thanks a lot for driving the FLIP, Jincheng. The approach looks good.
Adding multi-language support sounds like a promising direction for
expanding Flink's footprint. Do we have a plan for adding Golang support?
Many backend engineers nowadays are familiar with Go but probably less so
with Java, so adding Golang support would significantly reduce the
friction of using Flink for them. Also, do we have a design for
multi-language UDF support, and what is the timeline for adding DataStream
API support? We would like to help and contribute as well, as we have a
similar need internally at our company.
Thanks a lot.

Shuyi

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Thomas Weise
Thanks for putting this proposal together.

It would be nice if you could share a few use case examples (maybe add
them as a section to the FLIP?).

The reason I ask: The Table API is immensely useful, but it isn't clear to
me what value other language bindings provide without UDF support. With
FLIP-38 it will be possible to write a program in Python, but not execute
Python functions. Without UDF support, isn't it possible to achieve roughly
the same with plain SQL? In which situation would I use the Python API?
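
To make the question concrete, here is a minimal sketch under stated
assumptions; it is not the API proposed in FLIP-38. The `Table` class, its
`filter`/`group_by`/`select` methods, and `to_sql` are hypothetical names
invented for this example. It illustrates that a chain of purely relational
calls, with no Python function running on the data, carries the same
information as a single SQL string.

```python
# Hypothetical mini builder for illustration only; NOT the FLIP-38 API.
class Table:
    """Records relational operations and renders them as one SQL string."""

    def __init__(self, name, columns="*", predicate=None, group_keys=None):
        self._name = name
        self._columns = columns
        self._predicate = predicate
        self._group_keys = group_keys

    def _copy(self, **changes):
        # Each call returns a new Table, so earlier handles stay unchanged.
        state = dict(name=self._name, columns=self._columns,
                     predicate=self._predicate, group_keys=self._group_keys)
        state.update(changes)
        return Table(**state)

    def filter(self, predicate):
        return self._copy(predicate=predicate)

    def group_by(self, keys):
        return self._copy(group_keys=keys)

    def select(self, columns):
        return self._copy(columns=columns)

    def to_sql(self):
        sql = f"SELECT {self._columns} FROM {self._name}"
        if self._predicate:
            sql += f" WHERE {self._predicate}"
        if self._group_keys:
            sql += f" GROUP BY {self._group_keys}"
        return sql


orders = Table("orders")
query = (orders.filter("amount > 10")
               .group_by("user_id")
               .select("user_id, SUM(amount) AS total"))
print(query.to_sql())
```

In this sketch the method chain and the rendered SQL are interchangeable,
which is the situation described above; the two would only diverge once
the program can pass Python functions (UDFs) into the plan.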

There was a related discussion regarding UDF support in [1]. If the
assumption is that such support will be added later, then I would like to
circle back to the question of why this cannot be built on top of Beam. It
would be nice to clarify the bigger goal before embarking on the first
milestone.

I'm going to comment on other things in the doc.

[1]
https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E

Thomas


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by Jeff Zhang
Hi Jeff, thank you for your encouragement and valuable comments on the
Google doc.

I saw that you participated in many PySpark discussions a long time ago. I
am very grateful and look forward to your follow-up comments on PyFlink!

Thanks,
Jincheng

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by Shuyi Chen
Hi Shuyi,

Glad to see your feedback and the additional requirements for
multi-language support!

I think the Flink community is very much looking forward to more language
support, and of course Golang should be on the list of languages to
support in the future. Since supporting Python on Flink has been
researched and discussed in the community for a long time, I want to
support Python in the Table API as the first stage; support for other
languages should be planned after that. I have not yet thought in detail
about how or when to support Golang. You are very welcome to share more
ideas on how to support Golang if you have more thoughts. :)

Regarding UDFs, we do have some ideas and design attempts. The results of
our early attempts to measure Python UDF performance are not encouraging.
There are also some problems with Python environment management that need
to be considered. After more investigation and experiments, I will share
the results with you in time. Perhaps after the first stage (Python Table
API support) we can then discuss UDF support in detail.

I think support for the DataStream API should be considered after UDF
support, because the DataStream API relies mostly on various functions.

We plan to complete the first phase before the release of Flink 1.9 and
start on UDF support after 1.9. Of course, I am very glad to hear that you
want to contribute to Flink multi-language support. I believe nothing is
impossible: if more people are interested in the Python Table API with UDF
support and more people want to contribute to the community, UDF support
may be there when Flink 1.9 is released. :)

Best,
Jincheng

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

dwysakowicz
Hi all,

Thank you very much, Jincheng, for the very thorough proposal. I followed
the discussion only briefly, but I have the impression that the consensus
in the previous discussion [1] was that we do not want independent,
Flink-specific multi-language support, but rather want to collaborate on
that matter with the Beam community. I think this is also the concern
Thomas raised [2].

Let's make sure we do not contradict what was said in [1]. Could you
elaborate more on how this fits into the Beam-Flink multi-language support?

Best,

Dawid

[1]
https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E

[2]
https://lists.apache.org/thread.html/da6cd815fa601d81be9f706aaa4d2c595db0b52c40a9040238b830c7@%3Cdev.flink.apache.org%3E


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Shaoxuan Wang
Dawid,
This proposal does not contradict what we have discussed.
Please check my reply of 2019/02/21 in
https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
:
"Beam Python API and Flink Python TableAPI describe the DAG/pipeline in
different manners. We got a chance to communicate with Tyler Akidau (from
Beam) offline, and explained why the Flink tableAPI needs a specific design
for python, rather than purely leverage Beam portability layer.

In our proposal, most of the Python code is just a DAG/pipeline builder for
tableAPI. The majority of operators run purely in Java, while only python
UDFs executed in Python environment during the runtime. This design does
not affect the development and adoption of Beam language portability layer
with Flink runner. Flink and Beam community will still collaborate, jointly
develop and optimize on the JVM / Non-JVM (python,GO) bridge (data and
control connections between different processes) to ensure the robustness
and performance."
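
As an illustration of the "DAG/pipeline builder" idea quoted above, here is
a minimal sketch. It is not the FLIP-38 design: `PlanNode`, `Table.scan`,
`explain_json`, and the use of JSON as the format handed to the JVM are all
assumptions made for this example. What it shows is that the Python side
can consist of objects that only describe operators, while the operators
themselves would run elsewhere.

```python
import json


class PlanNode:
    """One step of a logical plan; holds no data and runs no computation."""

    def __init__(self, op, args, child=None):
        self.op = op
        self.args = args
        self.child = child

    def to_dict(self):
        node = {"op": self.op, "args": self.args}
        if self.child is not None:
            node["input"] = self.child.to_dict()
        return node


class Table:
    """Hypothetical Python-side handle that only appends plan nodes."""

    def __init__(self, node):
        self._node = node

    @staticmethod
    def scan(source_name):
        return Table(PlanNode("scan", {"source": source_name}))

    def filter(self, predicate):
        return Table(PlanNode("filter", {"predicate": predicate}, self._node))

    def select(self, fields):
        return Table(PlanNode("select", {"fields": fields}, self._node))

    def explain_json(self):
        # Assumption: a serialized plan like this is what would cross the
        # Python -> JVM boundary; the thread does not specify a format.
        return json.dumps(self._node.to_dict(), sort_keys=True)


plan = Table.scan("clicks").filter("url LIKE '%flink%'").select(["user", "url"])
print(plan.explain_json())
```

Nothing in this sketch touches a row of data in Python; only a Python UDF
would require a Python process at runtime, which matches the split
described in the quoted reply.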

Multi-language support involves two components, the API and the language,
and they are orthogonal. The Table API is a descriptive API and will be a
superset of SQL. I do not see that Beam has such a layer or any plan to
cover the Table API semantics. We already support two languages for the
Table API (Java and Scala), and I do not see why we should not add support
for another language (Python).

Regards,
Shaoxuan



Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

dwysakowicz
Hi Shaoxuan,

Yes, I've seen your message, and I am not saying the proposal already
contradicts it. I agree that as long as we only define the
DAG/pipeline/logical plan, it is a reasonable thing to do. No doubt about
that. I have a feeling, though, that the document mentions at some points
things that might be in Beam's area of responsibility, e.g. convenience
methods such as fromElements or Table#head (in the comments). Those
methods require bidirectional communication between Java and Python
(java <-> python), not only one-way communication
python -> (logical representation) -> java. Also, UDF support is, as far
as I understand, something for which we might be able to leverage Beam
(but I might be completely wrong).
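
The difference between the two directions can be sketched as follows.
Everything here is hypothetical: `FakeJvmBackend` is an in-process stand-in
for the Java side, and `from_elements`, `where_greater`,
`insert_into_sink`, and `head` are illustrative names rather than the
methods in the design document. The sketch only shows which calls move a
plan one way and which calls also move rows between the two sides.

```python
class FakeJvmBackend:
    """In-process stand-in for the Java side (hypothetical, for illustration)."""

    def __init__(self):
        self._tables = {}
        self.submitted_plans = []

    def register_rows(self, name, rows):
        # Python -> Java: local Python data is shipped to the engine.
        self._tables[name] = list(rows)

    def submit(self, plan):
        # Python -> Java only: a plan description goes in, no rows come back.
        self.submitted_plans.append(plan)

    def fetch(self, plan, limit):
        # Java -> Python: result rows travel back to the client.
        rows = self._tables[plan["source"]]
        if plan["greater_than"] is not None:
            field, minimum = plan["greater_than"]
            rows = [row for row in rows if row[field] > minimum]
        return rows[:limit]


class Table:
    """Hypothetical client-side handle; it holds a plan, never the data."""

    def __init__(self, backend, plan):
        self._backend = backend
        self._plan = plan

    @classmethod
    def from_elements(cls, backend, name, rows):
        backend.register_rows(name, rows)
        return cls(backend, {"source": name, "greater_than": None})

    def where_greater(self, field, minimum):
        plan = dict(self._plan, greater_than=(field, minimum))
        return Table(self._backend, plan)

    def insert_into_sink(self):
        self._backend.submit(self._plan)

    def head(self, n):
        return self._backend.fetch(self._plan, n)


backend = FakeJvmBackend()
orders = Table.from_elements(backend, "orders",
                             [{"id": 1, "amount": 5}, {"id": 2, "amount": 20},
                              {"id": 3, "amount": 30}])
big = orders.where_greater("amount", 10)
big.insert_into_sink()   # one-way: only the plan leaves Python
print(big.head(1))       # two-way: rows come back to Python
```

In a real deployment the two classes would live in different processes, so
`from_elements` and `head` would need a data channel in addition to the
plan channel, which is the part that overlaps with a portability layer.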

The only thing I wanted to point out is that I would welcome at least some
comparison of the proposed approach with Beam's multi-language support. A
discussion of when we can leverage Beam, and when and why we should come
up with our own solution, would also be beneficial, I think. Right now the
design document does not mention Beam at all.

Sorry if I sounded too harsh; my intention is not, and was not, to dismiss
this effort.

Best,

Dawid

On 04/04/2019 09:41, Shaoxuan Wang wrote:

> David,
> This proposal does not contradict with what we have discussed.
> Please check my reply in
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> on
> 2019/02/21.
> "Beam Python API and Flink Python TableAPI describe the DAG/pipeline in
> different manners. We got a chance to communicate with Tyler Akidau (from
> Beam) offline, and explained why the Flink tableAPI needs a specific design
> for python, rather than purely leverage Beam portability layer.
>
> In our proposal, most of the Python code is just a DAG/pipeline builder for
> tableAPI. The majority of operators run purely in Java, while only python
> UDFs executed in Python environment during the runtime. This design does
> not affect the development and adoption of Beam language portability layer
> with Flink runner. Flink and Beam community will still collaborate, jointly
> develop and optimize on the JVM / Non-JVM (python,GO) bridge (data and
> control connections between different processes) to ensure the robustness
> and performance."
>
> When we talk about multi-language support, it involves two components: API
> and language. And they are Orthogonal. TableAPI is a descriptive API, and
> will be a superset of SQL. I do not see Beam has the layer and any plan to
> cover the tableAPI semantics. We already have two languages supported for
> tableAPI(java/scala). I do not see the reason why we should not add another
> language (python) support for tableAPI.
>
> Regards,
> Shaoxuan
>
>
>
> On Thu, Apr 4, 2019 at 3:13 PM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi all,
>>
>> Thank you very much Jincheng for the very thorough proposal. I was
>> following the discussion very briefly, but I have an impression that the
>> consensus in the previous discussion[1] was that we do not want to have
>> an independent, flink specific multi language support but we want to
>> collaborate on that manner with the Beam community. I think this is also
>> the concern Thomas raised[2].
>>
>> Let's make sure we do not contradict with what was said in[1]. Could you
>> elaborate more how does it fit in the Beam-Flink multi language support?
>>
>> Best,
>>
>> Dawid
>>
>> [1]
>>
>> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
>>
>> [2]
>>
>> https://lists.apache.org/thread.html/da6cd815fa601d81be9f706aaa4d2c595db0b52c40a9040238b830c7@%3Cdev.flink.apache.org%3E
>>
>>
>> On 04/04/2019 08:31, jincheng sun wrote:
>>> Hi Shuyi,
>>>
>>> Glad to see your feedback and port more requirements about
>> multi-language!
>>> I think the Flink community is very much looking forward to more language
>>> support, of course, Golang should be in the future support list.
>>> Since the topic of supporting Python on Flink has been researched and
>>> discussed in the community for a long time, and I want to support Python
>> in
>>> the Table API as the first stage, then other languages should be planed
>> to
>>> support. but I do not think more about the detail about how/when support
>>> Golang. And very welcome to share more ideas on how to support Golang if
>>> you have more thoughts. :)
>>>
>>> Regarding UDF, we do have some ideas and design attempts. The related
>>> attempts to show the performance of python UDF are not optimistic. And
>>> there are also some problems with Python environment management should be
>>> considered. After we have more investigations and experiments, I will
>> share
>>> the discussion with you in time. Perhaps after the first stage(Python
>>> TableAPI support), We will then discuss the detailed discussion of UDF
>>> support.
>>>
>>> I think the support of the DataStream API should be considered after
>>> supporting UDFs because DataStream is mostly supported by various
>>> functions.
>>>
>>> We plan to complete the first phase before the release of Flink-1.9, and
>>> start the UDF support after 1.9. Of course,  I am very glad to hear that
>>> you want to contribute to the Flink multi-language support. I believe,
>>> nothing is impossible if more people interest in Python Table API with
>> UDF
>>> support and more people want to contribute community more, UDF may be
>> there
>>> when flink-1.9 release. :)
>>>
>>> Best,
>>> Jincheng
>>>
>>> Shuyi Chen <[hidden email]> wrote on Thu, Apr 4, 2019 at 3:35 AM:
>>>
>>>> Thanks a lot for driving the FLIP, jincheng. The approach looks
>>>> good. Adding multi-lang support sounds a promising direction to expand
>> the
>>>> footprint of Flink. Do we have plan for adding Golang support? As many
>>>> backend engineers nowadays are familiar with Go, but probably not Java
>> as
>>>> much, adding Golang support would significantly reduce their friction to
>>>> use Flink. Also, do we have a design for multi-lang UDF support, and
>> what's
>>>> timeline for adding DataStream API support? We would like to help and
>>>> contribute as well as we do have similar need internally at our company.
>>>> Thanks a lot.
>>>>
>>>> Shuyi
>>>>
>>>> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <[hidden email]>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>> As Xianda brought up in the previous email, There are a large number of
>>>>> data analysis users who want flink to support Python. At the Flink API
>>>>> level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
>> will
>>>>> become the first-class citizen. Table API is declarative and can be
>>>>> automatically optimized, which is mentioned in the Flink mid-term
>> roadmap
>>>>> by Stephan. So we first considering supporting Python at the Table
>> level
>>>> to
>>>>> cater to the current large number of analytics users. For further
>> promote
>>>>> Python support in flink table level. Dian, Wei and I discussed offline
>> a
>>>>> bit and came up with an initial features outline as follows:
>>>>>
>>>>> - Python TableAPI Interface
>>>>>   Introduce a set of Python Table API interfaces, including interface
>>>>> definitions such as Table, TableEnvironment, TableConfig, etc.
>>>>>
>>>>> - Implementation Architecture
>>>>>   We will offer two alternative architecture options, one for pure
>> Python
>>>>> language support and one for extended multi-language design.
>>>>>
>>>>> - Job Submission
>>>>>   Provide a way that can submit(local/remote) Python Table API jobs.
>>>>>
>>>>> - Python Shell
>>>>>   Python Shell is to provide an interactive way for users to write and
>>>>> execute flink Python Table API jobs.
>>>>>
>>>>>
>>>>> The design document for FLIP-38 can be found here:
>>>>>
>>>>>
>>>>>
>> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
>>>>> I am looking forward to your comments and feedback.
>>>>>
>>>>> Best,
>>>>> Jincheng
>>>>>
>>



Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by Thomas Weise
Hi Thomas,

Glad to see your feedback!

Yes, we can add use case examples to both the Google doc and the FLIP. I
have already added a simple usage example to the Google doc. Which kind of
examples would you like to see? :)

The very short answer to UDF support is yes. As you said, we need UDF
support on the Python Table API, including UDF, UDTF and UDAF. This should
be discussed after basic Python Table API support is in place. UDF support
involves managing the Python environment and communication between Java
and Python at the runtime level, and UDAFs in Flink also involve the use of
state, so this topic is worth discussing in depth in a separate thread.

I think that no matter how Flink and Beam work together at the UDF level,
it will not affect the current Python API (interface). We can first support
the Python API in Flink, and then start the UDX (UDF/UDTF/UDAF) support.

Many thanks for your valuable comments in the Google doc! I will reply to
them there. :)


Regards,
Jincheng

Thomas Weise <[hidden email]> wrote on Thu, Apr 4, 2019 at 8:03 AM:

> Thanks for putting this proposal together.
>
> It would be nice, if you could share a few use case examples (maybe add
> them as section to the FLIP?).
>
> The reason I ask: The table API is immensely useful, but it isn't clear to
> me what value other language bindings provide without UDF support. With
> FLIP-38 it will be possible to write a program in Python, but not execute
> Python functions. Without UDF support, isn't it possible to achieve roughly
> the same with plain SQL? In which situation would I use the Python API?
>
> There was related discussion regarding UDF support in [1]. If the
> assumption is that such support will be added later, then I would like to
> circle back to the question why this cannot be built on top of Beam? It
> would be nice to clarify the bigger goal before embarking for the first
> milestone.
>
> I'm going to comment on other things in the doc.
>
> [1]
>
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
>
> Thomas
>
>
> On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <[hidden email]> wrote:
>
> > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > good. Adding multi-lang support sounds a promising direction to expand
> the
> > footprint of Flink. Do we have plan for adding Golang support? As many
> > backend engineers nowadays are familiar with Go, but probably not Java as
> > much, adding Golang support would significantly reduce their friction to
> > use Flink. Also, do we have a design for multi-lang UDF support, and
> what's
> > timeline for adding DataStream API support? We would like to help and
> > contribute as well as we do have similar need internally at our company.
> > Thanks a lot.
> >
> > Shuyi
> >
> > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <[hidden email]>
> > wrote:
> >
> > > Hi All,
> > > As Xianda brought up in the previous email, There are a large number of
> > > data analysis users who want flink to support Python. At the Flink API
> > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> will
> > > become the first-class citizen. Table API is declarative and can be
> > > automatically optimized, which is mentioned in the Flink mid-term
> roadmap
> > > by Stephan. So we first considering supporting Python at the Table
> level
> > to
> > > cater to the current large number of analytics users. For further
> promote
> > > Python support in flink table level. Dian, Wei and I discussed offline
> a
> > > bit and came up with an initial features outline as follows:
> > >
> > > - Python TableAPI Interface
> > >   Introduce a set of Python Table API interfaces, including interface
> > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > >
> > > - Implementation Architecture
> > >   We will offer two alternative architecture options, one for pure
> Python
> > > language support and one for extended multi-language design.
> > >
> > > - Job Submission
> > >   Provide a way that can submit(local/remote) Python Table API jobs.
> > >
> > > - Python Shell
> > >   Python Shell is to provide an interactive way for users to write and
> > > execute flink Python Table API jobs.
> > >
> > >
> > > The design document for FLIP-38 can be found here:
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > >
> > > I am looking forward to your comments and feedback.
> > >
> > > Best,
> > > Jincheng
> > >
> >
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by dwysakowicz
Hi Dawid,

Thanks for your feedback!

Yes, you are right. Two-way communication is not a problem in our proposal:
both proposed solutions already support two-way communication between
Python and Java, so interfaces such as from_collection/from_elements can be
handled well.

The reason we do not mention Beam is that Beam does not have a layer that
covers the Table API semantics, and the current proposal is about the Table
API interface and its implementation. So the design of the Python Table API
has no necessary connection with Beam.

Furthermore, Beam uses protobuf to define its data structures and to solve
the multi-language problem. That is very similar to the proposed Approach 2,
although there are some differences in the implementation details. You can
see the comments in the document. :)

Regards,
Jincheng

Dawid Wysakowicz <[hidden email]> wrote on Thu, Apr 4, 2019 at 4:28 PM:

> Hi Shaoxuan,
>
> Yes, I've seen your message and I am not saying it already contradicts.
> I agree as long as we just define DAG/pipeline/logical plan it is a
> reasonable thing to do. No doubts about that. I have a feeling though it
> mentions at some points things that might be in the area of
> responsibility of Beam, e.g. convenience methods like: fromElements,
> Table#head(in comments)... Those methods require bidirectional
> communication between java <> python, and not only one way communication
> python -> (logical, representation) -> java. Also UDFs support as far as
> I understand is something we might be able to leverage Beam (but at the
> same time I might be completely wrong).
>
> The only thing I wanted to outline is I would welcome at least some
> comparisons of the proposed approach to Beam multi-language support.
> Discussion when can we think of leveraging Beam and when we should come
> up with our own solution and why would also be beneficial I think. Right
> now the design document does not mention Beam at all.
>
> Sorry if I sounded too harsh, my intention isn't/wasn't to discard this
> effort.
>
> Best,
>
> Dawid
>
> On 04/04/2019 09:41, Shaoxuan Wang wrote:
> > David,
> > This proposal does not contradict with what we have discussed.
> > Please check my reply in
> >
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> > on
> > 2019/02/21.
> > "Beam Python API and Flink Python TableAPI describe the DAG/pipeline in
> > different manners. We got a chance to communicate with Tyler Akidau (from
> > Beam) offline, and explained why the Flink tableAPI needs a specific
> design
> > for python, rather than purely leverage Beam portability layer.
> >
> > In our proposal, most of the Python code is just a DAG/pipeline builder
> for
> > tableAPI. The majority of operators run purely in Java, while only python
> > UDFs executed in Python environment during the runtime. This design does
> > not affect the development and adoption of Beam language portability
> layer
> > with Flink runner. Flink and Beam community will still collaborate,
> jointly
> > develop and optimize on the JVM / Non-JVM (python,GO) bridge (data and
> > control connections between different processes) to ensure the robustness
> > and performance."
> >
> > When we talk about multi-language support, it involves two components:
> API
> > and language. And they are Orthogonal. TableAPI is a descriptive API, and
> > will be a superset of SQL. I do not see Beam has the layer and any plan
> to
> > cover the tableAPI semantics. We already have two languages supported for
> > tableAPI(java/scala). I do not see the reason why we should not add
> another
> > language (python) support for tableAPI.
> >
> > Regards,
> > Shaoxuan
> >
> >
> >
> > On Thu, Apr 4, 2019 at 3:13 PM Dawid Wysakowicz <[hidden email]>
> > wrote:
> >
> >> Hi all,
> >>
> >> Thank you very much Jincheng for the very thorough proposal. I was
> >> following the discussion very briefly, but I have an impression that the
> >> consensus in the previous discussion[1] was that we do not want to have
> >> an independent, flink specific multi language support but we want to
> >> collaborate on that manner with the Beam community. I think this is also
> >> the concern Thomas raised[2].
> >>
> >> Let's make sure we do not contradict with what was said in[1]. Could you
> >> elaborate more how does it fit in the Beam-Flink multi language support?
> >>
> >> Best,
> >>
> >> Dawid
> >>
> >> [1]
> >>
> >>
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> >>
> >> [2]
> >>
> >>
> https://lists.apache.org/thread.html/da6cd815fa601d81be9f706aaa4d2c595db0b52c40a9040238b830c7@%3Cdev.flink.apache.org%3E
> >>
> >>
> >> On 04/04/2019 08:31, jincheng sun wrote:
> >>> Hi Shuyi,
> >>>
> >>> Glad to see your feedback and port more requirements about
> >> multi-language!
> >>> I think the Flink community is very much looking forward to more
> language
> >>> support, of course, Golang should be in the future support list.
> >>> Since the topic of supporting Python on Flink has been researched and
> >>> discussed in the community for a long time, and I want to support
> Python
> >> in
> >>> the Table API as the first stage, then other languages should be planed
> >> to
> >>> support. but I do not think more about the detail about how/when
> support
> >>> Golang. And very welcome to share more ideas on how to support Golang
> if
> >>> you have more thoughts. :)
> >>>
> >>> Regarding UDF, we do have some ideas and design attempts. The related
> >>> attempts to show the performance of python UDF are not optimistic. And
> >>> there are also some problems with Python environment management should
> be
> >>> considered. After we have more investigations and experiments, I will
> >> share
> >>> the discussion with you in time. Perhaps after the first stage(Python
> >>> TableAPI support), We will then discuss the detailed discussion of UDF
> >>> support.
> >>>
> >>> I think the support of the DataStream API should be considered after
> >>> supporting UDFs because DataStream is mostly supported by various
> >>> functions.
> >>>
> >>> We plan to complete the first phase before the release of Flink-1.9,
> and
> >>> start the UDF support after 1.9. Of course,  I am very glad to hear
> that
> >>> you want to contribute to the Flink multi-language support. I believe,
> >>> nothing is impossible if more people interest in Python Table API with
> >> UDF
> >>> support and more people want to contribute community more, UDF may be
> >> there
> >>> when flink-1.9 release. :)
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> Shuyi Chen <[hidden email]> wrote on Thu, Apr 4, 2019 at 3:35 AM:
> >>>
> >>>> Thanks a lot for driving the FLIP, jincheng. The approach looks
> >>>> good. Adding multi-lang support sounds a promising direction to expand
> >> the
> >>>> footprint of Flink. Do we have plan for adding Golang support? As many
> >>>> backend engineers nowadays are familiar with Go, but probably not Java
> >> as
> >>>> much, adding Golang support would significantly reduce their friction
> to
> >>>> use Flink. Also, do we have a design for multi-lang UDF support, and
> >> what's
> >>>> timeline for adding DataStream API support? We would like to help and
> >>>> contribute as well as we do have similar need internally at our
> company.
> >>>> Thanks a lot.
> >>>>
> >>>> Shuyi
> >>>>
> >>>> On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <[hidden email]
> >
> >>>> wrote:
> >>>>
> >>>>> Hi All,
> >>>>> As Xianda brought up in the previous email, There are a large number
> of
> >>>>> data analysis users who want flink to support Python. At the Flink
> API
> >>>>> level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> >> will
> >>>>> become the first-class citizen. Table API is declarative and can be
> >>>>> automatically optimized, which is mentioned in the Flink mid-term
> >> roadmap
> >>>>> by Stephan. So we first considering supporting Python at the Table
> >> level
> >>>> to
> >>>>> cater to the current large number of analytics users. For further
> >> promote
> >>>>> Python support in flink table level. Dian, Wei and I discussed
> offline
> >> a
> >>>>> bit and came up with an initial features outline as follows:
> >>>>>
> >>>>> - Python TableAPI Interface
> >>>>>   Introduce a set of Python Table API interfaces, including interface
> >>>>> definitions such as Table, TableEnvironment, TableConfig, etc.
> >>>>>
> >>>>> - Implementation Architecture
> >>>>>   We will offer two alternative architecture options, one for pure
> >> Python
> >>>>> language support and one for extended multi-language design.
> >>>>>
> >>>>> - Job Submission
> >>>>>   Provide a way that can submit(local/remote) Python Table API jobs.
> >>>>>
> >>>>> - Python Shell
> >>>>>   Python Shell is to provide an interactive way for users to write
> and
> >>>>> execute flink Python Table API jobs.
> >>>>>
> >>>>>
> >>>>> The design document for FLIP-38 can be found here:
> >>>>>
> >>>>>
> >>>>>
> >>
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >>>>> I am looking forward to your comments and feedback.
> >>>>>
> >>>>> Best,
> >>>>> Jincheng
> >>>>>
> >>
>
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Shuyi Chen
In reply to this post by Thomas Weise
Hi Thomas,

I agree that UDF support is important. As some of us might not be familiar
with the work in Apache Beam, it would be great if you could share some
design documents on Beam's portability layer and multi-language support,
along with their current status. Thanks a lot.

Regards
Shuyi

On Wed, Apr 3, 2019 at 5:03 PM Thomas Weise <[hidden email]> wrote:

> Thanks for putting this proposal together.
>
> It would be nice, if you could share a few use case examples (maybe add
> them as section to the FLIP?).
>
> The reason I ask: The table API is immensely useful, but it isn't clear to
> me what value other language bindings provide without UDF support. With
> FLIP-38 it will be possible to write a program in Python, but not execute
> Python functions. Without UDF support, isn't it possible to achieve roughly
> the same with plain SQL? In which situation would I use the Python API?
>
> There was related discussion regarding UDF support in [1]. If the
> assumption is that such support will be added later, then I would like to
> circle back to the question why this cannot be built on top of Beam? It
> would be nice to clarify the bigger goal before embarking for the first
> milestone.
>
> I'm going to comment on other things in the doc.
>
> [1]
>
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
>
> Thomas
>
>
> On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <[hidden email]> wrote:
>
> > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > good. Adding multi-lang support sounds a promising direction to expand
> the
> > footprint of Flink. Do we have plan for adding Golang support? As many
> > backend engineers nowadays are familiar with Go, but probably not Java as
> > much, adding Golang support would significantly reduce their friction to
> > use Flink. Also, do we have a design for multi-lang UDF support, and
> what's
> > timeline for adding DataStream API support? We would like to help and
> > contribute as well as we do have similar need internally at our company.
> > Thanks a lot.
> >
> > Shuyi
> >
> > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <[hidden email]>
> > wrote:
> >
> > > Hi All,
> > > As Xianda brought up in the previous email, There are a large number of
> > > data analysis users who want flink to support Python. At the Flink API
> > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> will
> > > become the first-class citizen. Table API is declarative and can be
> > > automatically optimized, which is mentioned in the Flink mid-term
> roadmap
> > > by Stephan. So we first considering supporting Python at the Table
> level
> > to
> > > cater to the current large number of analytics users. For further
> promote
> > > Python support in flink table level. Dian, Wei and I discussed offline
> a
> > > bit and came up with an initial features outline as follows:
> > >
> > > - Python TableAPI Interface
> > >   Introduce a set of Python Table API interfaces, including interface
> > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > >
> > > - Implementation Architecture
> > >   We will offer two alternative architecture options, one for pure
> Python
> > > language support and one for extended multi-language design.
> > >
> > > - Job Submission
> > >   Provide a way that can submit(local/remote) Python Table API jobs.
> > >
> > > - Python Shell
> > >   Python Shell is to provide an interactive way for users to write and
> > > execute flink Python Table API jobs.
> > >
> > >
> > > The design document for FLIP-38 can be found here:
> > >
> > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > >
> > > I am looking forward to your comments and feedback.
> > >
> > > Best,
> > > Jincheng
> > >
> >
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Thomas Weise-2
In reply to this post by jincheng sun
Hi Jincheng,

>
> Yes, we can add use case examples in both google doc and FLIP, I had
> already add the simple usage in the google doc, here I want to know which
> kind of examples you want? :)
>

Do you have use cases where the Python table API can be applied without UDF
support?

(And where the same could not be accomplished with just SQL.)


> The very short answer to UDF support is Yes. As you said, we need UDF
> support on the Python Table API, including (UDF, UDTF, UDAF). This needs to
> be discussed after basic Python TableAPI supported. Because UDF involves
> the management of the python environment, Runtime level Java and Runtime
> communication, and UDAF in Flink also involves the application of State, so
> this is a topic that is worth discussing in depth in a separate thread.
>

The current proposal for job submission touches something that Beam
portability already had to solve.

If we think that the Python table API will only be useful with UDF support
(question above), then it may be better to discuss the first step with the
final goal in mind. If we find that Beam can be used for the UDF part then
approach 1 vs. approach 2 in the doc (for the client side language
boundary) may look different.


>
> I think that no matter how the Flink and Beam work together on the UDF
> level, it will not affect the current Python API (interface), we can first
> support the Python API in Flink. Then start the UDX (UDF/UDTF/UDAF)
> support.
>
>
I agree that the client side API should not be affected.


> And great thanks for your valuable comments in Google doc! I will feedback
> you in the google doc. :)
>
>
> Regards,
> Jincheng
>
> Thomas Weise <[hidden email]> wrote on Thu, Apr 4, 2019 at 8:03 AM:
>
> > Thanks for putting this proposal together.
> >
> > It would be nice, if you could share a few use case examples (maybe add
> > them as section to the FLIP?).
> >
> > The reason I ask: The table API is immensely useful, but it isn't clear
> to
> > me what value other language bindings provide without UDF support. With
> > FLIP-38 it will be possible to write a program in Python, but not execute
> > Python functions. Without UDF support, isn't it possible to achieve
> roughly
> > the same with plain SQL? In which situation would I use the Python API?
> >
> > There was related discussion regarding UDF support in [1]. If the
> > assumption is that such support will be added later, then I would like to
> > circle back to the question why this cannot be built on top of Beam? It
> > would be nice to clarify the bigger goal before embarking for the first
> > milestone.
> >
> > I'm going to comment on other things in the doc.
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> >
> > Thomas
> >
> >
> > On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <[hidden email]> wrote:
> >
> > > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > > good. Adding multi-lang support sounds a promising direction to expand
> > the
> > > footprint of Flink. Do we have plan for adding Golang support? As many
> > > backend engineers nowadays are familiar with Go, but probably not Java
> as
> > > much, adding Golang support would significantly reduce their friction
> to
> > > use Flink. Also, do we have a design for multi-lang UDF support, and
> > what's
> > > timeline for adding DataStream API support? We would like to help and
> > > contribute as well as we do have similar need internally at our
> company.
> > > Thanks a lot.
> > >
> > > Shuyi
> > >
> > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <[hidden email]>
> > > wrote:
> > >
> > > > Hi All,
> > > > As Xianda brought up in the previous email, There are a large number
> of
> > > > data analysis users who want flink to support Python. At the Flink
> API
> > > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> > will
> > > > become the first-class citizen. Table API is declarative and can be
> > > > automatically optimized, which is mentioned in the Flink mid-term
> > roadmap
> > > > by Stephan. So we first considering supporting Python at the Table
> > level
> > > to
> > > > cater to the current large number of analytics users. For further
> > promote
> > > > Python support in flink table level. Dian, Wei and I discussed
> offline
> > a
> > > > bit and came up with an initial features outline as follows:
> > > >
> > > > - Python TableAPI Interface
> > > >   Introduce a set of Python Table API interfaces, including interface
> > > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > > >
> > > > - Implementation Architecture
> > > >   We will offer two alternative architecture options, one for pure
> > Python
> > > > language support and one for extended multi-language design.
> > > >
> > > > - Job Submission
> > > >   Provide a way that can submit(local/remote) Python Table API jobs.
> > > >
> > > > - Python Shell
> > > >   Python Shell is to provide an interactive way for users to write
> and
> > > > execute flink Python Table API jobs.
> > > >
> > > >
> > > > The design document for FLIP-38 can be found here:
> > > >
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > > >
> > > > I am looking forward to your comments and feedback.
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Thomas Weise
In reply to this post by Shuyi Chen
Hi Shuyi,

I would recommend starting with the following:

https://docs.google.com/presentation/d/1AkU-QXSflau-RSeolB4TSLy0_mg0xwb398Czw7aqVGw/edit#slide=id.p
https://www.youtube.com/watch?v=VsGQ2LFeTHY&list=PL4dEBWmGSIU_9JTGnkGVg6-BwaV0FMxyJ&index=12&t=0s

For follow-up, there is plenty of knowledge available within the Flink
community :)

Thomas

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by Thomas Weise-2
Hi Thomas,
Thanks for your quick reply. I will share my thoughts further. :)

1. Even without user-defined function (UDF) support, the Python Table API
can do a lot (of course, we still need to support UDFs on the Python Table
API):

> Do you have use cases where the Python table API can be applied without UDF
> support?


Without UDF support, the Python Table API can already cover a lot, such as
ETL, joins, aggregations and Tumble/Slide/Session windows, because Flink
has hundreds of built-in scalar and aggregate functions (details can be
found here
<https://github.com/apache/flink/blob/master/flink-table/flink-table-common/src/main/java/org/apache/flink/table/expressions/BuiltInFunctionDefinitions.java>).
Furthermore, we can add more commonly used built-in
ScalarFunction/TableFunction/AggregateFunction implementations if needed.

Just as we did for the Java and Scala language support, we would support
the Table API first and then add support for UDFs.

To clarify, we will support UDFs on the Python Table API, but in the next
phase. This is already mentioned in the Future or next step section of the
Google doc.

2. Flink itself should have its own Python Table API:

From my point of view, whether or not Beam supports the Python Table API,
Flink itself should have its own Python Table API, such as the interfaces
listed in the design document. The Python Table API requires not only a
standard table interface definition but also Flink-specific interface
definitions such as TableEnvironment, TableConfig, Window,
ConnectorDescriptor, TableSource, TableSink and Catalog. All of these need
user-friendly support in the Python Table API. The Flink Python Table API
also needs to support interactive queries. For the sake of functional
completeness and user-friendliness, Flink should have its own Python Table
API interface definition, which is at the heart of the FLIP-38 proposal. Of
course, I would very much like to see Beam offer good support for Flink on
the Python Table API. I think the two can coexist.

3. Support for Python Table API and support for UDF can be done separately:

Support for the Flink Python Table API opens a door from the Flink
community to Python users. Once that door is open, we need to give those
users more functionality, including various operators such as
select/filter/join/window/aggregate and, of course, UDFs, which will be
added later.

    - Support for the Python Table API (with various operators): what we
need to solve is how to convert Python to Java on the client side, which is
just a very thin conversion layer.
    - Support for UDFs (scalar function/table function/aggregate function):
what we need to solve is how Java communicates with Python at the runtime
level, how a Python user-defined aggregate function uses Flink state (state
is Flink-specific), how the Python environment is managed at runtime, and
complex issues such as performance. I agree that Flink can work with Beam
when discussing UDF support, and I would love to see UDF support in the
Python Table API built on top of Beam. :)

So I think the problems to be solved for the Python Table API and for UDF
support are very different, and we can discuss them separately.
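
To make the "very thin layer of conversion" point more concrete, here is a
minimal sketch of what a client-side plan builder could look like. This is
not the design in the FLIP-38 document and not an existing Flink API: the
names Table, scan and to_plan, the string expressions, and the JSON plan
format are all hypothetical and chosen only for illustration. The idea it
shows is that each Python call merely records an operation by name, so the
resulting plan contains no Python callables and could in principle be handed
to a Java-side planner and executed entirely in Java.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Table:
    """Hypothetical, immutable description of a query.

    No data is processed in Python; every method only appends one step
    to a list of operations (a toy stand-in for a logical plan).
    """
    ops: Tuple[dict, ...] = ()

    def _with(self, op: dict) -> "Table":
        # Return a new Table so that intermediate tables can be reused.
        return Table(self.ops + (op,))

    def filter(self, predicate: str) -> "Table":
        return self._with({"op": "filter", "predicate": predicate})

    def group_by(self, *keys: str) -> "Table":
        return self._with({"op": "group_by", "keys": list(keys)})

    def select(self, *exprs: str) -> "Table":
        # Expressions are plain strings that refer to built-in functions
        # by name (e.g. "sum(amount)"); nothing here is a Python function.
        return self._with({"op": "select", "exprs": list(exprs)})

    def to_plan(self) -> str:
        # Serialize the recorded steps; a Java-side component (assumed,
        # not shown) would parse this and build the real job.
        return json.dumps(list(self.ops))


def scan(table_name: str) -> Table:
    """Hypothetical entry point: start a query from a registered table."""
    return Table(({"op": "scan", "table": table_name},))


if __name__ == "__main__":
    query = (
        scan("orders")
        .filter("amount > 10")
        .group_by("user")
        .select("user", "sum(amount)")
    )
    print(query.to_plan())
```

A Python UDF would not fit into this picture, because a Python callable
cannot be recorded by name and run in Java; that is where the runtime-level
Java/Python communication from the second bullet comes in.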

In summary, I recommend that Flink support the Python Table API first and
that UDF support be discussed in a separate thread. Let's first discuss
support for the Python Table API. What do you think?

Regards,
Jincheng


Thomas Weise <[hidden email]> wrote on Fri, Apr 5, 2019 at 12:11 PM:

> Hi Jincheng,
>
> >
> > Yes, we can add use case examples in both google doc and FLIP, I had
> > already add the simple usage in the google doc, here I want to know which
> > kind of examples you want? :)
> >
>
> Do you have use cases where the Python table API can be applied without UDF
> support?
>
> (And where the same could not be accomplished with just SQL.)
>
>
> > The very short answer to UDF support is Yes. As you said, we need UDF
> > support on the Python Table API, including (UDF, UDTF, UDAF). This needs
> to
> > be discussed after basic Python TableAPI supported. Because UDF involves
> > the management of the python environment, Runtime level Java and Runtime
> > communication, and UDAF in Flink also involves the application of State,
> so
> > this is a topic that is worth discussing in depth in a separate thread.
> >
>
> The current proposal for job submission touches something that Beam
> portability already had to solve.
>
> If we think that the Python table API will only be useful with UDF support
> (question above), then it may be better to discuss the first step with the
> final goal in mind. If we find that Beam can be used for the UDF part then
> approach 1 vs. approach 2 in the doc (for the client side language
> boundary) may look different.
>
>
> >
> > I think that no matter how the Flink and Beam work together on the UDF
> > level, it will not affect the current Python API (interface), we can
> first
> > support the Python API in Flink. Then start the UDX (UDF/UDTF/UDAF)
> > support.
> >
> >
> I agree that the client side API should not be affected.
>
>
> > And great thanks for your valuable comments in Google doc! I will
> feedback
> > you in the google doc. :)
> >
> >
> > Regards,
> > Jincheng
> >
> > On Thu, Apr 4, 2019 at 8:03 AM, Thomas Weise <[hidden email]> wrote:
> >
> > > Thanks for putting this proposal together.
> > >
> > > It would be nice, if you could share a few use case examples (maybe add
> > > them as section to the FLIP?).
> > >
> > > The reason I ask: The table API is immensely useful, but it isn't clear
> > to
> > > me what value other language bindings provide without UDF support. With
> > > FLIP-38 it will be possible to write a program in Python, but not
> execute
> > > Python functions. Without UDF support, isn't it possible to achieve
> > roughly
> > > the same with plain SQL? In which situation would I use the Python API?
> > >
> > > There was related discussion regarding UDF support in [1]. If the
> > > assumption is that such support will be added later, then I would like
> to
> > > circle back to the question why this cannot be built on top of Beam? It
> > > would be nice to clarify the bigger goal before embarking for the first
> > > milestone.
> > >
> > > I'm going to comment on other things in the doc.
> > >
> > > [1]
> > >
> > >
> >
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> > >
> > > Thomas
> > >
> > >
> > > On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <[hidden email]> wrote:
> > >
> > > > Thanks a lot for driving the FLIP, jincheng. The approach looks
> > > > good. Adding multi-lang support sounds a promising direction to
> expand
> > > the
> > > > footprint of Flink. Do we have plan for adding Golang support? As
> many
> > > > backend engineers nowadays are familiar with Go, but probably not
> Java
> > as
> > > > much, adding Golang support would significantly reduce their friction
> > to
> > > > use Flink. Also, do we have a design for multi-lang UDF support, and
> > > what's
> > > > timeline for adding DataStream API support? We would like to help and
> > > > contribute as well as we do have similar need internally at our
> > company.
> > > > Thanks a lot.
> > > >
> > > > Shuyi
> > > >
> > > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <
> [hidden email]>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > > As Xianda brought up in the previous email, There are a large
> number
> > of
> > > > > data analysis users who want flink to support Python. At the Flink
> > API
> > > > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the Table API
> > > will
> > > > > become the first-class citizen. Table API is declarative and can be
> > > > > automatically optimized, which is mentioned in the Flink mid-term
> > > roadmap
> > > > > by Stephan. So we first considering supporting Python at the Table
> > > level
> > > > to
> > > > > cater to the current large number of analytics users. For further
> > > promote
> > > > > Python support in flink table level. Dian, Wei and I discussed
> > offline
> > > a
> > > > > bit and came up with an initial features outline as follows:
> > > > >
> > > > > - Python TableAPI Interface
> > > > >   Introduce a set of Python Table API interfaces, including
> interface
> > > > > definitions such as Table, TableEnvironment, TableConfig, etc.
> > > > >
> > > > > - Implementation Architecture
> > > > >   We will offer two alternative architecture options, one for pure
> > > Python
> > > > > language support and one for extended multi-language design.
> > > > >
> > > > > - Job Submission
> > > > >   Provide a way that can submit(local/remote) Python Table API
> jobs.
> > > > >
> > > > > - Python Shell
> > > > >   Python Shell is to provide an interactive way for users to write
> > and
> > > > > execute flink Python Table API jobs.
> > > > >
> > > > >
> > > > > The design document for FLIP-38 can be found here:
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> > > > >
> > > > > I am looking forward to your comments and feedback.
> > > > >
> > > > > Best,
> > > > > Jincheng
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
In reply to this post by Thomas Weise-2
One more thing: it is worth mentioning that the Flink Table API is a superset
of Flink SQL. For example:
- AddColumns/DropColumns/RenameColumns; the details can be found in the Google doc
<https://docs.google.com/document/d/1tryl6swt1K1pw7yvv5pdvFXSxfrBZ3_OkOObymis2ck/edit#heading=h.7rwcjbvr52dc>
- Interactive Programming in the Flink Table API; the details can be found in
FLIP-36
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink>
I think that in the future, more and more features that cannot be expressed in
SQL will be added to the Table API.
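
To make the column operations above concrete, here is a toy model of their
effect on a schema, assuming a schema is simply a list of column names. The
helper functions are hypothetical and only illustrate the semantics; they are
not the Table API itself. The point is that each operation is expressed
relative to the current columns, whereas a plain SQL SELECT generally has to
list every output column again.

```python
def add_columns(schema, *names):
    """Return a new schema with the given columns appended (duplicates skipped)."""
    return schema + [n for n in names if n not in schema]


def drop_columns(schema, *names):
    """Return a new schema without the given columns."""
    return [c for c in schema if c not in names]


def rename_columns(schema, mapping):
    """Return a new schema with columns renamed according to `mapping`."""
    return [mapping.get(c, c) for c in schema]


schema = ["user", "amount", "ts"]
schema = add_columns(schema, "amount_eur")
schema = drop_columns(schema, "ts")
schema = rename_columns(schema, {"user": "user_id"})
print(schema)
```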


Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Stephan Ewen
Hi all!

I think that all the opinions and ideas are not actually in conflict, so
let me summarize what I understand is the proposal:

*(1) Long-term goal: Full Python Table API with UDFs*

     To break the implementation effort up into stages, the first step
would be the API without UDFs.
      Because of all the built-in functions in the Table API, this can
already exist by itself, with some value, but ultimately is quite limited
without UDF support.

     ==> The FLIP should probably reflect the full goal rather than the
first implementation step only, this would make sure everyone understands
what the final goal of the effort is.


*(2) Relationship to Beam Language Portability*

Flink's own Python Table API and Beam-Python on Flink add different value
and are both attractive for different scenarios.

  - Beam's Python API supports complex pipelines in a style similar to the
DataStream API. There is also the ecosystem of libraries built on top of that
DSL, for example for machine learning.

  - Flink's Python Table API builds mostly relational expressions, plus
some UDFs. Most of the Python code never executes in Python, though. It is
geared at use cases similar to Flink's Table API.

Both approaches mainly differ in how the streaming DAG is built from Python
code and received by the JVM.
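
The remark that most of the Python code never executes in Python can be
sketched as follows. This is a minimal illustration with hypothetical names
(`Expr`, `col`): the Python operators only build a textual description of a
relational expression, which would then be handed to the JVM. No row of data
is touched in Python.

```python
class Expr:
    """A relational expression kept as text; it is described, not evaluated."""

    def __init__(self, text):
        self.text = text

    def __gt__(self, other):
        return Expr(f"({self.text} > {_render(other)})")

    def __mul__(self, other):
        return Expr(f"({self.text} * {_render(other)})")


def _render(value):
    # Nested expressions contribute their text; plain Python values are literals.
    return value.text if isinstance(value, Expr) else repr(value)


def col(name):
    """Refer to a column by name."""
    return Expr(name)


predicate = col("amount") * 2 > 100
print(predicate.text)
```

A UDF is the case where this no longer holds, since its body must actually run
in a Python process, which is where the inter-process data exchange comes in.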

In previous discussions, we concluded that for inter process data exchange
(JVM <> Python), we want to share code with Beam.
That part is possibly the most crucial piece to getting performance out of
the Python DSL, so will benefit from sharing development, optimizations,
etc.

Best,
Stephan




On Fri, Apr 5, 2019 at 5:25 PM jincheng sun <[hidden email]>
wrote:

> One more thing It's better to mention that Flink table API is a superset of
> Flink SQL, such as:
> - AddColumns/DropColumns/RenameColumns, the detail can be found in Google
> doc
> <
> https://docs.google.com/document/d/1tryl6swt1K1pw7yvv5pdvFXSxfrBZ3_OkOObymis2ck/edit#heading=h.7rwcjbvr52dc
> >
> - Interactive Programming in Flink Table API, the detail can be found in
> FLIP-36
> <
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> >
> I think In the future, more and more features that cannot be expressed in
> SQL will be added in Table API.
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

Stephan Ewen
One more thought:

The FLIP is very much centered on the CLI and it looks like it has mainly
batch jobs and session clusters in mind.

In very many cases, especially in streaming cases, the CLI (or shell) is
not the entry point for a program.
See for example the use of Flink jobs on Kubernetes (Container Mode /
Entrypoint).

It would be super cool if the Python API would work seamlessly with all
modes of starting Flink jobs.
That would make it available to all users.

On Thu, Apr 11, 2019 at 5:34 PM Stephan Ewen <[hidden email]> wrote:

> Hi all!
>
> I think that all the opinions and ideas are not actually in conflict, so
> let me summarize what I understand is the proposal:
>
> *(1) Long-term goal: Full Python Table API with UDFs*
>
>      To break the implementation effort up into stages, the first step
> would be the API without UDFs.
>       Because of all the built-in functions in the Table API, this can
> already exist by itself, with some value, but ultimately is quite limited
> without UDF support.
>
>      ==> The FLIP should probably reflect the full goal rather than the
> first implementation step only, this would make sure everyone understands
> what the final goal of the effort is.
>
>
> *(2) Relationship to Beam Language Portability*
>
> Flink's own Python Table API and Beam-Python on Flink add different value
> and are both attractive for different scenarios.
>
>   - Beam's Python API supports complex pipelines in a similar style as the
> DataStream API. There is also the ecosystem of libraries built on top that
> DSL, for example for machine learning.
>
>   - Flink's Python Table API builds mostly relational expressions, plus
> some UDFs. Most of the Python code never executes in Python, though. It is
> geared at use cases similar to Flink's Table API.
>
> Both approaches mainly differ in how the streaming DAG is built from
> Python code and received by the JVM.
>
> In previous discussions, we concluded that for inter process data exchange
> (JVM <> Python), we want to share code with Beam.
> That part is possibly the most crucial piece to getting performance out of
> the Python DSL, so will benefit from sharing development, optimizations,
> etc.
>
> Best,
> Stephan
>
>
>
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
Hi Stephan,

Thanks for your suggestions and summary. :)

> ==> The FLIP should probably reflect the full goal rather than the
> first implementation step only, this would make sure everyone understands
> what the final goal of the effort is.


I totally agree that we can implement the functionality in stages, but the
FLIP needs to reflect the full final goal. I agree with Thomas and you; I will
add the design of the UDF part later.

Yes, you are right. Currently, we only consider `flink run` and the
`python-shell` as job entry points, and we should add the REST API as
another entry point.

> It would be super cool if the Python API would work seamlessly with all
> modes of starting Flink jobs.


If I understand you correctly, to support the Python Table API on Kubernetes
we only need to add (or improve the existing) REST API for the Python Table
API. Of course, we may also need to release a Docker image that supports
Python, so that Python Table API jobs can be easily deployed to Kubernetes.

So, in the end, we would support the following ways to submit Python Table API
jobs:
- Python Shell - interactive development.
- CLI - submit the job with `flink run`, e.g. to deploy a job to a YARN
cluster.
- REST - submit the job through the REST API, e.g. to deploy a job to a
Kubernetes cluster.

Please correct me if I have misunderstood anything.

Thanks,
Jincheng


On Fri, Apr 12, 2019 at 12:22 AM, Stephan Ewen <[hidden email]> wrote:

> One more thought:
>
> The FLIP is very much centered on the CLI and it looks like it has mainly
> batch jobs and session clusters in mind.
>
> In very many cases, especially in streaming cases, the CLI (or shell) is
> not the entry point for a program.
> See for example the use of Flink jobs on Kubernetes (Container Mode /
> Entrypoint).
>
> It would be super cool if the Python API would work seamlessly with all
> modes of starting Flink jobs.
> That would make it available to all users.
>
> On Thu, Apr 11, 2019 at 5:34 PM Stephan Ewen <[hidden email]> wrote:
>
> > Hi all!
> >
> > I think that all the opinions and ideas are not actually in conflict, so
> > let me summarize what I understand is the proposal:
> >
> > *(1) Long-term goal: Full Python Table API with UDFs*
> >
> >      To break the implementation effort up into stages, the first step
> > would be the API without UDFs.
> >       Because of all the built-in functions in the Table API, this can
> > already exist by itself, with some value, but ultimately is quite limited
> > without UDF support.
> >
> >      ==> The FLIP should probably reflect the full goal rather than the
> > first implementation step only, this would make sure everyone understands
> > what the final goal of the effort is.
> >
> >
> > *(2) Relationship to Beam Language Portability*
> >
> > Flink's own Python Table API and Beam-Python on Flink add different value
> > and are both attractive for different scenarios.
> >
> >   - Beam's Python API supports complex pipelines in a similar style as
> the
> > DataStream API. There is also the ecosystem of libraries built on top
> that
> > DSL, for example for machine learning.
> >
> >   - Flink's Python Table API builds mostly relational expressions, plus
> > some UDFs. Most of the Python code never executes in Python, though. It
> is
> > geared at use cases similar to Flink's Table API.
> >
> > Both approaches mainly differ in how the streaming DAG is built from
> > Python code and received by the JVM.
> >
> > In previous discussions, we concluded that for inter process data
> exchange
> > (JVM <> Python), we want to share code with Beam.
> > That part is possibly the most crucial piece to getting performance out
> of
> > the Python DSL, so will benefit from sharing development, optimizations,
> > etc.
> >
> > Best,
> > Stephan
> >
> >
> >
> >
> > On Fri, Apr 5, 2019 at 5:25 PM jincheng sun <[hidden email]>
> > wrote:
> >
> >> One more thing It's better to mention that Flink table API is a superset
> >> of
> >> Flink SQL, such as:
> >> - AddColumns/DropColumns/RenameColumns, the detail can be found in Google
> >> doc
> >> <
> >>
> https://docs.google.com/document/d/1tryl6swt1K1pw7yvv5pdvFXSxfrBZ3_OkOObymis2ck/edit#heading=h.7rwcjbvr52dc
> >> >
> >> - Interactive Programming in Flink Table API, the detail can be found in
> >> FLIP-36
> >> <
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-36%3A+Support+Interactive+Programming+in+Flink
> >> >
> >> I think In the future, more and more features that cannot be expressed
> in
> >> SQL will be added in Table API.
> >>
> >> On Fri, Apr 5, 2019 at 12:11 PM Thomas Weise <[hidden email]> wrote:
> >>
> >> > Hi Jincheng,
> >> >
> >> > >
> >> > > Yes, we can add use case examples in both google doc and FLIP, I had
> >> > > already add the simple usage in the google doc, here I want to know
> >> which
> >> > > kind of examples you want? :)
> >> > >
> >> >
> >> > Do you have use cases where the Python table API can be applied
> without
> >> UDF
> >> > support?
> >> >
> >> > (And where the same could not be accomplished with just SQL.)
> >> >
> >> >
> >> > > The very short answer to UDF support is Yes. As you said, we need
> UDF
> >> > > support on the Python Table API, including (UDF, UDTF, UDAF). This
> >> needs
> >> > to
> >> > > be discussed after basic Python TableAPI supported. Because UDF
> >> involves
> >> > > the management of the python environment, Runtime level Java and
> >> Runtime
> >> > > communication, and UDAF in Flink also involves the application of
> >> State,
> >> > so
> >> > > this is a topic that is worth discussing in depth in a separate
> >> thread.
> >> > >
> >> >
> >> > The current proposal for job submission touches something that Beam
> >> > portability already had to solve.
> >> >
> >> > If we think that the Python table API will only be useful with UDF
> >> support
> >> > (question above), then it may be better to discuss the first step with
> >> the
> >> > final goal in mind. If we find that Beam can be used for the UDF part
> >> then
> >> > approach 1 vs. approach 2 in the doc (for the client side language
> >> > boundary) may look different.
> >> >
> >> >
> >> > >
> >> > > I think that no matter how the Flink and Beam work together on the
> UDF
> >> > > level, it will not affect the current Python API (interface), we can
> >> > first
> >> > > support the Python API in Flink. Then start the UDX (UDF/UDTF/UDAF)
> >> > > support.
> >> > >
> >> > >
> >> > I agree that the client side API should not be affected.
> >> >
> >> >
> >> > > And great thanks for your valuable comments in Google doc! I will
> >> > feedback
> >> > > you in the google doc. :)
> >> > >
> >> > >
> >> > > Regards,
> >> > > Jincheng
> >> > >
> >> > > On Thu, Apr 4, 2019 at 8:03 AM Thomas Weise <[hidden email]> wrote:
> >> > >
> >> > > > Thanks for putting this proposal together.
> >> > > >
> >> > > > It would be nice, if you could share a few use case examples
> (maybe
> >> add
> >> > > > them as section to the FLIP?).
> >> > > >
> >> > > > The reason I ask: The table API is immensely useful, but it isn't
> >> clear
> >> > > to
> >> > > > me what value other language bindings provide without UDF support.
> >> With
> >> > > > FLIP-38 it will be possible to write a program in Python, but not
> >> > execute
> >> > > > Python functions. Without UDF support, isn't it possible to
> achieve
> >> > > roughly
> >> > > > the same with plain SQL? In which situation would I use the Python
> >> API?
> >> > > >
> >> > > > There was related discussion regarding UDF support in [1]. If the
> >> > > > assumption is that such support will be added later, then I would
> >> like
> >> > to
> >> > > > circle back to the question why this cannot be built on top of
> >> Beam? It
> >> > > > would be nice to clarify the bigger goal before embarking for the
> >> first
> >> > > > milestone.
> >> > > >
> >> > > > I'm going to comment on other things in the doc.
> >> > > >
> >> > > > [1]
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> https://lists.apache.org/thread.html/f6f8116b4b38b0b2d70ed45b990d6bb1bcb33611fde6fdf32ec0e840@%3Cdev.flink.apache.org%3E
> >> > > >
> >> > > > Thomas
> >> > > >
> >> > > >
> >> > > > On Wed, Apr 3, 2019 at 12:35 PM Shuyi Chen <[hidden email]>
> >> wrote:
> >> > > >
> >> > > > > Thanks a lot for driving the FLIP, jincheng. The approach looks
> >> > > > > good. Adding multi-lang support sounds a promising direction to
> >> > expand
> >> > > > the
> >> > > > > footprint of Flink. Do we have plan for adding Golang support?
> As
> >> > many
> >> > > > > backend engineers nowadays are familiar with Go, but probably
> not
> >> > Java
> >> > > as
> >> > > > > much, adding Golang support would significantly reduce their
> >> friction
> >> > > to
> >> > > > > use Flink. Also, do we have a design for multi-lang UDF support,
> >> and
> >> > > > what's
> >> > > > > timeline for adding DataStream API support? We would like to
> help
> >> and
> >> > > > > contribute as well as we do have similar need internally at our
> >> > > company.
> >> > > > > Thanks a lot.
> >> > > > >
> >> > > > > Shuyi
> >> > > > >
> >> > > > > On Tue, Apr 2, 2019 at 1:03 AM jincheng sun <
> >> > [hidden email]>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hi All,
> >> > > > > > As Xianda brought up in the previous email, There are a large
> >> > number
> >> > > of
> >> > > > > > data analysis users who want flink to support Python. At the
> >> Flink
> >> > > API
> >> > > > > > level, we have DataStreamAPI/DataSetAPI/TableAPI&SQL, the
> Table
> >> API
> >> > > > will
> >> > > > > > become the first-class citizen. Table API is declarative and
> >> can be
> >> > > > > > automatically optimized, which is mentioned in the Flink
> >> mid-term
> >> > > > roadmap
> >> > > > > > by Stephan. So we first considering supporting Python at the
> >> Table
> >> > > > level
> >> > > > > to
> >> > > > > > cater to the current large number of analytics users. For
> >> further
> >> > > > promote
> >> > > > > > Python support in flink table level. Dian, Wei and I discussed
> >> > > offline
> >> > > > a
> >> > > > > > bit and came up with an initial features outline as follows:
> >> > > > > >
> >> > > > > > - Python TableAPI Interface
> >> > > > > >   Introduce a set of Python Table API interfaces, including
> >> > interface
> >> > > > > > definitions such as Table, TableEnvironment, TableConfig, etc.
> >> > > > > >
> >> > > > > > - Implementation Architecture
> >> > > > > >   We will offer two alternative architecture options, one for
> >> pure
> >> > > > Python
> >> > > > > > language support and one for extended multi-language design.
> >> > > > > >
> >> > > > > > - Job Submission
> >> > > > > >   Provide a way that can submit(local/remote) Python Table API
> >> > jobs.
> >> > > > > >
> >> > > > > > - Python Shell
> >> > > > > >   Python Shell is to provide an interactive way for users to
> >> write
> >> > > and
> >> > > > > > execute flink Python Table API jobs.
> >> > > > > >
> >> > > > > >
> >> > > > > > The design document for FLIP-38 can be found here:
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing
> >> > > > > >
> >> > > > > > I am looking forward to your comments and feedback.
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Jincheng
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSS] FLIP-38 Support python language in flink TableAPI

jincheng sun
Hi everyone,

Thank you for all of your feedback and comments in the Google doc!

I have updated the Google doc and added the UDF part. A short summary:

  - Python Table API - Flink introduces a set of Python Table API interfaces
that align with the Flink Java Table API. It uses the Py4J framework to
communicate between the Python VM and the Java VM.
  - Python user-defined functions - For the communication framework that
Flink needs to support UDFs, IMO we will try to reuse the existing work in
Beam as much as possible. The first step is to solve the interface
definition problem, i.e. turning `WindowedValue<T>` into `T` in the
FnDataService and BeamFnDataClient interface definitions, which has been
discussed in the Beam community.
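
To illustrate the first point, here is a toy sketch of a Python front end
that only describes relational operations and hands them to the Java side.
It is not the proposed API: `FakeJvmGateway` is a hypothetical stand-in
for the Py4J gateway, the class and method names are invented for this
example, and a real implementation would proxy each call to a Java object
instead of recording tuples.

```python
class FakeJvmGateway:
    """Hypothetical stand-in for the Py4J gateway; records what it is sent."""

    def __init__(self):
        self.received = []

    def submit(self, plan):
        # A real gateway would forward the calls to Java Table API objects.
        self.received.append(list(plan))
        return f"job with {len(plan)} operations"


class Table:
    """Toy Table: every method only extends a plan, no data is processed."""

    def __init__(self, gateway, plan):
        self._gateway = gateway
        self._plan = plan

    def _extend(self, op, arg):
        # Return a new Table so that earlier tables stay unchanged.
        return Table(self._gateway, self._plan + [(op, arg)])

    def where(self, predicate):
        return self._extend("where", predicate)

    def group_by(self, fields):
        return self._extend("group_by", fields)

    def select(self, fields):
        return self._extend("select", fields)

    def execute(self):
        # Only here does anything cross the Python/JVM boundary.
        return self._gateway.submit(self._plan)


class TableEnvironment:
    """Toy entry point that creates tables bound to one gateway."""

    def __init__(self, gateway):
        self._gateway = gateway

    def scan(self, table_name):
        return Table(self._gateway, [("scan", table_name)])


gateway = FakeJvmGateway()
t_env = TableEnvironment(gateway)
result = (
    t_env.scan("orders")
    .where("amount > 10")
    .group_by("user")
    .select("user, amount.sum as total")
    .execute()
)
print(result)
print(gateway.received[0])
```

Because the Python side only builds the plan, the query can be optimized
and executed entirely in the JVM; Python code would run at runtime only
once UDFs are supported.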

The details can be found here:
https://docs.google.com/document/d/1ybYt-0xWRMa1Yf5VsuqGRtOfJBz4p74ZmDxZYg3j_h8/edit?usp=sharing

So we can start developing the Table API without UDFs in Flink, and
work with the Beam community to advance the abstraction in Beam.

What do you think?

Regards,
Jincheng
