[SURVEY] Usage of flink-python and flink-streaming-python

Till Rohrmann
Dear Flink community,

in order to better understand the needs of our users and to plan for the
future, I wanted to reach out to you and ask how much you use Flink's
Python API, namely flink-python and flink-streaming-python.

In order to gather feedback, I would like to ask all Python users to
respond to this thread and quickly outline how you use Python in
combination with Flink. Thanks a lot for your help!

Cheers,
Till

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Xianda Ke
After talking with some of the internal users at Alibaba, my impression is that:
* Most of them need support for C extensions. They want to integrate their algorithms with stream processing, and Jython is unacceptable for them.
* For users who are only familiar with SQL/Python, developing Java API applications or UDFs is too complex. They prefer writing a Python UDF and declaring it in SQL.
* Machine learning users need richer Python APIs, such as Python support for the Table API.

From my point of view, Python support in Flink currently has a few caveats:
* For batch, there is only the DataSet Python API.
* For streaming, where Flink really shines, only Jython is supported, and Jython has many limitations.
* For most big data users, the SQL/Table API is friendlier, but Python users have no such APIs right now.
* An interactive Python shell is very user-friendly: it can be used to test interactively and is a simple way to learn the API. However, Flink has no such shell at the moment.

At Alibaba, Python UDFs for SQL have been developed and delivered to internal users. We have now started to develop the Python API; we have drafted a design document and will publish it to the community soon for discussion.

Regards,
Xianda


> On Dec 7, 2018, at 11:29 PM, Till Rohrmann <[hidden email]> wrote:
>
> Dear Flink community,
>
> in order to better understand the needs of our users and to plan for the
> future, I wanted to reach out to you and ask how much you use Flink's
> Python API, namely flink-python and flink-streaming-python.
>
> In order to gather feedback, I would like to ask all Python users to
> respond to this thread and quickly outline how you use Python in
> combination with Flink. Thanks a lot for your help!
>
> Cheers,
> Till


Re: [SURVEY] Usage of flink-python and flink-streaming-python

Till Rohrmann
Hi Xianda,

thanks for sharing this detailed feedback. Do I understand you correctly
that flink-python and flink-streaming-python are not usable for the use
cases at Alibaba at the moment?

Could you share a few more details about the Python UDFs for SQL? How do
you execute the Python code? Will it work with any Python library? If you
are about to publish the design document, feel free to point me to it.

Cheers,
Till

On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke <[hidden email]> wrote:

> After talking with some of the internal users at Alibaba, my
> impression is that:
>
>    - Most of them need support for C extensions. They want to integrate
>    their algorithms with stream processing, and Jython is unacceptable
>    for them.
>    - For users who are only familiar with SQL/Python, developing Java
>    API applications or UDFs is too complex. They prefer writing a Python
>    UDF and declaring it in SQL.
>    - Machine learning users need richer Python APIs, such as Python
>    support for the Table API.
>
>
> From my point of view, Python support in Flink currently has a few
> caveats:
>
>    - For batch, there is only the DataSet Python API.
>    - For streaming, where Flink really shines, only Jython is supported,
>    and Jython has many limitations.
>    - For most big data users, the SQL/Table API is friendlier, but
>    Python users have no such APIs right now.
>    - An interactive Python shell is very user-friendly: it can be used to
>    test interactively and is a simple way to learn the API. However,
>    Flink has no such shell at the moment.
>
>
> At Alibaba, Python UDFs for SQL have been developed and delivered to
> internal users. We have now started to develop the Python API; we have
> drafted a design document and will publish it to the community soon
> for discussion.
>
> Regards,
> Xianda

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Xianda Ke
Hi Till,

1. As far as I know, most of the users at Alibaba are using SQL. Some
users at Alibaba want to integrate Python libraries with Flink for stream
processing, and Jython is unusable.

2. Python UDFs for SQL:
* The Python UDF is declared using Alibaba's internal DDL syntax.
* A Python process is started in open().
* It communicates with the JVM process via a socket.
* Yes, it supports Python libraries; users can upload a virtualenv/conda
Python runtime.

3. We've drafted a design doc for the Python API:
 [DISCUSS] Flink Python API
<https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>

The Python UDF for SQL is not discussed in this document; we'll create a
new proposal when the SQL DDL is ready.
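
As a rough illustration of the mechanism in point 2 (a worker that receives rows over a socket, applies a Python function, and sends results back), here is a minimal sketch. It is not Alibaba's implementation, which is not shown in this thread. The names `serve_udf`, `call_udf_over_socket` and `str_len`, and the length-prefixed JSON framing, are all assumptions made for the example. For brevity the worker runs in a thread and the caller is written in Python; in the setup described above the worker would be a separate Python process started from open(), and the caller would be the JVM operator.

```python
import json
import socket
import struct
import threading


def recv_exact(conn, n):
    """Read exactly n bytes, or return None if the peer closed the socket."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def send_msg(conn, obj):
    """Send one JSON value with a 4-byte big-endian length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    conn.sendall(struct.pack(">I", len(payload)) + payload)


def recv_msg(conn):
    """Receive one length-prefixed JSON value, or None at end of stream."""
    header = recv_exact(conn, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)
    body = recv_exact(conn, length)
    if body is None:
        return None
    return json.loads(body.decode("utf-8"))


def serve_udf(server_sock, udf):
    """Worker side: accept one connection and apply udf to every row."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            row = recv_msg(conn)  # each row is a JSON list of arguments
            if row is None:
                break
            send_msg(conn, udf(*row))


def str_len(s):
    """Hypothetical scalar UDF used only for this example."""
    return len(s)


def call_udf_over_socket(udf, rows):
    """Stand-in for the caller: start a worker, send rows, collect results."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    worker = threading.Thread(target=serve_udf, args=(server, udf), daemon=True)
    worker.start()
    results = []
    with socket.create_connection(server.getsockname()) as conn:
        for row in rows:
            send_msg(conn, row)
            results.append(recv_msg(conn))
    worker.join(timeout=5)  # the worker exits once the connection is closed
    server.close()
    return results


if __name__ == "__main__":
    print(call_udf_over_socket(str_len, [["flink"], ["python"]]))
```

Because the worker is an ordinary CPython interpreter, the function it runs can import any library installed in its environment, including ones built on C extensions, which is the property Jython lacks.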

On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann <[hidden email]> wrote:

> Hi Xianda,
>
> thanks for sharing this detailed feedback. Do I understand you correctly
> that flink-python and flink-streaming-python are not usable for the use
> cases at Alibaba atm?
>
> Could you share a bit more details about the Python UDFs for SQL? How do
> you execute the Python code? Will it work with any Python library? If you
> are about to publish the design document then feel free to refer me to this
> document.
>
> Cheers,
> Till


--
Ke, Xianda

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Thomas Weise
Did you take a look at Apache Beam? It already provides a comprehensive
Python SDK and can be used with Flink:
https://beam.apache.org/roadmap/portability/#python-on-flink

We are using it at Lyft for Python streaming pipelines.

Thomas
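
For readers unfamiliar with Beam, a minimal sketch of a Python pipeline aimed at Flink might look as follows. The helper names `flink_pipeline_args` and `run_word_lengths` are made up for this example, and the address `localhost:8081` is a placeholder. The option names (`--runner=FlinkRunner`, `--flink_master`, `--environment_type=LOOPBACK`) follow the Beam documentation for releases later than the one current at the time of this thread, so treat them as assumptions and check the documentation for the version you use. Running the pipeline requires the `apache-beam` package and a reachable Flink cluster.

```python
def flink_pipeline_args(flink_master="localhost:8081"):
    """Return command-line style options that select Beam's Flink runner."""
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        # LOOPBACK runs the Python workers inside the submitting process,
        # which is convenient for local experiments.
        "--environment_type=LOOPBACK",
    ]


def run_word_lengths(words, flink_master="localhost:8081"):
    """Build and run a tiny pipeline that pairs each word with its length."""
    # Imported lazily so the option helper above works without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(flink_pipeline_args(flink_master))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "CreateWords" >> beam.Create(words)
            | "PairWithLength" >> beam.Map(lambda w: (w, len(w)))
            | "Print" >> beam.Map(print)
        )
```

Calling `run_word_lengths(["flink", "beam"])` would submit the job to the given Flink master; the pipeline code itself is plain Beam and does not depend on Flink's own APIs.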

On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke <[hidden email]> wrote:

> Hi Till,
>
> 1. As far as I know, most of the users at Alibaba are using SQL. Some
> users at Alibaba want to integrate Python libraries with Flink for stream
> processing, and Jython is unusable.
>
> 2. Python UDFs for SQL:
> * The Python UDF is declared using Alibaba's internal DDL syntax.
> * A Python process is started in open().
> * It communicates with the JVM process via a socket.
> * Yes, it supports Python libraries; users can upload a virtualenv/conda
> Python runtime.
>
> 3. We've drafted a design doc for the Python API:
>  [DISCUSS] Flink Python API
> <https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web>
>
> The Python UDF for SQL is not discussed in this document; we'll create a
> new proposal when the SQL DDL is ready.

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Stephan Ewen
I like that we are having a general discussion about how to use Python and
Flink together in the future.
The current Python support has some shortcomings that were mentioned
before, so we clearly need something better.

Parts of the community have worked together with the Apache Beam project,
which is pretty far along in adding a portability layer to support Python.
Before we dive deep into a design proposal for a new Python API in Flink, I
think we should figure out in which general direction Python support should
go.

*Option (1): Language portability via Apache Beam*

Pro:
  - already exists to a large extent and already has users
  - portability layer offers other languages in addition to Python. Go is
    in the making, NodeJS has been speculated about, etc.
  - collaboration with another project / community, which means more
    manpower and exposure. Beam currently has a strong focus on Flink as a
    runner for Python.
  - Python API is used for existing ML libraries from the TensorFlow
    ecosystem

Con:
  - Not Flink's API. Python users need to learn the syntax of another API
    (a Python API is inherently different, but even more different here).

*Option (2): Implement own Python API*

Pro:
  - Python API will be closer to Flink Java / Scala APIs

Con:
  - We will only have Python.
  - Need to rebuild the Python language bridge (significant work to get
    stable)
  - might lose tight collaboration with Beam and the other parties in Beam
  - not benefiting from Beam's ecosystem

*Option (3): Implement own portability layer*

Pro:
  - Flexibility to align APIs across languages within Flink ecosystem

Con:
  - A lot of work (for context, to get this feature complete, Beam has
    worked on that for a year now)
  - Replicating work that already exists
  - good chance to lose tight collaboration with Beam and parties in that
    project
  - not benefiting from Beam's ecosystem

Best,
Stephan


On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise <[hidden email]> wrote:

> Did you take a look at Apache Beam? It already provides a comprehensive
> Python SDK and can be used with Flink:
> https://beam.apache.org/roadmap/portability/#python-on-flink
>
> We are using it at Lyft for Python streaming pipelines.
>
> Thomas

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Xianda Ke
Hi Folks,
To avoid polluting the survey thread with discussions, we have started a
separate thread; maybe we can continue the discussion over there.

Regards,
Xianda

On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen <[hidden email]> wrote:

> I like that we are having a general discussion about how to use Python and
> Flink together in the future.
> The current Python support has some shortcomings that were mentioned
> before, so we clearly need something better.
>
> Parts of the community have worked together with the Apache Beam project,
> which is pretty far along in adding a portability layer to support Python.
> Before we dive deep into a design proposal for a new Python API in Flink, I
> think we should figure out in which general direction Python support should
> go.
>
> *Option (1): Language portability via Apache Beam*
>
> Pro:
>   - already exists to a large extent and already has users
>   - portability layer offers other languages in addition to Python. Go is
>     in the making, NodeJS has been speculated about, etc.
>   - collaboration with another project / community, which means more
>     manpower and exposure. Beam currently has a strong focus on Flink as a
>     runner for Python.
>   - Python API is used for existing ML libraries from the TensorFlow
>     ecosystem
>
> Con:
>   - Not Flink's API. Python users need to learn the syntax of another API
>     (a Python API is inherently different, but even more different here).
>
> *Option (2): Implement own Python API*
>
> Pro:
>   - Python API will be closer to Flink Java / Scala APIs
>
> Con:
>   - We will only have Python.
>   - Need to rebuild the Python language bridge (significant work to get
>     stable)
>   - might lose tight collaboration with Beam and the other parties in Beam
>   - not benefiting from Beam's ecosystem
>
> *Option (3): Implement own portability layer*
>
> Pro:
>   - Flexibility to align APIs across languages within Flink ecosystem
>
> Con:
>   - A lot of work (for context, to get this feature complete, Beam has
>     worked on that for a year now)
>   - Replicating work that already exists
>   - good chance to lose tight collaboration with Beam and parties in that
>     project
>   - not benefiting from Beam's ecosystem
>
> Best,
> Stephan


--
Ke, Xianda

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Stephan Ewen
You are right. Let's refocus this thread on the Python user survey and spin
out another thread for the discussion.

On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke <[hidden email]> wrote:

> Hi Folks,
> To avoid polluting the survey thread with discussions, we have started a
> separate thread; maybe we can continue the discussion over there.
>
> Regards,
> Xianda
> > > declaring
> > > > > it
> > > > > > in
> > > > > >    SQL is preferred.
> > > > > >    - Machine Learning users needs richer Python APIs, such as
> Table
> > > API
> > > > > >    Python support.
> > > > > >
> > > > > >
> > > > > > From my point of view, currently Python support has a few caveats
> > in
> > > > > Flink.
> > > > > >
> > > > > >    - For batch, there is only DataSet Python API.
> > > > > >    - For streaming, where Flink really shines, only Jython is
> > > > supported,
> > > > > >    but Jython has lots of limitations.
> > > > > >    - For most of the big data users, SQL/Table API is more
> > friendly,
> > > > but
> > > > > >    Python users have no such APIs right now.
> > > > > >    - The interactive Python shell is very user-friendly. It can
> be
> > > used
> > > > > to
> > > > > >    test interactively and is a simple way to learn the API.
> > However,
> > > > > there
> > > > > > is
> > > > > >    no such interactive Python shell in Flink now.
> > > > > >
> > > > > >
> > > > > > At Alibaba, Python UDF for SQL has been developed and has been
> > > > delivered
> > > > > to
> > > > > > internal users.  Currently, we start to develop the Python API,
> and
> > > > we've
> > > > > > drafted a design documentation and will publish it to the
> community
> > > > soon
> > > > > > for discussion.
> > > > > >
> > > > > > Regards,
> > > > > > Xianda
> > > > > >
> > > > > > On Fri, Dec 7, 2018 at 11:30 PM Till Rohrmann <
> > [hidden email]>
> > > > > > wrote:
> > > > > >
> > > > > > > Dear Flink community,
> > > > > > >
> > > > > > > in order to better understand the needs of our users and to
> plan
> > > for
> > > > > the
> > > > > > > future, I wanted to reach out to you and ask how much you use
> > > Flink's
> > > > > > > Python API, namely flink-python and flink-streaming-python.
> > > > > > >
> > > > > > > In order to gather feedback, I would like to ask all Python
> users
> > > to
> > > > > > > respond to this thread and quickly outline how you use Python
> in
> > > > > > > combination with Flink. Thanks a lot for your help!
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Till
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Ke, Xianda
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Ke, Xianda
> > > >
> > >
> >
>
>
> --
> Ke, Xianda
>

Re: [SURVEY] Usage of flink-python and flink-streaming-python

Till Rohrmann
Thanks a lot for the feedback on this survey. I will close it now since 6
days have passed without new activity.

To me it seems that we currently don't have many users of flink-python
or flink-streaming-python because of their limitations (mentioned by
Xianda in this survey). This information might be useful when discussing
Flink's future Python strategy and whether to continue supporting
flink-python and flink-streaming-python.

Cheers,
Till
