Flink stream python

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Flink stream python

Milan Simaković

Hello everyone,

I would appreciate if someone could please clarify me several things regarding Flink.

 

Flink 1.10

 

I’m trying to develop PYTHON stream application using pyflink-stream. I managed successfully to run wordcount.py (in attachment). Now, I would like to go one step further and to use other sources such are Twitter API, Kafka topics or socket. However, unfortunately I did not succeed.

  1. Didn’t manage to find any connectors on python docs https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html
  2. I tried to import org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an error “ImportError: No module named connectors”
  3. I developed by myself twitter connector (pure python) but realized that it doesn’t have sense to use it because it cannot be distributed in a way it would be with flink.

 

Could you please help me with uncertainties that I have to Flink? I would mean very much to me better understanding flink.

  • Does flink support any other stream connectors except the one from example?
  • Could I use libraries from JAVA source code (org.apache…) when writing python program?
  • How can I efficiently what libraries are available for python hence I didn’t manage to find from python flink docs? Or can someone give me a hint how can I browse better through docs?
  • Could someone share with me examples in python or Scala. I’ve already examined all available on the internet.
  • What is you plan with flink and python in general? Do you plan to catch up python with java and scala?

 

Thanks,

Milan S.

Reply | Threaded
Open this post in threaded view
|

Re: Flink stream python

Till Rohrmann
Hi Milan,

I can only give you some high level answers because I'm not actively
involved in the development of Flink's Python support. But I've cc'ed
Jincheng who is driving this effort and can give you more detailed answers.

At the moment, Flink's Python API can be seen as a thin wrapper around
Flink's Table API. So what the Python program does it to construct a Flink
Table API program by calling Flink's Java API using Py4J. At the moment the
development concentrated on the Table API and hence the DataStream part is
not yet fully feature complete. For the table API there are a couple of
connectors supported [1].

Since Flink's Python API does not actually ship Python code yet, it does
not work to implement a source in pure Python. I'm not sure whether this
will be ever supported tbh but the community is actively working on
supporting Python user code functions which are executed on the Flink
cluster. The long term plan is to make the Python API feature complete wrt
the other available APIs. I'm also confident that we will constantly
improve on the documentation of this new API. Just keep in mind that this
feature has just been released as a preview feature with Flink 1.9 and the
community is heavily working on improving it.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors

Cheers,
Till

On Fri, Sep 13, 2019 at 10:09 AM Milan Simaković <
[hidden email]> wrote:

> Hello everyone,
>
> I would appreciate if someone could please clarify me several things
> regarding Flink.
>
>
>
> Flink 1.10
>
>
>
> I’m trying to develop PYTHON stream application using pyflink-stream. I
> managed successfully to run wordcount.py (in attachment). Now, I would like
> to go one step further and to use other sources such are Twitter API, Kafka
> topics or socket. However, unfortunately I did not succeed.
>
>    1. Didn’t manage to find any connectors on python docs
>    https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html
>    2. I tried to import
>    org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an
>    error “ImportError: No module named connectors”
>    3. I developed by myself twitter connector (pure python) but realized
>    that it doesn’t have sense to use it because it cannot be distributed in a
>    way it would be with flink.
>
>
>
> Could you please help me with uncertainties that I have to Flink? I would
> mean very much to me better understanding flink.
>
>    - Does flink support any other stream connectors except the one from
>    example?
>    - Could I use libraries from JAVA source code (org.apache…) when
>    writing python program?
>    - How can I efficiently what libraries are available for python hence
>    I didn’t manage to find from python flink docs? Or can someone give me a
>    hint how can I browse better through docs?
>    - Could someone share with me examples in python or Scala. I’ve
>    already examined all available on the internet.
>    - What is you plan with flink and python in general? Do you plan to
>    catch up python with java and scala?
>
>
>
> Thanks,
>
> Milan S.
>
Reply | Threaded
Open this post in threaded view
|

Re: Flink stream python

Dian Fu-2
Hi Milan,

Hoping Till has answered your questions. WRT whether it supports to use libraries from Java code in Python, the answer is yes.

Regarding to use Java connectors in Python, there are two ways:
1) If there is a Java TableFactory[1] defined for the connector, you could try to use the pyflink.table.descriptors.CustomConnectorDescriptor [2][3].
2) If there is no Java TableFactory defined, you need to write a Python connector which is a simple wrapper of the Java connector. You can refer to the CSV connector for an example [4]. Built-in ways may be provided In the future to eliminate the need of wrapping. However, you can try this way for now.

Besides, no matter in which case of the above, you must make sure that the Java library be put in the classpath. Otherwise, it could not find the Java library. This is the cause of the error you encountered “ImportError: No module named connectors”. If you submit the Python job via CLI, you could use the "-j" option. You can refer to [5] for more details.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sourceSinks.html#define-a-tablefactory <https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sourceSinks.html#define-a-tablefactory>
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#pyflink.table.descriptors.CustomConnectorDescriptor <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#pyflink.table.descriptors.CustomConnectorDescriptor>
[3] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/tests/test_descriptor.py#L360 <https://github.com/apache/flink/blob/master/flink-python/pyflink/table/tests/test_descriptor.py#L360>
[4] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/sources.py#L35 <https://github.com/apache/flink/blob/master/flink-python/pyflink/table/sources.py#L35>
[5] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html <https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html>

Regards,
Dian

> 在 2019年9月13日,下午4:37,Till Rohrmann <[hidden email]> 写道:
>
> Hi Milan,
>
> I can only give you some high level answers because I'm not actively
> involved in the development of Flink's Python support. But I've cc'ed
> Jincheng who is driving this effort and can give you more detailed answers.
>
> At the moment, Flink's Python API can be seen as a thin wrapper around
> Flink's Table API. So what the Python program does it to construct a Flink
> Table API program by calling Flink's Java API using Py4J. At the moment the
> development concentrated on the Table API and hence the DataStream part is
> not yet fully feature complete. For the table API there are a couple of
> connectors supported [1].
>
> Since Flink's Python API does not actually ship Python code yet, it does
> not work to implement a source in pure Python. I'm not sure whether this
> will be ever supported tbh but the community is actively working on
> supporting Python user code functions which are executed on the Flink
> cluster. The long term plan is to make the Python API feature complete wrt
> the other available APIs. I'm also confident that we will constantly
> improve on the documentation of this new API. Just keep in mind that this
> feature has just been released as a preview feature with Flink 1.9 and the
> community is heavily working on improving it.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors>
>
> Cheers,
> Till
>
> On Fri, Sep 13, 2019 at 10:09 AM Milan Simaković <
> [hidden email] <mailto:[hidden email]>> wrote:
>
>> Hello everyone,
>>
>> I would appreciate if someone could please clarify me several things
>> regarding Flink.
>>
>>
>>
>> Flink 1.10
>>
>>
>>
>> I’m trying to develop PYTHON stream application using pyflink-stream. I
>> managed successfully to run wordcount.py (in attachment). Now, I would like
>> to go one step further and to use other sources such are Twitter API, Kafka
>> topics or socket. However, unfortunately I did not succeed.
>>
>>   1. Didn’t manage to find any connectors on python docs
>>   https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html>
>>   2. I tried to import
>>   org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an
>>   error “ImportError: No module named connectors”
>>   3. I developed by myself twitter connector (pure python) but realized
>>   that it doesn’t have sense to use it because it cannot be distributed in a
>>   way it would be with flink.
>>
>>
>>
>> Could you please help me with uncertainties that I have to Flink? I would
>> mean very much to me better understanding flink.
>>
>>   - Does flink support any other stream connectors except the one from
>>   example?
>>   - Could I use libraries from JAVA source code (org.apache…) when
>>   writing python program?
>>   - How can I efficiently what libraries are available for python hence
>>   I didn’t manage to find from python flink docs? Or can someone give me a
>>   hint how can I browse better through docs?
>>   - Could someone share with me examples in python or Scala. I’ve
>>   already examined all available on the internet.
>>   - What is you plan with flink and python in general? Do you plan to
>>   catch up python with java and scala?
>>
>>
>>
>> Thanks,
>>
>> Milan S.