Hello everyone, I would appreciate if someone could please clarify me several things regarding Flink. Flink 1.10 I’m trying to develop PYTHON stream application using pyflink-stream. I managed successfully to run wordcount.py (in attachment). Now, I would like to go one step further and to use other sources such are Twitter API, Kafka topics or socket.
However, unfortunately I did not succeed.
Could you please help me with uncertainties that I have to Flink? I would mean very much to me better understanding flink.
Thanks, Milan S. |
Hi Milan,
I can only give you some high level answers because I'm not actively involved in the development of Flink's Python support. But I've cc'ed Jincheng who is driving this effort and can give you more detailed answers. At the moment, Flink's Python API can be seen as a thin wrapper around Flink's Table API. So what the Python program does it to construct a Flink Table API program by calling Flink's Java API using Py4J. At the moment the development concentrated on the Table API and hence the DataStream part is not yet fully feature complete. For the table API there are a couple of connectors supported [1]. Since Flink's Python API does not actually ship Python code yet, it does not work to implement a source in pure Python. I'm not sure whether this will be ever supported tbh but the community is actively working on supporting Python user code functions which are executed on the Flink cluster. The long term plan is to make the Python API feature complete wrt the other available APIs. I'm also confident that we will constantly improve on the documentation of this new API. Just keep in mind that this feature has just been released as a preview feature with Flink 1.9 and the community is heavily working on improving it. [1] https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors Cheers, Till On Fri, Sep 13, 2019 at 10:09 AM Milan Simaković < [hidden email]> wrote: > Hello everyone, > > I would appreciate if someone could please clarify me several things > regarding Flink. > > > > Flink 1.10 > > > > I’m trying to develop PYTHON stream application using pyflink-stream. I > managed successfully to run wordcount.py (in attachment). Now, I would like > to go one step further and to use other sources such are Twitter API, Kafka > topics or socket. However, unfortunately I did not succeed. > > 1. Didn’t manage to find any connectors on python docs > https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html > 2. I tried to import > org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an > error “ImportError: No module named connectors” > 3. I developed by myself twitter connector (pure python) but realized > that it doesn’t have sense to use it because it cannot be distributed in a > way it would be with flink. > > > > Could you please help me with uncertainties that I have to Flink? I would > mean very much to me better understanding flink. > > - Does flink support any other stream connectors except the one from > example? > - Could I use libraries from JAVA source code (org.apache…) when > writing python program? > - How can I efficiently what libraries are available for python hence > I didn’t manage to find from python flink docs? Or can someone give me a > hint how can I browse better through docs? > - Could someone share with me examples in python or Scala. I’ve > already examined all available on the internet. > - What is you plan with flink and python in general? Do you plan to > catch up python with java and scala? > > > > Thanks, > > Milan S. > |
Hi Milan,
Hoping Till has answered your questions. WRT whether it supports to use libraries from Java code in Python, the answer is yes. Regarding to use Java connectors in Python, there are two ways: 1) If there is a Java TableFactory[1] defined for the connector, you could try to use the pyflink.table.descriptors.CustomConnectorDescriptor [2][3]. 2) If there is no Java TableFactory defined, you need to write a Python connector which is a simple wrapper of the Java connector. You can refer to the CSV connector for an example [4]. Built-in ways may be provided In the future to eliminate the need of wrapping. However, you can try this way for now. Besides, no matter in which case of the above, you must make sure that the Java library be put in the classpath. Otherwise, it could not find the Java library. This is the cause of the error you encountered “ImportError: No module named connectors”. If you submit the Python job via CLI, you could use the "-j" option. You can refer to [5] for more details. [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sourceSinks.html#define-a-tablefactory <https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sourceSinks.html#define-a-tablefactory> [2] https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#pyflink.table.descriptors.CustomConnectorDescriptor <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#pyflink.table.descriptors.CustomConnectorDescriptor> [3] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/tests/test_descriptor.py#L360 <https://github.com/apache/flink/blob/master/flink-python/pyflink/table/tests/test_descriptor.py#L360> [4] https://github.com/apache/flink/blob/master/flink-python/pyflink/table/sources.py#L35 <https://github.com/apache/flink/blob/master/flink-python/pyflink/table/sources.py#L35> [5] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html <https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html> Regards, Dian > 在 2019年9月13日,下午4:37,Till Rohrmann <[hidden email]> 写道: > > Hi Milan, > > I can only give you some high level answers because I'm not actively > involved in the development of Flink's Python support. But I've cc'ed > Jincheng who is driving this effort and can give you more detailed answers. > > At the moment, Flink's Python API can be seen as a thin wrapper around > Flink's Table API. So what the Python program does it to construct a Flink > Table API program by calling Flink's Java API using Py4J. At the moment the > development concentrated on the Table API and hence the DataStream part is > not yet fully feature complete. For the table API there are a couple of > connectors supported [1]. > > Since Flink's Python API does not actually ship Python code yet, it does > not work to implement a source in pure Python. I'm not sure whether this > will be ever supported tbh but the community is actively working on > supporting Python user code functions which are executed on the Flink > cluster. The long term plan is to make the Python API feature complete wrt > the other available APIs. I'm also confident that we will constantly > improve on the documentation of this new API. Just keep in mind that this > feature has just been released as a preview feature with Flink 1.9 and the > community is heavily working on improving it. > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.table.html#module-pyflink.table.descriptors> > > Cheers, > Till > > On Fri, Sep 13, 2019 at 10:09 AM Milan Simaković < > [hidden email] <mailto:[hidden email]>> wrote: > >> Hello everyone, >> >> I would appreciate if someone could please clarify me several things >> regarding Flink. >> >> >> >> Flink 1.10 >> >> >> >> I’m trying to develop PYTHON stream application using pyflink-stream. I >> managed successfully to run wordcount.py (in attachment). Now, I would like >> to go one step further and to use other sources such are Twitter API, Kafka >> topics or socket. However, unfortunately I did not succeed. >> >> 1. Didn’t manage to find any connectors on python docs >> https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html <https://ci.apache.org/projects/flink/flink-docs-master/api/python/pyflink.datastream.html> >> 2. I tried to import >> org.apache.flink.streaming.connectors.twitter.TwitterSource, but got an >> error “ImportError: No module named connectors” >> 3. I developed by myself twitter connector (pure python) but realized >> that it doesn’t have sense to use it because it cannot be distributed in a >> way it would be with flink. >> >> >> >> Could you please help me with uncertainties that I have to Flink? I would >> mean very much to me better understanding flink. >> >> - Does flink support any other stream connectors except the one from >> example? >> - Could I use libraries from JAVA source code (org.apache…) when >> writing python program? >> - How can I efficiently what libraries are available for python hence >> I didn’t manage to find from python flink docs? Or can someone give me a >> hint how can I browse better through docs? >> - Could someone share with me examples in python or Scala. I’ve >> already examined all available on the internet. >> - What is you plan with flink and python in general? Do you plan to >> catch up python with java and scala? >> >> >> >> Thanks, >> >> Milan S. |
Free forum by Nabble | Edit this page |