Dear Flink Community!
I would like to open a discussion about contributing the Pulsar Flink connector [0] back to Flink.

## A brief introduction to Apache Pulsar

Apache Pulsar [1] is a multi-tenant, high-performance distributed pub-sub messaging system. Its features include native support for multiple clusters in a Pulsar instance, with seamless geo-replication of messages across clusters, very low publish and end-to-end latency, seamless scalability to over a million topics, and guaranteed message delivery with persistent message storage provided by Apache BookKeeper. Pulsar has been adopted by more and more companies [2].

## The status of Pulsar Flink connector

The Pulsar Flink connector we are planning to contribute is built upon Flink 1.9.0 and Pulsar 2.4.0. The main features are:

- Pulsar as a streaming source with an exactly-once guarantee.
- Sinking streaming results to Pulsar with at-least-once semantics. (We would update this to exactly-once as well when Pulsar gets all transaction features ready in its 2.5.0 version.)
- Built upon Flink's new Table API type system (FLIP-37 [3]); messages can be (de)serialized automatically with the help of the Pulsar schema.
- Integration with Flink's new Catalog API (FLIP-30 [4]), which enables the use of Pulsar topics as tables in the Table API as well as the SQL client.

## Reference

[0] https://github.com/streamnative/pulsar-flink
[1] https://pulsar.apache.org/
[2] https://pulsar.apache.org/en/powered-by/
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
[4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs

Best,
Yijie Shen
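To make the difference between the two guarantees in the feature list concrete, here is a toy simulation in Python. It is only a sketch under stated assumptions and not the connector's code: the function `run_pipeline` and its parameters are hypothetical, a "checkpoint" is modeled as simply remembering the source offset, and a "transactional" sink is modeled as a buffer that becomes visible only when a checkpoint completes. How the real connector implements either guarantee is not described in this thread.

```python
def run_pipeline(messages, checkpoint_every, crash_at, transactional_sink):
    """Toy source -> sink run with one simulated failure.

    Hypothetical model (not the connector's code): a checkpoint is taken every
    `checkpoint_every` messages, and the job fails once, right after it has
    processed the message at index `crash_at`.
    Returns the list of messages visible to downstream readers.
    """
    visible = []             # what readers of the sink topic can see
    pending = []             # transactional writes not yet committed
    checkpointed_offset = 0  # source offset saved by the last checkpoint
    offset = 0
    crashed = False

    while offset < len(messages):
        msg = messages[offset]
        if transactional_sink:
            pending.append(msg)   # becomes visible only on commit
        else:
            visible.append(msg)   # becomes visible immediately
        offset += 1

        if offset % checkpoint_every == 0:
            # Checkpoint: save the source offset and commit buffered writes.
            checkpointed_offset = offset
            visible.extend(pending)
            pending.clear()

        if not crashed and offset - 1 == crash_at:
            # Failure: uncommitted writes are discarded and the source
            # rewinds to the last checkpointed offset.
            crashed = True
            pending.clear()
            offset = checkpointed_offset

    visible.extend(pending)  # end of input is treated as a final commit
    return visible


msgs = ["a", "b", "c", "d", "e"]
at_least_once = run_pipeline(msgs, 2, crash_at=2, transactional_sink=False)
exactly_once = run_pipeline(msgs, 2, crash_at=2, transactional_sink=True)
print(at_least_once)  # ['a', 'b', 'c', 'c', 'd', 'e']
print(exactly_once)   # ['a', 'b', 'c', 'd', 'e']
```

In both runs the source rewinds to the checkpointed offset after the failure, so no message is lost. Without transactions the replayed message "c" reaches the sink twice (at-least-once); when writes are committed together with the checkpoint, the replay leaves no duplicate (exactly-once).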
Hi Yijie
I can see that Pulsar has become more and more popular recently, and I am very glad to see more people willing to contribute to the Flink ecosystem.

Before any further discussion, would you please explain the relationship between this thread and the existing JIRAs for the Pulsar source [1] and sink [2] connectors? Will the contribution contain parts of those PRs, or is it a totally different implementation?

[1] https://issues.apache.org/jira/browse/FLINK-9641
[2] https://issues.apache.org/jira/browse/FLINK-9168

Best
Yun Tang
________________________________
From: Yijie Shen <[hidden email]>
Sent: Tuesday, September 3, 2019 13:57
To: [hidden email] <[hidden email]>
Subject: [DISCUSS] Contribute Pulsar Flink connector back to Flink

Dear Flink Community!
[...]
Hi Yun,
Since I was the main driver behind FLINK-9641 and FLINK-9168, let me try to add more context on this.

FLINK-9641 and FLINK-9168 were created to bring Pulsar in as a source and sink for Flink. The integration was done with Flink 1.6.0. We sent out pull requests about a year ago, and we ended up maintaining those connectors in Pulsar so that Pulsar users could use Flink to process event streams in Pulsar. (See https://github.com/apache/pulsar/tree/master/pulsar-flink). The Flink 1.6 integration is pretty simple and there are no schema considerations.

In the past year, we have made a lot of changes in Pulsar and made Pulsar schema a first-class citizen in Pulsar. We also integrated with other computing engines for processing Pulsar event streams with Pulsar schema.

It led us to rethink how best to integrate with Flink. We then reimplemented the pulsar-flink connectors from the ground up with schema support, and made the Table API and Catalog API first-class citizens in the integration. As a result, in the new pulsar-flink implementation, you can register Pulsar as a Flink catalog and query / process the event streams using Flink SQL.

This is an example of how to use Pulsar as a Flink catalog:

https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog

Yijie has also written a blog post explaining why we re-implemented the Flink connector with Flink 1.9 and what changes we made in the new connector:

https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f

We believe Pulsar is not just a simple data sink or source for Flink. It can actually be a fully integrated streaming data storage for Flink in many areas (sink, source, schema/catalog and state). The combination of Flink and Pulsar can create a great streaming warehouse architecture for streaming-first, unified data processing. Since we are talking about contributing the Pulsar integration to Flink here, we are also dedicated to maintaining, improving and evolving the integration with Flink to help the users who use both Flink and Pulsar.

I hope this gives you a bit more background about the pulsar-flink integration. Let me know your thoughts.

Thanks,
Sijie

On Tue, Sep 3, 2019 at 11:54 AM Yun Tang <[hidden email]> wrote:
> Hi Yijie
> [...]
Hi everyone,
thanks a lot for starting this discussion Yijie. I think the Pulsar connector would be a very valuable addition since Pulsar is becoming more and more popular and it would further expand Flink's interoperability. Also from a project perspective it makes sense for me to place the connector in the downstream project.

My main concern/question is how the Flink community can maintain the connector. We have seen in the past that connectors are some of the most actively developed components because they need to be kept in sync with the external system and with Flink. Given that the Pulsar community is willing to help with maintaining, improving and evolving the connector, I'm optimistic that we can achieve this. Hence, +1 for contributing it back to Flink.

Cheers,
Till

On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo <[hidden email]> wrote:
> Hi Yun,
> [...]
I'm quite worried that we may end up repeating history.
There were already 2 attempts at contributing a Pulsar connector, both of which failed because no committer was getting involved, despite the contributor opening a dedicated discussion thread about the contribution beforehand and getting several +1's from committers.

We should really make sure that if we welcome/approve such a contribution it will actually get the attention it deserves.

As such, I'm inclined to recommend maintaining the connector outside of Flink. We could link to it from the documentation to give it more exposure. With the upcoming page for sharing artifacts among the community (what's the state of that anyway?), this may be a better option.

On 04/09/2019 10:16, Till Rohrmann wrote:
> Hi everyone,
> [...]
Hi Yijie,
Thanks for the interest in contributing the Pulsar connector.

In general, I think having a Pulsar connector with strong support is a valuable addition to Flink. So I am happy to shepherd this effort. Meanwhile, I would also like to provide some context on recent efforts in the Flink connectors ecosystem.

The current way Flink maintains its connectors has hit its scalability limit. With more and more connectors coming into the Flink repo, we are facing a few problems such as long build and testing times. To address this problem, we have attempted to do the following:
1. Split out the connectors into a separate repository. This is temporarily on hold due to a potential solution to shorten the build time.
2. Encourage the connectors to stay as ecosystem projects while Flink tries to provide good support for functionality and compatibility tests. Robert has driven the creation of a Flink Ecosystem project website and it is going through some final approval process.

Given the above efforts, it would be great to first see if we can have the Pulsar connector as an ecosystem project with great support. It would be good to hear how the Flink Pulsar connector is tested currently to see if we can learn something to maintain it as an ecosystem project with good quality and test coverage. If the quality as an ecosystem project is hard to guarantee, we may as well adopt it into the main repo.

BTW, another ongoing effort is FLIP-27, where we are making changes to the Flink source connector architecture and interface. This change will likely land in 1.10. Therefore, timing wise, if we are going to have the Pulsar connector in the main repo, I am wondering if we should hold off a little bit and let the Pulsar connector adapt to the new interface, to avoid work that would shortly be deprecated?

Thanks,

Jiangjie (Becket) Qin

On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler <[hidden email]> wrote:
> I'm quite worried that we may end up repeating history.
> [...]
Thanks everyone for the comments and feedback.
It seems to me that the main question here is "how can the Flink community maintain the connector?".

Here are two thoughts of my own.

1) I think how and where to host this integration is less important here. I believe there are many ways to achieve it. As part of the contribution, what we are looking for is a way for the two communities to collaborate on developing the integration between Pulsar and Flink. Even if we try our best to keep up with all the updates in the Flink community, the fact remains that we have less experience with Flink than folks in the Flink community. To make sure we maintain and deliver a high-quality pulsar-flink integration to the users of both technologies, we need some help from the experts in the Flink community.

2) We have been following FLIP-27 for a while. Originally we were thinking of contributing the connectors back after integrating with the new API introduced in FLIP-27, but we decided to start the conversation as early as possible because we believe there are more benefits to doing it now rather than later. As part of the contribution, it can help the Flink community understand more about Pulsar and the potential integration points. We can also help the Flink community verify the new connector API as well as other new APIs (e.g. the catalog API).

Thanks,
Sijie |
Thanks for all the feedback and suggestions!
As Sijie said, the goal of the connector has always been to provide users with the latest features of both systems as soon as possible. We propose to contribute the connector to Flink and hope to get more suggestions and feedback from Flink experts to ensure the high quality of the connector.

As for FLIP-27, we noticed it when we began reworking the connector implementation on top of Flink 1.9, and we wanted to use it to build a connector that supports both batch and stream computing. However, it has been inactive for some time, so we decided to first provide a connector with most of the new features, such as the new type system and the new catalog API. We will keep following the progress of FLIP-27 and integrate it into the connector as soon as possible.

Regarding the test status of the connector, we are following the tests of the other connectors in the Flink repository and aim to provide tests that are as thorough as we can make them. We are also happy to receive suggestions and supervision from the Flink community to keep improving the stability and performance of the connector.

Best,
Yijie |
Hi Sijie and Yijie,
Thanks for sharing your thoughts.

I just want to give an update on FLIP-27. Although the FLIP wiki and the discussion thread have been quiet for some time, a few committers / contributors in the Flink community have actually been prototyping the entire thing. We have made some good progress there, but we want to update the FLIP wiki only after the entire thing is verified to work, in case there are some last-minute surprises in the implementation. I don't have an exact ETA yet, but I guess it is going to be within a month or so.

I am happy to review the current Flink Pulsar connector and see if it would fit in FLIP-27. It would be good to avoid the case where we check in the Pulsar connector after some review effort and the new Source interface becomes ready shortly afterwards.

Thanks,

Jiangjie (Becket) Qin |
Hi everyone,
I'm wondering what the problem would be if we committed the Pulsar connector before the new source interface is ready. If I understood it correctly, we need to support the old source interface anyway for the existing connectors. The benefit I see in checking it in early is that our users could start using the connector sooner. Moreover, it would prevent the Pulsar integration from being delayed in case the source interface is delayed. The only downside I see is the extra review effort and potential fixes that might be irrelevant for the new source interface implementation. I guess it mainly depends on how certain we are about when the new source interface will be ready. Cheers, Till |
Hi Till,
You are right. It all depends on when the new source interface is going to be ready. Personally, I think it will be there in about a month or so, but I could be too optimistic. It would also be good to hear what Aljoscha and Stephan think, as they are also involved in FLIP-27. In general, I think we should have the Pulsar connector in Flink 1.10, preferably with the new source interface. We could also check it in right now with the old source interface, but I suspect few users will use it before the next official release. Therefore, it seems reasonable to wait a little bit to see whether we can jump to the new source interface. As long as we make sure Flink 1.10 has it, waiting a little bit doesn't seem to hurt much. Thanks, Jiangjie (Becket) Qin
> > > (We > > > > > > >>> would update this to exactly-once as well when Pulsar gets > all > > > > > > >>> transaction features ready in its 2.5.0 version) > > > > > > >>> - Build upon Flink new Table API Type system (FLIP-37[3]), > and > > > can > > > > > > >>> automatically (de)serialize messages with the help of Pulsar > > > schema. > > > > > > >>> - Integrate with Flink new Catalog API (FLIP-30[4]), which > > > enables > > > > > the > > > > > > >>> use of Pulsar topics as tables in Table API as well as SQL > > > client. > > > > > > >>> > > > > > > >>> ## Reference > > > > > > >>> [0] https://github.com/streamnative/pulsar-flink > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/ > > > > > > >>> [3] > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > >>> [4] > > > > > > >>> > > > > > > >> > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > >>> > > > > > > >>> Best, > > > > > > >>> Yijie Shen > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > |
Hi,
I think having a Pulsar connector in Flink would benefit both communities.

Another perspective is that the Pulsar connector is the first streaming connector that integrates with Flink's metadata management system and Catalog APIs. It'll be cool to see how the integration turns out and whether we need to improve the Flink Catalog stack, which is currently in Beta, to cater to streaming sources/sinks. Thus I'm in favor of merging the Pulsar connector into Flink 1.10.

I'd suggest submitting smaller PRs, e.g. one for basic source/sink functionality and another for schema and catalog integration, just to make them easier to review.

It doesn't seem to hurt to wait for FLIP-27. But I don't think FLIP-27 should be a blocker if it cannot make its way into 1.10, or if it doesn't leave a reasonable amount of time for committers to review or for the Pulsar connector to fully adapt to the new interfaces.

Bowen |
Thank you Bowen and Becket.
What's the take from the Flink community? Shall we wait for FLIP-27, or shall we proceed to the next steps? And what are the next steps? :-)

Thanks,
Sijie
> > > > > > > > > > > > > > > > On 04/09/2019 10:16, Till Rohrmann wrote: > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > thanks a lot for starting this discussion Yijie. I think > the > > > > Pulsar > > > > > > > > > connector would be a very valuable addition since Pulsar > > > becomes > > > > > more > > > > > > > and > > > > > > > > > more popular and it would further expand Flink's > > > > interoperability. > > > > > Also > > > > > > > > > from a project perspective it makes sense for me to place > the > > > > > connector > > > > > > > > in > > > > > > > > > the downstream project. > > > > > > > > > > > > > > > > > > My main concern/question is how can the Flink community > > > maintain > > > > > the > > > > > > > > > connector? We have seen in the past that connectors are > some > > of > > > > the > > > > > > > most > > > > > > > > > actively developed components because they need to be kept > in > > > > sync > > > > > with > > > > > > > > the > > > > > > > > > external system and with Flink. Given that the Pulsar > > community > > > > is > > > > > > > > willing > > > > > > > > > to help with maintaining, improving and evolving the > > connector, > > > > I'm > > > > > > > > > optimistic that we can achieve this. Hence, +1 for > > contributing > > > > it > > > > > back > > > > > > > > to > > > > > > > > > Flink. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo < > [hidden email] > > > > > > > > wrote: > > > > > > > > > > > > > > > > > >> Hi Yun, > > > > > > > > >> > > > > > > > > >> Since I was the main driver behind FLINK-9641 and > > FLINK-9168, > > > > let > > > > > me > > > > > > > > try to > > > > > > > > >> add more context on this. > > > > > > > > >> > > > > > > > > >> FLINK-9641 and FLINK-9168 was created for bringing Pulsar > as > > > > > source > > > > > > > and > > > > > > > > >> sink for Flink. 
The integration was done with Flink 1.6.0. > > We > > > > > sent out > > > > > > > > pull > > > > > > > > >> requests about a year ago and we ended up maintaining > those > > > > > connectors > > > > > > > > in > > > > > > > > >> Pulsar for Pulsar users to use Flink to process event > > streams > > > in > > > > > > > Pulsar. > > > > > > > > >> (See > > > https://github.com/apache/pulsar/tree/master/pulsar-flink > > > > ). > > > > > The > > > > > > > > Flink > > > > > > > > >> 1.6 integration is pretty simple and there is no schema > > > > > > > considerations. > > > > > > > > >> > > > > > > > > >> In the past year, we have made a lot of changes in Pulsar > > and > > > > > brought > > > > > > > > >> Pulsar schema as the first-class citizen in Pulsar. We > also > > > > > integrated > > > > > > > > with > > > > > > > > >> other computing engines for processing Pulsar event > streams > > > with > > > > > > > Pulsar > > > > > > > > >> schema. > > > > > > > > >> > > > > > > > > >> It led us to rethink how to integrate with Flink in the > best > > > > way. > > > > > Then > > > > > > > > we > > > > > > > > >> reimplement the pulsar-flink connectors from the ground up > > > with > > > > > schema > > > > > > > > and > > > > > > > > >> bring table API and catalog API as the first-class citizen > > in > > > > the > > > > > > > > >> integration. With that being said, in the new pulsar-flink > > > > > > > > implementation, > > > > > > > > >> you can register pulsar as a flink catalog and query / > > process > > > > the > > > > > > > event > > > > > > > > >> streams using Flink SQL. 
> > > > > > > > >> > > > > > > > > >> This is an example about how to use Pulsar as a Flink > > catalog: > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog > > > > > > > > >> > > > > > > > > >> Yijie has also written a blog post explaining why we > > > > re-implement > > > > > the > > > > > > > > flink > > > > > > > > >> connector with Flink 1.9 and what are the changes we made > in > > > the > > > > > new > > > > > > > > >> connector: > > > > > > > > >> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f > > > > > > > > >> > > > > > > > > >> We believe Pulsar is not just a simple data sink or source > > for > > > > > Flink. > > > > > > > It > > > > > > > > >> actually can be a fully integrated streaming data storage > > for > > > > > Flink in > > > > > > > > many > > > > > > > > >> areas (sink, source, schema/catalog and state). The > > > combination > > > > of > > > > > > > Flink > > > > > > > > >> and Pulsar can create a great streaming warehouse > > architecture > > > > for > > > > > > > > >> streaming-first, unified data processing. Since we are > > talking > > > > to > > > > > > > > >> contribute Pulsar integration to Flink here, we are also > > > > > dedicated to > > > > > > > > >> maintain, improve and evolve the integration with Flink to > > > help > > > > > the > > > > > > > > users > > > > > > > > >> who use both Flink and Pulsar. > > > > > > > > >> > > > > > > > > >> Hope this give you a bit more background about the pulsar > > > flink > > > > > > > > >> integration. Let me know what are your thoughts. 
> > > > > > > > >> > > > > > > > > >> Thanks, > > > > > > > > >> Sijie > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang < > [hidden email]> > > > > > wrote: > > > > > > > > >> > > > > > > > > >>> Hi Yijie > > > > > > > > >>> > > > > > > > > >>> I can see that Pulsar becomes more and more popular > > recently > > > > and > > > > > very > > > > > > > > >> glad > > > > > > > > >>> to see more people willing to contribute to Flink > > ecosystem. > > > > > > > > >>> > > > > > > > > >>> Before any further discussion, would you please give some > > > > > explanation > > > > > > > > of > > > > > > > > >>> the relationship between this thread to current existing > > > JIRAs > > > > of > > > > > > > > pulsar > > > > > > > > >>> source [1] and sink [2] connector? Will the contribution > > > > contains > > > > > > > part > > > > > > > > of > > > > > > > > >>> those PRs or totally different implementation? > > > > > > > > >>> > > > > > > > > >>> [1] https://issues.apache.org/jira/browse/FLINK-9641 > > > > > > > > >>> [2] https://issues.apache.org/jira/browse/FLINK-9168 > > > > > > > > >>> > > > > > > > > >>> Best > > > > > > > > >>> Yun Tang > > > > > > > > >>> ________________________________ > > > > > > > > >>> From: Yijie Shen <[hidden email]> > > > > > > > > >>> Sent: Tuesday, September 3, 2019 13:57 > > > > > > > > >>> To: [hidden email] <[hidden email]> > > > > > > > > >>> Subject: [DISCUSS] Contribute Pulsar Flink connector back > > to > > > > > Flink > > > > > > > > >>> > > > > > > > > >>> Dear Flink Community! > > > > > > > > >>> > > > > > > > > >>> I would like to open the discussion of contributing > Pulsar > > > > Flink > > > > > > > > >>> connector [0] back to Flink. > > > > > > > > >>> > > > > > > > > >>> ## A brief introduction to Apache Pulsar > > > > > > > > >>> > > > > > > > > >>> Apache Pulsar[1] is a multi-tenant, high-performance > > > > distributed > > > > > > > > >>> pub-sub messaging system. 
Pulsar includes multiple > features > > > > such > > > > > as > > > > > > > > >>> native support for multiple clusters in a Pulsar > instance, > > > with > > > > > > > > >>> seamless geo-replication of messages across clusters, > very > > > low > > > > > > > publish > > > > > > > > >>> and end-to-end latency, seamless scalability to over a > > > million > > > > > > > topics, > > > > > > > > >>> and guaranteed message delivery with persistent message > > > storage > > > > > > > > >>> provided by Apache BookKeeper. Nowadays, Pulsar has been > > > > adopted > > > > > by > > > > > > > > >>> more and more companies[2]. > > > > > > > > >>> > > > > > > > > >>> ## The status of Pulsar Flink connector > > > > > > > > >>> > > > > > > > > >>> The Pulsar Flink connector we are planning to contribute > is > > > > built > > > > > > > upon > > > > > > > > >>> Flink 1.9.0 and Pulsar 2.4.0. The main features are: > > > > > > > > >>> - Pulsar as a streaming source with exactly-once > guarantee. > > > > > > > > >>> - Sink streaming results to Pulsar with at-least-once > > > > semantics. > > > > > (We > > > > > > > > >>> would update this to exactly-once as well when Pulsar > gets > > > all > > > > > > > > >>> transaction features ready in its 2.5.0 version) > > > > > > > > >>> - Build upon Flink new Table API Type system > (FLIP-37[3]), > > > and > > > > > can > > > > > > > > >>> automatically (de)serialize messages with the help of > > Pulsar > > > > > schema. > > > > > > > > >>> - Integrate with Flink new Catalog API (FLIP-30[4]), > which > > > > > enables > > > > > > > the > > > > > > > > >>> use of Pulsar topics as tables in Table API as well as > SQL > > > > > client. 
> > > > > > > > >>> > > > > > > > > >>> ## Reference > > > > > > > > >>> [0] https://github.com/streamnative/pulsar-flink > > > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/ > > > > > > > > >>> [3] > > > > > > > > >>> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > > > >>> [4] > > > > > > > > >>> > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > > > >>> > > > > > > > > >>> Best, > > > > > > > > >>> Yijie Shen > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
Hi Sijie,
If we agree that the goal is to have the Pulsar connector in Flink 1.10, how about we do the following:

0. Start a FLIP to add the Pulsar connector to the Flink main repo, as it is a new public interface to the Flink main repo.
1. Start reviewing the Pulsar sink right away, as there has been no change to the sink interface so far.
2. Wait a little bit on FLIP-27. Flink 1.10 is going into code freeze in late Nov. If we allow a month for the development and review of the Pulsar connector, we need to have FLIP-27 by late Oct. There are still 7 weeks left. Personally I think it is doable. If FLIP-27 is not ready by late Oct, we can review and check in the Pulsar connector with the existing source interface. This means we will have the Pulsar connector in Flink 1.10, either with or without FLIP-27.

Because we are going to have the Pulsar sink and source checked in separately, it might make sense to have two FLIPs, one for the Pulsar sink and another for the Pulsar source. We can then start the work on the Pulsar sink right away.

Thanks,

Jiangjie (Becket) Qin

On Mon, Sep 9, 2019 at 4:13 PM Sijie Guo <[hidden email]> wrote:

> Thank you Bowen and Becket.
>
> What's the take from Flink community? Shall we wait for FLIP-27 or shall we
> proceed to next steps? And what the next steps are? :-)
>
> Thanks,
> Sijie
> > > > > > > > > >>> - Integrate with Flink new Catalog API (FLIP-30[4]), > > which > > > > > > enables > > > > > > > > the > > > > > > > > > >>> use of Pulsar topics as tables in Table API as well as > > SQL > > > > > > client. > > > > > > > > > >>> > > > > > > > > > >>> ## Reference > > > > > > > > > >>> [0] https://github.com/streamnative/pulsar-flink > > > > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/ > > > > > > > > > >>> [3] > > > > > > > > > >>> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > > > > >>> [4] > > > > > > > > > >>> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > > > > >>> > > > > > > > > > >>> Best, > > > > > > > > > >>> Yijie Shen > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
Hi everyone!
Thanks for your attention and the promotion of this work.

We will prepare a FLIP as soon as possible for more specific discussions.

For FLIP-27, it seems that we have not reached a consensus. Therefore, I will explain all the functionalities of the existing connector in the FLIP (including Source, Sink, and Catalog) to continue our discussions in FLIP.

Thanks for your kind help.

Best,
Yijie
|
Hi all!
Nice to see this lively discussion about the Pulsar connector. Some thoughts on the open questions:

## Contribute to Flink or maintain as a community package

Looks like the discussion is more going towards contribution. I think that is good, especially if we think that we want to build a similarly deep integration with Pulsar as we have for example with Kafka. The connector already looks like a more thorough connector than many others we have in the repository.

With either a repo split, or the new build system, I hope that the build overhead is not a problem.

## Committer Support

Becket offered some help already, I can also help a bit. I hope that between us, we can cover this.

## Contribute now, or wait for FLIP-27

As Becket said, FLIP-27 is actually making some PoC-ing progress, but will take 2 more months, I would estimate, before it is fully available.

If we want to be on the safe side with the contribution, we should contribute the source sooner and adjust it later. That would also help us in case things get crazy towards the 1.10 feature freeze and it would be hard to find time to review the new changes.

What would be the overhead of contributing now? Given that the code is already there, it looks like it would be only review capacity, right?

Best,
Stephan |
Hi Stephan,
Thanks for volunteering to help. Yes, the overhead would just be review capacity.

In fact, I am not too worried about review capacity; that is a one-time cost. My concern is mainly the long-term burden. Assume the new source interface is ready in 1.10 and the newly added Pulsar connector is built on the old interface. If we later migrate the Pulsar connector to the new source interface, the old connector might be deprecated almost immediately after it is checked in, yet we may still have to maintain two code bases. For the existing connectors, we have to do that anyway, but it would be good to avoid introducing a new connector with the same problem.

Thanks,

Jiangjie (Becket) Qin

On Tue, Sep 10, 2019 at 6:51 PM Stephan Ewen <[hidden email]> wrote: > Hi all! > > Nice to see this lively discussion about the Pulsar connector. > Some thoughts on the open questions: > > ## Contribute to Flink or maintain as a community package > > Looks like the discussion is more going towards contribution. I think that > is good, especially if we think that we want to build a similarly deep > integration with Pulsar as we have for example with Kafka. The connector > already looks like a more thorough connector than many others we have in > the repository. > > With either a repo split, or the new build system, I hope that the build > overhead is not a problem. > > ## Committer Support > > Becket offered some help already, I can also help a bit. I hope that > between us, we can cover this. > > ## Contribute now, or wait for FLIP-27 > > As Becket said, FLIP-27 is actually making some PoC-ing progress, but will > take 2 more months, I would estimate, before it is fully available. > > If we want to be on the safe side with the contribution, we should > contribute the source sooner and adjust it later. That would also help us > in case things get crazy towards the 1.10 feature freeze and it would be > hard to find time to review the new changes. 
> What would be the overhead of contributing now? Given that the code is > already there, it looks like it would be only review capacity, right? > > Best, > Stephan |
Agreed, if we check in the old code, we should make it clear that it will
be removed as soon as the FLIP-27-based version of the connector is there. We should not commit to maintaining the old version; that would indeed be too much overhead. On Thu, Sep 12, 2019 at 3:30 AM Becket Qin <[hidden email]> wrote: > Hi Stephan, > > Thanks for the volunteering to help. > > Yes, the overhead would just be review capacity. In fact, I am not worrying > too much about the review capacity. That is just a one time cost. My > concern is mainly about the long term burden. Assume we have new source > interface ready in 1.10 with newly added Pulsar connectors in old > interface. Later on if we migrate Pulsar to new source interface, the old > Pulsar connector might be deprecated almost immediately after checked in, > but we may still have to maintain two code bases. For the existing > connectors, we have to do that anyways. But it would be good to avoid > introducing a new connector with the same problem. > > Thanks, > > Jiangjie (Becket) Qin > > On Tue, Sep 10, 2019 at 6:51 PM Stephan Ewen <[hidden email]> wrote: > > > Hi all! > > > > Nice to see this lively discussion about the Pulsar connector. > > Some thoughts on the open questions: > > > > ## Contribute to Flink or maintain as a community package > > > > Looks like the discussion is more going towards contribution. I think > that > > is good, especially if we think that we want to build a similarly deep > > integration with Pulsar as we have for example with Kafka. The connector > > already looks like a more thorough connector than many others we have in > > the repository. > > > > With either a repo split, or the new build system, I hope that the build > > overhead is not a problem. > > > > ## Committer Support > > > > Becket offered some help already, I can also help a bit. I hope that > > between us, we can cover this.
> > > > ## Contribute now, or wait for FLIP-27 > > > > As Becket said, FLIP-27 is actually making some PoC-ing progress, but > will > > take 2 more months, I would estimate, before it is fully available. > > > > If we want to be on the safe side with the contribution, we should > > contribute the source sooner and adjust it later. That would also help us > > in case things get crazy towards the 1.10 feature freeze and it would be > > hard to find time to review the new changes. > > What would be the overhead of contributing now? Given that the code is > > already there, it looks like it would be only review capacity, right? > > > > Best, > > Stephan > > > > On Tue, Sep 10, 2019 at 11:04 AM Yijie Shen <[hidden email]> > > wrote: > > > > > Hi everyone! > > > > > > Thanks for your attention and the promotion of this work. > > > > > > We will prepare a FLIP as soon as possible for more specific > discussions. > > > > > > For FLIP-27, it seems that we have not reached a consensus. Therefore, > > > I will explain all the functionalities of the existing connector in > > > the FLIP (including Source, Sink, and Catalog) to continue our > > > discussions in FLIP. > > > > > > Thanks for your kind help. > > > > > > Best, > > > Yijie > > > > > > On Tue, Sep 10, 2019 at 9:57 AM Becket Qin <[hidden email]> > wrote: > > > > > > > > Hi Sijie, > > > > > > > > If we agree that the goal is to have Pulsar connector in 1.10, how > > about > > > we > > > > do the following: > > > > > > > > 0. Start a FLIP to add Pulsar connector to Flink main repo as it is a > > new > > > > public interface to Flink main repo. > > > > 1. Start to review the Pulsar sink right away as there is no change > to > > > the > > > > sink interface so far. > > > > 2. Wait a little bit on FLIP-27. Flink 1.10 is going to be code > freeze > > in > > > > late Nov and let's say we give a month to the development and review > of > > > > Pulsar connector, we need to have FLIP-27 by late Oct. 
|
Technically speaking, removing the old connector code is a backwards-incompatible change, which requires a major version bump, i.e. Flink 2.x. Given that we don't have a clear plan for when the next major version will be released, it is unclear how long the old connector code would stay around if we check it in right now. Or will we remove the old connector without a major version bump? In any case, that does not sound very user-friendly to those who might use the old Pulsar connector. I am not sure it is worth these potential problems just to have the Pulsar source connector checked in one or two months earlier.

Thanks,

Jiangjie (Becket) Qin

On Thu, Sep 12, 2019 at 3:52 PM Stephan Ewen <[hidden email]> wrote:

> Agreed, if we check in the old code, we should make it clear that it will be removed as soon as the FLIP-27 based version of the connector is there. We should not commit to maintaining the old version, that would be indeed too much overhead.
>
> On Thu, Sep 12, 2019 at 3:30 AM Becket Qin <[hidden email]> wrote:
>
> > Hi Stephan,
> >
> > Thanks for the volunteering to help.
> >
> > Yes, the overhead would just be review capacity. In fact, I am not worrying too much about the review capacity. That is just a one time cost. My concern is mainly about the long term burden. Assume we have new source interface ready in 1.10 with newly added Pulsar connectors in old interface. Later on if we migrate Pulsar to new source interface, the old Pulsar connector might be deprecated almost immediately after checked in, but we may still have to maintain two code bases. For the existing connectors, we have to do that anyways. But it would be good to avoid introducing a new connector with the same problem.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, Sep 10, 2019 at 6:51 PM Stephan Ewen <[hidden email]> wrote:
> >
> > > Hi all!
> > >
> > > Nice to see this lively discussion about the Pulsar connector.
The only downside I see > > is > > > > the > > > > > > > extra > > > > > > > > > review effort and potential fixes which might be irrelevant > > for > > > > the > > > > > > new > > > > > > > > > source interface implementation. I guess it mainly depends > on > > > how > > > > > > > certain > > > > > > > > > we are when the new source interface will be ready. > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > Till > > > > > > > > > > > > > > > > > > On Thu, Sep 5, 2019 at 8:56 AM Becket Qin < > > > [hidden email]> > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Hi Sijie and Yijie, > > > > > > > > > > > > > > > > > > > > Thanks for sharing your thoughts. > > > > > > > > > > > > > > > > > > > > Just want to have some update on FLIP-27. Although the > FLIP > > > > wiki > > > > > > and > > > > > > > > > > discussion thread has been quiet for some time, a few > > > > committer / > > > > > > > > > > contributors in Flink community were actually prototyping > > the > > > > > > entire > > > > > > > > > thing. > > > > > > > > > > We have made some good progress there but want to update > > the > > > > FLIP > > > > > > > wiki > > > > > > > > > > after the entire thing is verified to work in case there > > are > > > > some > > > > > > > last > > > > > > > > > > minute surprise in the implementation. I don't have an > > exact > > > > ETA > > > > > > yet, > > > > > > > > > but I > > > > > > > > > > guess it is going to be within a month or so. > > > > > > > > > > > > > > > > > > > > I am happy to review the current Flink Pulsar connector > and > > > > see if > > > > > > it > > > > > > > > > would > > > > > > > > > > fit in FLIP-27. It would be good to avoid the case that > we > > > > checked > > > > > > in > > > > > > > > the > > > > > > > > > > Pulsar connector with some review efforts and shortly > after > > > > that > > > > > > the > > > > > > > > new > > > > > > > > > > Source interface is ready. 
> > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > On Thu, Sep 5, 2019 at 8:39 AM Yijie Shen < > > > > > > [hidden email] > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > Thanks for all the feedback and suggestions! > > > > > > > > > > > > > > > > > > > > > > As Sijie said, the goal of the connector has always > been > > to > > > > > > provide > > > > > > > > > > > users with the latest features of both systems as soon > as > > > > > > possible. > > > > > > > > We > > > > > > > > > > > propose to contribute the connector to Flink and hope > to > > > get > > > > more > > > > > > > > > > > suggestions and feedback from Flink experts to ensure > the > > > > high > > > > > > > > quality > > > > > > > > > > > of the connector. > > > > > > > > > > > > > > > > > > > > > > For FLIP-27, we noticed its existence at the beginning > of > > > > > > reworking > > > > > > > > > > > the connector implementation based on Flink 1.9; we > also > > > > wanted > > > > > > to > > > > > > > > > > > build a connector that supports both batch and stream > > > > computing > > > > > > > based > > > > > > > > > > > on it. > > > > > > > > > > > However, it has been inactive for some time, so we > > decided > > > to > > > > > > > provide > > > > > > > > > > > a connector with most of the new features, such as the > > new > > > > type > > > > > > > > system > > > > > > > > > > > and the new catalog API first. We will pay attention to > > the > > > > > > > progress > > > > > > > > > > > of FLIP-27 continually and incorporate it with the > > > connector > > > > as > > > > > > > soon > > > > > > > > > > > as possible. 
> > > > > > > > > > > > > > > > > > > > > > Regarding the test status of the connector, we are > > > following > > > > the > > > > > > > > other > > > > > > > > > > > connectors' test in Flink repository and aimed to > provide > > > > > > > throughout > > > > > > > > > > > tests as we could. We are also happy to hear > suggestions > > > and > > > > > > > > > > > supervision from the Flink community to improve the > > > > stability and > > > > > > > > > > > performance of the connector continuously. > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > Yijie > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 5, 2019 at 5:59 AM Sijie Guo < > > > [hidden email] > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > Thanks everyone for the comments and feedback. > > > > > > > > > > > > > > > > > > > > > > > > It seems to me that the main question here is about - > > > "how > > > > can > > > > > > > the > > > > > > > > > > Flink > > > > > > > > > > > > community maintain the connector?". > > > > > > > > > > > > > > > > > > > > > > > > Here are two thoughts from myself. > > > > > > > > > > > > > > > > > > > > > > > > 1) I think how and where to host this integration is > > kind > > > > of > > > > > > less > > > > > > > > > > > important > > > > > > > > > > > > here. I believe there can be many ways to achieve it. > > > > > > > > > > > > As part of the contribution, what we are looking for > > here > > > > is > > > > > > how > > > > > > > > > these > > > > > > > > > > > two > > > > > > > > > > > > communities can build the collaboration relationship > on > > > > > > > developing > > > > > > > > > > > > the integration between Pulsar and Flink. Even we can > > try > > > > our > > > > > > > best > > > > > > > > to > > > > > > > > > > > catch > > > > > > > > > > > > up all the updates in Flink community. 
We are still > > > > > > > > > > > > facing the fact that we have less experiences in > Flink > > > than > > > > > > folks > > > > > > > > in > > > > > > > > > > > Flink > > > > > > > > > > > > community. In order to make sure we maintain and > > deliver > > > > > > > > > > > > a high-quality pulsar-flink integration to the users > > who > > > > use > > > > > > both > > > > > > > > > > > > technologies, we need some help from the experts from > > > Flink > > > > > > > > > community. > > > > > > > > > > > > > > > > > > > > > > > > 2) We have been following FLIP-27 for a while. > > Originally > > > > we > > > > > > were > > > > > > > > > > > thinking > > > > > > > > > > > > of contributing the connectors back after integrating > > > with > > > > the > > > > > > > > > > > > new API introduced in FLIP-27. But we decided to > > initiate > > > > the > > > > > > > > > > > conversation > > > > > > > > > > > > as early as possible. Because we believe there are > more > > > > > > benefits > > > > > > > > > doing > > > > > > > > > > > > it now rather than later. As part of contribution, it > > can > > > > help > > > > > > > > Flink > > > > > > > > > > > > community understand more about Pulsar and the > > potential > > > > > > > > integration > > > > > > > > > > > points. > > > > > > > > > > > > Also we can also help Flink community verify the new > > > > connector > > > > > > > API > > > > > > > > as > > > > > > > > > > > well > > > > > > > > > > > > as other new API (e.g. catalog API). > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Sijie > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 5:24 AM Becket Qin < > > > > > > [hidden email]> > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Hi Yijie, > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks for the interest in contributing the Pulsar > > > > connector. 
> > > > > > > > > > > > > > > > > > > > > > > > > > In general, I think having Pulsar connector with > > strong > > > > > > support > > > > > > > > is > > > > > > > > > a > > > > > > > > > > > > > valuable addition to Flink. So I am happy the > > shepherd > > > > this > > > > > > > > effort. > > > > > > > > > > > > > Meanwhile, I would also like to provide some > context > > > and > > > > > > recent > > > > > > > > > > > efforts on > > > > > > > > > > > > > the Flink connectors ecosystem. > > > > > > > > > > > > > > > > > > > > > > > > > > The current way Flink maintains its connector has > hit > > > the > > > > > > > > > scalability > > > > > > > > > > > bar. > > > > > > > > > > > > > With more and more connectors coming into Flink > repo, > > > we > > > > are > > > > > > > > > facing a > > > > > > > > > > > few > > > > > > > > > > > > > problems such as long build and testing time. To > > > address > > > > this > > > > > > > > > > problem, > > > > > > > > > > > we > > > > > > > > > > > > > have attempted to do the following: > > > > > > > > > > > > > 1. Split out the connectors into a separate > > repository. > > > > This > > > > > > is > > > > > > > > > > > temporarily > > > > > > > > > > > > > on hold due to potential solution to shorten the > > build > > > > time. > > > > > > > > > > > > > 2. Encourage the connectors to stay as ecosystem > > > project > > > > > > while > > > > > > > > > Flink > > > > > > > > > > > tries > > > > > > > > > > > > > to provide good support for functionality and > > > > compatibility > > > > > > > > tests. > > > > > > > > > > > Robert > > > > > > > > > > > > > has driven to create a Flink Ecosystem project > > website > > > > and it > > > > > > > is > > > > > > > > > > going > > > > > > > > > > > > > through some final approval process. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Given the above efforts, it would be great to first > > see > > > > if we > > > > > > > can > > > > > > > > > > have > > > > > > > > > > > > > Pulsar connector as an ecosystem project with great > > > > support. > > > > > > It > > > > > > > > > would > > > > > > > > > > > be > > > > > > > > > > > > > good to hear how the Flink Pulsar connector is > tested > > > > > > currently > > > > > > > > to > > > > > > > > > > see > > > > > > > > > > > if > > > > > > > > > > > > > we can learn something to maintain it as an > ecosystem > > > > project > > > > > > > > with > > > > > > > > > > good > > > > > > > > > > > > > quality and test coverage. If the quality as an > > > ecosystem > > > > > > > project > > > > > > > > > is > > > > > > > > > > > hard > > > > > > > > > > > > > to guarantee, we may as well adopt it into the main > > > repo. > > > > > > > > > > > > > > > > > > > > > > > > > > BTW, another ongoing effort is FLIP-27 where we are > > > > making > > > > > > > > changes > > > > > > > > > to > > > > > > > > > > > the > > > > > > > > > > > > > Flink source connector architecture and interface. > > This > > > > > > change > > > > > > > > will > > > > > > > > > > > likely > > > > > > > > > > > > > land in 1.10. Therefore timing wise, if we are > going > > to > > > > have > > > > > > > the > > > > > > > > > > Pulsar > > > > > > > > > > > > > connector in main repo, I am wondering if we should > > > hold > > > > a > > > > > > > little > > > > > > > > > bit > > > > > > > > > > > and > > > > > > > > > > > > > let the Pulsar connector adapt to the new interface > > to > > > > avoid > > > > > > > > > shortly > > > > > > > > > > > > > deprecated work? 
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > Jiangjie (Becket) Qin > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 4:32 PM Chesnay Schepler < > > > > > > > > > [hidden email]> > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > I'm quite worried that we may end up repeating > > > history. > > > > > > > > > > > > > > > > > > > > > > > > > > > > There were already 2 attempts at contributing a > > > pulsar > > > > > > > > connector, > > > > > > > > > > > both > > > > > > > > > > > > > > of which failed because no committer was getting > > > > involved, > > > > > > > > > despite > > > > > > > > > > > the > > > > > > > > > > > > > > contributor opening a dedicated discussion thread > > > > about the > > > > > > > > > > > contribution > > > > > > > > > > > > > > beforehand and getting several +1's from > > committers. > > > > > > > > > > > > > > > > > > > > > > > > > > > > We should really make sure that if we > > welcome/approve > > > > such > > > > > > a > > > > > > > > > > > > > > contribution it will actually get the attention > it > > > > > > deserves. > > > > > > > > > > > > > > > > > > > > > > > > > > > > As such, I'm inclined to recommend maintaining > the > > > > > > connector > > > > > > > > > > outside > > > > > > > > > > > of > > > > > > > > > > > > > > Flink. We could link to it from the documentation > > to > > > > give > > > > > > it > > > > > > > > more > > > > > > > > > > > > > exposure. > > > > > > > > > > > > > > With the upcoming page for sharing artifacts > among > > > the > > > > > > > > community > > > > > > > > > > > (what's > > > > > > > > > > > > > > the state of that anyway?), this may be a better > > > > option. 
> > > > > > > > > > > > > > > > > > > > > > > > > > > > On 04/09/2019 10:16, Till Rohrmann wrote: > > > > > > > > > > > > > > > Hi everyone, > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > thanks a lot for starting this discussion > Yijie. > > I > > > > think > > > > > > > the > > > > > > > > > > Pulsar > > > > > > > > > > > > > > > connector would be a very valuable addition > since > > > > Pulsar > > > > > > > > > becomes > > > > > > > > > > > more > > > > > > > > > > > > > and > > > > > > > > > > > > > > > more popular and it would further expand > Flink's > > > > > > > > > > interoperability. > > > > > > > > > > > Also > > > > > > > > > > > > > > > from a project perspective it makes sense for > me > > to > > > > place > > > > > > > the > > > > > > > > > > > connector > > > > > > > > > > > > > > in > > > > > > > > > > > > > > > the downstream project. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > My main concern/question is how can the Flink > > > > community > > > > > > > > > maintain > > > > > > > > > > > the > > > > > > > > > > > > > > > connector? We have seen in the past that > > connectors > > > > are > > > > > > > some > > > > > > > > of > > > > > > > > > > the > > > > > > > > > > > > > most > > > > > > > > > > > > > > > actively developed components because they need > > to > > > be > > > > > > kept > > > > > > > in > > > > > > > > > > sync > > > > > > > > > > > with > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > external system and with Flink. Given that the > > > Pulsar > > > > > > > > community > > > > > > > > > > is > > > > > > > > > > > > > > willing > > > > > > > > > > > > > > > to help with maintaining, improving and > evolving > > > the > > > > > > > > connector, > > > > > > > > > > I'm > > > > > > > > > > > > > > > optimistic that we can achieve this. 
Hence, +1 > > for > > > > > > > > contributing > > > > > > > > > > it > > > > > > > > > > > back > > > > > > > > > > > > > > to > > > > > > > > > > > > > > > Flink. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Cheers, > > > > > > > > > > > > > > > Till > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Wed, Sep 4, 2019 at 2:03 AM Sijie Guo < > > > > > > > [hidden email] > > > > > > > > > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi Yun, > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> Since I was the main driver behind FLINK-9641 > > and > > > > > > > > FLINK-9168, > > > > > > > > > > let > > > > > > > > > > > me > > > > > > > > > > > > > > try to > > > > > > > > > > > > > > >> add more context on this. > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> FLINK-9641 and FLINK-9168 was created for > > bringing > > > > > > Pulsar > > > > > > > as > > > > > > > > > > > source > > > > > > > > > > > > > and > > > > > > > > > > > > > > >> sink for Flink. The integration was done with > > > Flink > > > > > > 1.6.0. > > > > > > > > We > > > > > > > > > > > sent out > > > > > > > > > > > > > > pull > > > > > > > > > > > > > > >> requests about a year ago and we ended up > > > > maintaining > > > > > > > those > > > > > > > > > > > connectors > > > > > > > > > > > > > > in > > > > > > > > > > > > > > >> Pulsar for Pulsar users to use Flink to > process > > > > event > > > > > > > > streams > > > > > > > > > in > > > > > > > > > > > > > Pulsar. > > > > > > > > > > > > > > >> (See > > > > > > > > > https://github.com/apache/pulsar/tree/master/pulsar-flink > > > > > > > > > > ). > > > > > > > > > > > The > > > > > > > > > > > > > > Flink > > > > > > > > > > > > > > >> 1.6 integration is pretty simple and there is > no > > > > schema > > > > > > > > > > > > > considerations. 
> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> In the past year, we have made a lot of > changes > > in > > > > > > Pulsar > > > > > > > > and > > > > > > > > > > > brought > > > > > > > > > > > > > > >> Pulsar schema as the first-class citizen in > > > Pulsar. > > > > We > > > > > > > also > > > > > > > > > > > integrated > > > > > > > > > > > > > > with > > > > > > > > > > > > > > >> other computing engines for processing Pulsar > > > event > > > > > > > streams > > > > > > > > > with > > > > > > > > > > > > > Pulsar > > > > > > > > > > > > > > >> schema. > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> It led us to rethink how to integrate with > Flink > > > in > > > > the > > > > > > > best > > > > > > > > > > way. > > > > > > > > > > > Then > > > > > > > > > > > > > > we > > > > > > > > > > > > > > >> reimplement the pulsar-flink connectors from > the > > > > ground > > > > > > up > > > > > > > > > with > > > > > > > > > > > schema > > > > > > > > > > > > > > and > > > > > > > > > > > > > > >> bring table API and catalog API as the > > first-class > > > > > > citizen > > > > > > > > in > > > > > > > > > > the > > > > > > > > > > > > > > >> integration. With that being said, in the new > > > > > > pulsar-flink > > > > > > > > > > > > > > implementation, > > > > > > > > > > > > > > >> you can register pulsar as a flink catalog and > > > > query / > > > > > > > > process > > > > > > > > > > the > > > > > > > > > > > > > event > > > > > > > > > > > > > > >> streams using Flink SQL. 
> > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> This is an example about how to use Pulsar as > a > > > > Flink > > > > > > > > catalog: > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://github.com/streamnative/pulsar-flink/blob/3eeddec5625fc7dddc3f8a3ec69f72e1614ca9c9/README.md#use-pulsar-catalog > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> Yijie has also written a blog post explaining > > why > > > we > > > > > > > > > > re-implement > > > > > > > > > > > the > > > > > > > > > > > > > > flink > > > > > > > > > > > > > > >> connector with Flink 1.9 and what are the > > changes > > > we > > > > > > made > > > > > > > in > > > > > > > > > the > > > > > > > > > > > new > > > > > > > > > > > > > > >> connector: > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://medium.com/streamnative/use-apache-pulsar-as-streaming-table-with-8-lines-of-code-39033a93947f > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> We believe Pulsar is not just a simple data > sink > > > or > > > > > > source > > > > > > > > for > > > > > > > > > > > Flink. > > > > > > > > > > > > > It > > > > > > > > > > > > > > >> actually can be a fully integrated streaming > > data > > > > > > storage > > > > > > > > for > > > > > > > > > > > Flink in > > > > > > > > > > > > > > many > > > > > > > > > > > > > > >> areas (sink, source, schema/catalog and > state). 
> > > The > > > > > > > > > combination > > > > > > > > > > of > > > > > > > > > > > > > Flink > > > > > > > > > > > > > > >> and Pulsar can create a great streaming > > warehouse > > > > > > > > architecture > > > > > > > > > > for > > > > > > > > > > > > > > >> streaming-first, unified data processing. > Since > > we > > > > are > > > > > > > > talking > > > > > > > > > > to > > > > > > > > > > > > > > >> contribute Pulsar integration to Flink here, > we > > > are > > > > also > > > > > > > > > > > dedicated to > > > > > > > > > > > > > > >> maintain, improve and evolve the integration > > with > > > > Flink > > > > > > to > > > > > > > > > help > > > > > > > > > > > the > > > > > > > > > > > > > > users > > > > > > > > > > > > > > >> who use both Flink and Pulsar. > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> Hope this give you a bit more background about > > the > > > > > > pulsar > > > > > > > > > flink > > > > > > > > > > > > > > >> integration. Let me know what are your > thoughts. > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> Thanks, > > > > > > > > > > > > > > >> Sijie > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >> On Tue, Sep 3, 2019 at 11:54 AM Yun Tang < > > > > > > > [hidden email]> > > > > > > > > > > > wrote: > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > >>> Hi Yijie > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> I can see that Pulsar becomes more and more > > > popular > > > > > > > > recently > > > > > > > > > > and > > > > > > > > > > > very > > > > > > > > > > > > > > >> glad > > > > > > > > > > > > > > >>> to see more people willing to contribute to > > Flink > > > > > > > > ecosystem. 
> > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> Before any further discussion, would you > please > > > > give > > > > > > some > > > > > > > > > > > explanation > > > > > > > > > > > > > > of > > > > > > > > > > > > > > >>> the relationship between this thread to > current > > > > > > existing > > > > > > > > > JIRAs > > > > > > > > > > of > > > > > > > > > > > > > > pulsar > > > > > > > > > > > > > > >>> source [1] and sink [2] connector? Will the > > > > > > contribution > > > > > > > > > > contains > > > > > > > > > > > > > part > > > > > > > > > > > > > > of > > > > > > > > > > > > > > >>> those PRs or totally different > implementation? > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> [1] > > > > https://issues.apache.org/jira/browse/FLINK-9641 > > > > > > > > > > > > > > >>> [2] > > > > https://issues.apache.org/jira/browse/FLINK-9168 > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> Best > > > > > > > > > > > > > > >>> Yun Tang > > > > > > > > > > > > > > >>> ________________________________ > > > > > > > > > > > > > > >>> From: Yijie Shen <[hidden email]> > > > > > > > > > > > > > > >>> Sent: Tuesday, September 3, 2019 13:57 > > > > > > > > > > > > > > >>> To: [hidden email] < > [hidden email] > > > > > > > > > > > > > > > > > >>> Subject: [DISCUSS] Contribute Pulsar Flink > > > > connector > > > > > > back > > > > > > > > to > > > > > > > > > > > Flink > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> Dear Flink Community! > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> I would like to open the discussion of > > > contributing > > > > > > > Pulsar > > > > > > > > > > Flink > > > > > > > > > > > > > > >>> connector [0] back to Flink. 
> > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> ## A brief introduction to Apache Pulsar > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> Apache Pulsar[1] is a multi-tenant, > > > > high-performance > > > > > > > > > > distributed > > > > > > > > > > > > > > >>> pub-sub messaging system. Pulsar includes > > > multiple > > > > > > > features > > > > > > > > > > such > > > > > > > > > > > as > > > > > > > > > > > > > > >>> native support for multiple clusters in a > > Pulsar > > > > > > > instance, > > > > > > > > > with > > > > > > > > > > > > > > >>> seamless geo-replication of messages across > > > > clusters, > > > > > > > very > > > > > > > > > low > > > > > > > > > > > > > publish > > > > > > > > > > > > > > >>> and end-to-end latency, seamless scalability > to > > > > over a > > > > > > > > > million > > > > > > > > > > > > > topics, > > > > > > > > > > > > > > >>> and guaranteed message delivery with > persistent > > > > message > > > > > > > > > storage > > > > > > > > > > > > > > >>> provided by Apache BookKeeper. Nowadays, > Pulsar > > > has > > > > > > been > > > > > > > > > > adopted > > > > > > > > > > > by > > > > > > > > > > > > > > >>> more and more companies[2]. > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> ## The status of Pulsar Flink connector > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> The Pulsar Flink connector we are planning to > > > > > > contribute > > > > > > > is > > > > > > > > > > built > > > > > > > > > > > > > upon > > > > > > > > > > > > > > >>> Flink 1.9.0 and Pulsar 2.4.0. The main > features > > > > are: > > > > > > > > > > > > > > >>> - Pulsar as a streaming source with > > exactly-once > > > > > > > guarantee. > > > > > > > > > > > > > > >>> - Sink streaming results to Pulsar with > > > > at-least-once > > > > > > > > > > semantics. 
> > > > > > > > > > > (We > > > > > > > > > > > > > > >>> would update this to exactly-once as well > when > > > > Pulsar > > > > > > > gets > > > > > > > > > all > > > > > > > > > > > > > > >>> transaction features ready in its 2.5.0 > > version) > > > > > > > > > > > > > > >>> - Build upon Flink new Table API Type system > > > > > > > (FLIP-37[3]), > > > > > > > > > and > > > > > > > > > > > can > > > > > > > > > > > > > > >>> automatically (de)serialize messages with the > > > help > > > > of > > > > > > > > Pulsar > > > > > > > > > > > schema. > > > > > > > > > > > > > > >>> - Integrate with Flink new Catalog API > > > > (FLIP-30[4]), > > > > > > > which > > > > > > > > > > > enables > > > > > > > > > > > > > the > > > > > > > > > > > > > > >>> use of Pulsar topics as tables in Table API > as > > > > well as > > > > > > > SQL > > > > > > > > > > > client. > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> ## Reference > > > > > > > > > > > > > > >>> [0] > > https://github.com/streamnative/pulsar-flink > > > > > > > > > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > > > > > > > > > >>> [2] https://pulsar.apache.org/en/powered-by/ > > > > > > > > > > > > > > >>> [3] > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > > > > > > > > > >>> [4] > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > >>> Best, > > > > > > > > > > > > > 
> >>> Yijie Shen > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
Hi All,
Sorry for joining the discussion late, and thanks Yijie & Sijie for driving the discussion.

I also think the Pulsar connector would be a very valuable addition to Flink. I can also help out a bit on the review side :-)

Regarding the timeline, I share Becket's concerns about the relationship between the new Pulsar connector and FLIP-27. There is also another discussion, just started by Stephan, on dropping Kafka 0.9/0.10 support in the next Flink release [1]. Although the situation is somewhat different, and the Kafka 0.9/0.10 connectors have been in Flink for 3-4 years, based on that discussion I am not sure a major version release is a requirement for removing old connector support.

I think there shouldn't be a blocker if we agree that the old connector will be removed once the FLIP-27-based Pulsar connector is there. As Stephan stated, it is easier to contribute the source sooner and adjust it later. We should also make sure we communicate this clearly: for example, by putting an experimental flag on the pre-FLIP-27 connector's page on the website, in the documentation, etc.

Any other thoughts?

--
Rong

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/DISCUSS-Drop-older-versions-of-Kafka-Connectors-0-9-0-10-for-Flink-1-10-td29916.html

On Fri, Sep 13, 2019 at 8:15 AM Becket Qin <[hidden email]> wrote:

> Technically speaking, removing the old connector code is a backwards incompatible change which requires a major version bump, i.e. Flink 2.x. Given that we don't have a clear plan on when to have the next major version release, it seems unclear how long the old connector code will be there if we check it in right now. Or will we remove the old connector without a major version bump? In any case, it sounds not quite user friendly to those who might use the old Pulsar connector. I am not sure if it is worth these potential problems in order to have the Pulsar source connector checked in one or two months earlier.
> > > > > > > > > > > > (We > > > > > > > > > > > > > > > >>> would update this to exactly-once as well > > when > > > > > Pulsar > > > > > > > > gets > > > > > > > > > > all > > > > > > > > > > > > > > > >>> transaction features ready in its 2.5.0 > > > version) > > > > > > > > > > > > > > > >>> - Build upon Flink new Table API Type > system > > > > > > > > (FLIP-37[3]), > > > > > > > > > > and > > > > > > > > > > > > can > > > > > > > > > > > > > > > >>> automatically (de)serialize messages with > the > > > > help > > > > > of > > > > > > > > > Pulsar > > > > > > > > > > > > schema. > > > > > > > > > > > > > > > >>> - Integrate with Flink new Catalog API > > > > > (FLIP-30[4]), > > > > > > > > which > > > > > > > > > > > > enables > > > > > > > > > > > > > > the > > > > > > > > > > > > > > > >>> use of Pulsar topics as tables in Table API > > as > > > > > well as > > > > > > > > SQL > > > > > > > > > > > > client. > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >>> ## Reference > > > > > > > > > > > > > > > >>> [0] > > > https://github.com/streamnative/pulsar-flink > > > > > > > > > > > > > > > >>> [1] https://pulsar.apache.org/ > > > > > > > > > > > > > > > >>> [2] > https://pulsar.apache.org/en/powered-by/ > > > > > > > > > > > > > > > >>> [3] > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System > > > > > > > > > > > > > > > >>> [4] > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > >>> Best, > > > > > > > > > > > > > > > >>> Yijie Shen > > > > > > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |