I have opened the following issues to track new efforts to bring additional Azure Support in the following areas to the connectors ecosystem.

My goal is to add the first two features [FLINK-18562] and [FLINK-18568] to the existing file system capabilities [1] and then have the other connectors FLINK-1856[3-7] exist as standalone plugins.

As more users adopt the additional connectors, we could incrementally bring them into the core code base if necessary with sufficient support.

I am new to the process, so I have a few questions:

Do I need to create a FLIP [2] in order to make these changes and bring in the new capabilities, or are the individual JIRA issues sufficient?

I am thinking about targeting Flink versions 1.10 through 1.12. For new connectors like this, how many versions can/should this be integrated into?

Are there any upcoming changes to supported Java and Scala versions that I need to be aware of?

Any ideas or suggestions you have would be great.

Below is a summary of the JIRA issues that were created to track the effort:

Add Support for Azure Data Lake Store Gen 2 in Flink File System
https://issues.apache.org/jira/browse/FLINK-18562

Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink
https://issues.apache.org/jira/browse/FLINK-18568

Add Support for Azure Cosmos DB DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18563

Add Support for Azure Event Hub DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18564

Add Support for Azure Event Grid DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18565

Add Support for Azure Cognitive Search DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18566

Add Support for Azure Cognitive Search Table & SQL Connector
https://issues.apache.org/jira/browse/FLINK-18567

[1] https://github.com/apache/flink/tree/master/flink-filesystems
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Hi Israel,

thanks a lot for reaching out! I'm very excited about your efforts to bring additional Azure support into Flink.
There are ~50 threads on the user@ mailing list mentioning Azure -- that's good evidence that our users use Flink in Azure.

Regarding your questions:

> Do I need to create a FLIP [2] in order to make these changes to bring the
> new capabilities or the individual JIRA issues are sufficient?

For the two DataLake FS tickets, I don't see the need for a FLIP.

> I am thinking about targeting Flink versions 1.10 through 1.12
> For new connectors like this, how many versions can/should this be
> integrated into?

We generally don't add new features to old releases (unless there's a very good reason to backport the feature). Therefore, the new integrations will all go into the next major Flink release (in this case probably Flink 1.12 for the first tickets).

> Are there any upcoming changes to supported Java and Scala versions that I
> need to be aware of?

No, I'm not aware of any upcoming changes. Java 8 and Java 11 are the two versions we test against.

> My goal is to add the first two features [FLINK-18562] and [FLINK-18568] to
> the existing file system capabilities [1] and then have the other
> connectors FLINK-1856[3-7] exist as standalone plugins.

I like the order in which you approach the tickets. I assigned you to the first ticket and commented on the second one. I'm also willing to help review your pull requests.

What do you mean by "standalone plugins" in the context of connectors? Would you like to contribute these connectors to the main Flink codebase, or maintain them outside Flink but list them in flink-packages.org?

Best,
Robert

On Wed, Jul 22, 2020 at 10:43 AM Israel Ekpo <[hidden email]> wrote:
> I have opened the following issues to track new efforts to bring additional
> Azure Support in the following areas to the connectors ecosystem.
Thanks for the guidance, Robert. I appreciate the prompt response and will share the pull requests for the ADLS support [2] next week.

Regarding the additional connectors, I would like to contribute them to the main codebase [1] if possible.

I initially thought it might be better to start outside the core codebase, but I think adoption would be better if the connectors are covered in the core Flink documentation and the level of effort users need to integrate them is reduced. We will take time to make sure the docs and tests for the connectors are solid, and then we can bring them one at a time into the core code base.

[1] https://github.com/apache/flink/tree/master/flink-connectors
[2] https://issues.apache.org/jira/browse/FLINK-18562

On Thu, Jul 23, 2020 at 3:41 AM Robert Metzger <[hidden email]> wrote:
> Hi Israel,
>
> thanks a lot for reaching out! I'm very excited about your efforts to bring
> additional Azure support into Flink.
> There are ~50 threads on the user@ mailing list mentioning Azure -- that's
> good evidence that our users use Flink in Azure.
>
> Regarding your questions:
>
> > Do I need to create a FLIP [2] in order to make these changes to bring the
> > new capabilities or the individual JIRA issues are sufficient?
>
> For the two DataLake FS tickets, I don't see the need for a FLIP.
>
> > I am thinking about targeting Flink versions 1.10 through 1.12
> > For new connectors like this, how many versions can/should this be
> > integrated into?
>
> We generally don't add new features to old releases (unless there's a very
> good reason to backport the feature). Therefore, the new integrations will
> all go into the next major Flink release (in this case probably Flink 1.12
> for the first tickets).
>
> > Are there any upcoming changes to supported Java and Scala versions that I
> > need to be aware of?
>
> No, I'm not aware of any upcoming changes. Java 8 and Java 11 are the two
> versions we test against.
Great! Looking forward to your first pull request.

I agree that having a connector in Flink's codebase will probably help its adoption. However, we are careful with the connectors we are adding to Flink for the following reasons:

a) The Flink project is lacking people to maintain all connectors, leading to a poor user experience. Some connectors have a lot of unresolved bugs, because there's no committer involved in the component.

b) Even if the connector is unmaintained, there's an overhead for the project to carry it along: we sometimes make changes to the build system, and every additional connector complicates our license checking process and slows down our CI execution time.

c) For you as a contributor / maintainer of a connector, it could be difficult to maintain the connector, because you will always need a committer willing to review & merge your changes. We have no bad intentions, it's just the reality of a big, busy open source project (of course we will consider contributors actively maintaining a component over a longer period of time for committership).

I'm not against adding connectors per se, but for "Azure Cognitive" and "Azure Cosmos" I could not find any evidence (on the user@ mailing list or Google) that people are asking for such connectors. In my opinion, these connectors should exist on flink-packages.org first, and once we see that there's a lot of activity around them, we can revisit this decision.

For "Azure Event Hub", I'm open to discussing adding a connector to Flink. It seems to have a Kafka compatible endpoint, but I believe it'll lead to a poor user experience (for authentication, exactly-once etc.).

Please note that all I wrote above is my personal opinion, based on my observations of the Flink project. Maybe other PMC members have a different opinion.

On Fri, Jul 24, 2020 at 4:32 AM Israel Ekpo <[hidden email]> wrote:
> Thanks for the guidance Robert. I appreciate the prompt response and will
> share the pull requests for the ADLS support [2] next week.
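
For readers wondering what the authentication concern around the Kafka compatible endpoint of Azure Event Hubs looks like in practice, here is a minimal sketch of the Kafka client properties such a setup typically needs. It is not taken from the thread: the class name `EventHubsKafkaProps`, the `build` helper, and the namespace, connection string, and group id values are hypothetical placeholders. The port 9093, `SASL_SSL`, the `PLAIN` mechanism, and the literal `$ConnectionString` user name are the commonly documented settings for that endpoint, but please check them against the current Azure documentation before relying on them.

```java
import java.util.Properties;

// Hypothetical helper, for illustration only: builds Kafka client properties
// for the Kafka compatible endpoint of an Event Hubs namespace.
final class EventHubsKafkaProps {

    static Properties build(String namespace, String connectionString, String groupId) {
        Properties props = new Properties();
        // The Kafka endpoint of a namespace listens on port 9093.
        props.setProperty("bootstrap.servers", namespace + ".servicebus.windows.net:9093");
        // Authentication is SASL/PLAIN over TLS; the user name is the literal
        // string "$ConnectionString" and the password is the connection string.
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "PLAIN");
        props.setProperty(
            "sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"$ConnectionString\" "
                + "password=\"" + connectionString + "\";");
        props.setProperty("group.id", groupId);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values; a real connection string is a secret and should
        // come from configuration, not from source code.
        Properties props = build("my-namespace", "<event-hubs-connection-string>", "flink-consumer");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

A `Properties` object like this could presumably be handed to Flink's existing Kafka consumer as-is. The sketch only covers the connection and authentication settings; it says nothing about whether exactly-once guarantees hold against this endpoint, which is the other concern raised above.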