[DISCUSS] Adding Azure Platform Support in DataStream, Table and SQL Connectors

Israel Ekpo
I have opened the issues listed below to track new efforts to bring
additional Azure support to the connectors ecosystem.

My goal is to add the first two features [FLINK-18562] and [FLINK-18568] to
the existing file system capabilities [1] and then have the other
connectors FLINK-1856[3-7] exist as standalone plugins.

As more users adopt the additional connectors, we could incrementally bring
them into the core code base if there is sufficient support.

I am new to the process, so I have a few questions:

Do I need to create a FLIP [2] to make these changes, or are the
individual JIRA issues sufficient?

I am thinking about targeting Flink versions 1.10 through 1.12.
For new connectors like these, how many versions can or should they be
integrated into?

Are there any upcoming changes to supported Java and Scala versions that I
need to be aware of?

Any ideas or suggestions you have would be great.

Below is a summary of the JIRA issues that were created to track the effort:

Add Support for Azure Data Lake Store Gen 2 in Flink File System
https://issues.apache.org/jira/browse/FLINK-18562

Add Support for Azure Data Lake Store Gen 2 in Streaming File Sink
https://issues.apache.org/jira/browse/FLINK-18568

Add Support for Azure Cosmos DB DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18563

Add Support for Azure Event Hub DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18564

Add Support for Azure Event Grid DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18565

Add Support for Azure Cognitive Search DataStream Connector
https://issues.apache.org/jira/browse/FLINK-18566

Add Support for Azure Cognitive Search Table & SQL Connector
https://issues.apache.org/jira/browse/FLINK-18567


[1] https://github.com/apache/flink/tree/master/flink-filesystems
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

Re: [DISCUSS] Adding Azure Platform Support in DataStream, Table and SQL Connectors

Robert Metzger
Hi Israel,

thanks a lot for reaching out! I'm very excited about your efforts to bring
additional Azure support into Flink.
There are ~50 threads on the user@ mailing list mentioning Azure -- that's
good evidence that our users use Flink in Azure.

Regarding your questions:

> Do I need to create a FLIP [2] to make these changes, or are the
> individual JIRA issues sufficient?


For the two DataLake FS tickets, I don't see the need for a FLIP.

> I am thinking about targeting Flink versions 1.10 through 1.12.
> For new connectors like these, how many versions can or should they be
> integrated into?


We generally don't add new features to old releases (unless there's a very
good reason to backport the feature). Therefore, the new integrations will
all go into the next major Flink release (in this case probably Flink 1.12
for the first tickets).

> Are there any upcoming changes to supported Java and Scala versions that I
> need to be aware of?


No, I'm not aware of any upcoming changes. Java 8 and Java 11 are the two
versions we test against.

> My goal is to add the first two features [FLINK-18562] and [FLINK-18568] to
> the existing file system capabilities [1] and then have the other
> connectors FLINK-1856[3-7] exist as standalone plugins.


I like the order in which you approach the tickets. I assigned you to the
first ticket and commented on the second one. I'm also willing to help
review your pull requests.

What do you mean by "standalone plugins" in the context of connectors?
Would you like to contribute these connectors to the main Flink codebase,
or maintain them outside Flink but list them on flink-packages.org?

Best,
Robert


(In reply to Israel Ekpo's message of Wed, Jul 22, 2020 at 10:43 AM.)

Re: [DISCUSS] Adding Azure Platform Support in DataStream, Table and SQL Connectors

Israel Ekpo
Thanks for the guidance, Robert. I appreciate the prompt response and will
share the pull requests for the ADLS support [2] next week.

Regarding the additional connectors, I would like to contribute them to the
main codebase [1] if possible.

I initially thought it might be better to start outside the core codebase,
but I think adoption would be better if the connectors are covered in the
core Flink documentation, which would also reduce the effort users need to
integrate them. We will take time to make sure the docs and tests for the
connectors are solid, and then we can bring them into the core code base
one at a time.

[1] https://github.com/apache/flink/tree/master/flink-connectors

[2] https://issues.apache.org/jira/browse/FLINK-18562


(In reply to Robert Metzger's message of Thu, Jul 23, 2020 at 3:41 AM.)

Re: [DISCUSS] Adding Azure Platform Support in DataStream, Table and SQL Connectors

Robert Metzger
Great! Looking forward to your first pull request.

I agree that having a connector in Flink's codebase will probably help its
adoption.
However, we are careful with the connectors we are adding to Flink for the
following reasons:
a) The Flink project is lacking people to maintain all connectors, leading
to a poor user experience. Some connectors have a lot of unresolved bugs,
because there's no committer involved in the component.
b) Even if the connector is unmaintained, there's an overhead for the
project to carry it along: we sometimes make changes to the build system,
and every extra connector complicates our license checking process and
slows down our CI execution time.
c) For you as a contributor / maintainer of a connector, it could be
difficult to maintain the connector, because you will always need a
committer willing to review & merge your changes. We have no bad
intentions; it's just the reality of a big, busy open source project. (Of
course, we will consider contributors actively maintaining a component over
a longer period of time for committership.)

I'm not against adding connectors per se, but for "Azure Cognitive" and
"Azure Cosmos" I could not find any evidence (on the user@ mailing list or
Google) that people are asking for such connectors. In my opinion, these
connectors should exist on flink-packages.org first, and once we see that
there's a lot of activity around them, we can revisit this decision.

For "Azure Event Hub", I'm open to discussing adding a connector to Flink.
It seems to have a Kafka-compatible endpoint, but I believe relying on that
would lead to a poor user experience (for authentication, exactly-once etc.).
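
For readers unfamiliar with the Kafka-compatible endpoint mentioned above,
here is a minimal sketch of the client settings it usually requires. This is
an illustration, not a Flink or Azure SDK API: the class name
EventHubsKafkaConfig, the method forNamespace, the namespace "my-namespace"
and the connection string are hypothetical placeholders. The property keys
are standard Kafka client settings, and the host suffix, port and
"$ConnectionString" user name follow the commonly documented Event Hubs
setup.

```java
import java.util.Properties;

/** Hypothetical helper for illustration; not part of Flink or the Azure SDK. */
final class EventHubsKafkaConfig {

    private EventHubsKafkaConfig() {}

    /**
     * Builds Kafka client properties for an Event Hubs namespace.
     *
     * @param namespace        Event Hubs namespace name (placeholder in main)
     * @param connectionString namespace-level connection string (placeholder in main)
     */
    static Properties forNamespace(String namespace, String connectionString) {
        Properties props = new Properties();
        // The Kafka protocol is served on port 9093 of the namespace host.
        props.setProperty("bootstrap.servers", namespace + ".servicebus.windows.net:9093");
        // Connections use SASL over TLS with the PLAIN mechanism.
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "PLAIN");
        // The user name is the literal string "$ConnectionString";
        // the password is the connection string itself.
        props.setProperty(
                "sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                        + "username=\"$ConnectionString\" "
                        + "password=\"" + connectionString + "\";");
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only; a real connection string comes from the Azure portal.
        Properties props = forNamespace(
                "my-namespace",
                "Endpoint=sb://my-namespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...");
        // Print only the keys so the secret never reaches the console.
        props.stringPropertyNames().forEach(System.out::println);
    }
}
```

Properties built this way could presumably be handed to Flink's existing
Kafka connector. Note that the whole secret ends up inside a JAAS string,
which is one concrete form of the authentication friction raised above.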

Please note that all I wrote above is my personal opinion, based on my
observations of the Flink project. Maybe other PMC members have a different
opinion.


(In reply to Israel Ekpo's message of Fri, Jul 24, 2020 at 4:32 AM.)