Hi everybody,
right now, we have two separate Maven modules for batch and streaming connectors (flink-batch-connectors and flink-streaming-connectors) that contain modules for the individual external systems and storage formats such as HBase, Cassandra, Avro, Elasticsearch, etc.

Some of these systems, for instance HBase, Cassandra, and Elasticsearch, can be used in streaming as well as batch jobs. However, because of the separate main modules for streaming and batch connectors, we currently need to decide where to put a connector. For example, the flink-connector-cassandra module is located in flink-streaming-connectors but includes a CassandraInputFormat and CassandraOutputFormat (i.e., a batch source and sink).

In my opinion, it would be better to just merge flink-batch-connectors and flink-streaming-connectors into a joint flink-connectors module.

This would only be an internal restructuring of the code and would not be visible to users (unless we change the module names of the individual connectors, which is not necessary, IMO).

What do others think?

Best, Fabian
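To make the proposal concrete, here is a minimal sketch of what the aggregator POM of a merged flink-connectors module could look like. Only flink-connector-cassandra is named in this thread; the parent coordinates, the version, and the other module names are illustrative assumptions, not the actual project layout.

```xml
<!-- Sketch only: hypothetical pom.xml of a merged flink-connectors aggregator module. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Assumed parent coordinates and version; adjust to the real build. -->
  <parent>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-parent</artifactId>
    <version>1.2-SNAPSHOT</version>
  </parent>

  <artifactId>flink-connectors</artifactId>
  <!-- An aggregator module has packaging "pom" and only lists its children. -->
  <packaging>pom</packaging>

  <modules>
    <!-- Named in the thread; currently under flink-streaming-connectors. -->
    <module>flink-connector-cassandra</module>
    <!-- Assumed module names, for illustration only. -->
    <module>flink-connector-elasticsearch</module>
    <module>flink-hbase</module>
    <module>flink-avro</module>
  </modules>
</project>
```

In this sketch only the directory that holds each connector changes. The artifactId of every connector stays the same, so the coordinates users declare in their own POMs would not need to change.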
+1
It will be good to have one module flink-connectors (union of streaming and batch connectors).

Regards,
Swapnil

On Thu, Sep 22, 2016 at 6:35 PM, Fabian Hueske <[hidden email]> wrote:
> [...]
+1 for Fabian's suggestion
On Thu, Sep 22, 2016 at 3:25 PM, Swapnil Chougule <[hidden email]> wrote:
> [...]
I don't think it's that easy. The streaming connectors have flink-streaming as a dependency, while the batch connectors have the batch dependencies. Combining them would mean that users always have all dependencies, right?

On Thu, 22 Sep 2016 at 15:41 Stephan Ewen <[hidden email]> wrote:
> [...]
I think this only holds true for modules which depend on the batch or streaming counterpart, respectively. We could refactor these modules by pulling out common types which are independent of streaming/batch and are used by both the batch and the streaming module.

Cheers,
Till

On Fri, Sep 23, 2016 at 11:15 AM, Aljoscha Krettek <[hidden email]> wrote:
> [...]
The module would have both dependencies, but both are provided anyway, so that would not be much of an issue, I think.

On Mon, Sep 26, 2016 at 12:25 PM, Till Rohrmann <[hidden email]> wrote:
> [...]
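As an illustration of the point about provided dependencies, a connector module that contains both batch and streaming classes could declare both API dependencies with provided scope. The artifact names below (flink-java, flink-streaming-java_2.10) and the version property are assumptions made for this sketch.

```xml
<!-- Sketch only: dependency section of a hypothetical combined connector module. -->
<dependencies>
  <!-- Batch API (assumed artifactId). -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${project.version}</version>
    <scope>provided</scope>
  </dependency>

  <!-- Streaming API (assumed artifactId and Scala suffix). -->
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>${project.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

With provided scope, Maven puts a dependency on the compile and test classpath of the connector but does not pass it on transitively, so a user of the connector would not automatically pull in the API they do not use.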
+1 good suggestion.
On Mon, Sep 26, 2016 at 1:03 PM, Stephan Ewen <[hidden email]> wrote:
> [...]
Thanks everybody for your comments.
I opened FLINK-4676 [1] for merging the connector modules.

[1] https://issues.apache.org/jira/browse/FLINK-4676

2016-09-26 13:17 GMT+02:00 Robert Metzger <[hidden email]>:
> [...]
Hi all,
should we do this refactoring for the 1.2 release? If yes, I'll prepare a PR for that.

Cheers,
Fabian

2016-09-26 13:55 GMT+02:00 Fabian Hueske <[hidden email]>:
> [...]
+1
On Tue, Nov 22, 2016 at 9:08 AM, Fabian Hueske <[hidden email]> wrote:
> [...]