What do you think about dividing the streaming connectors project into various smaller projects, basically one per connector?

I am personally always happy when projects offer me artifacts that contain what I need, and not a lot of other unnecessary dependencies as well.

Many people using the streaming connectors as a dependency in other setups will have to define a long list of exclusions to get rid of all dependencies (and their transitives) that they do not need.

We have seen how these "super fat" dependencies cause trouble, for example in the case of Hadoop 1, where everything was one artifact, and how much easier it is with Hadoop 2, where we can use subcomponents as dependencies.

What do you think?

Stephan

Would this proposal also include packaging streaming connectors into separate source and binary jars?

- Henry

On Tue, Apr 7, 2015 at 12:21 PM, Stephan Ewen wrote:
> [...]

Exactly, each streaming connector would be a separate jar:

- stream-connector-kafka
- stream-connector-rabbitmq
- stream-connector-flume
- ...

On Tue, Apr 7, 2015 at 10:59 PM, Henry Saputra wrote:
> [...]

Overall I think this is a nice approach, but let us then also discuss where we would like to put these jars. Currently these jars are not in the lib folder of the Flink distribution, which means that whenever users would like to use them, they have to package them with their user code. I think that is a bit unintuitive, as the connectors are in the org.apache.flink namespace.

The current approach was perfectly fine a month ago, when the connectors were practically examples and not really connectors in the sense that we actually expect users to use these exact classes as entry points to message queues. Now, with the new PersistentKafkaSource, I'm not quite sure that this should be the case. Of course one can argue that these modules inherently pull in a lot of dependencies (Kafka, Zookeeper etc.), so it is better to avoid them.

If we decide to add the connectors to the lib folder of the distribution, then there is not much use in separating them. If not, then I support doing it.

On Wed, Apr 8, 2015 at 1:33 PM, Stephan Ewen wrote:
> [...]

Thanks for bringing up this discussion.

I'm very much in favor of splitting up the connectors into separate Maven modules. The transitive dependencies are a mess otherwise.

Also, I would not put them into "flink-dist" (= lib folder), because we would have the dependency mess again.

On Wed, Apr 8, 2015 at 3:00 PM, Márton Balassi wrote:
> [...]

I filed a JIRA to address the issue:
https://issues.apache.org/jira/browse/FLINK-1874

On Thu, Apr 9, 2015 at 4:08 PM, Robert Metzger wrote:
> [...]