The streaming connectors currently pull in a massive number of dependencies.

For example, we transitively get the Scala compiler, the reflection libraries, etc., and ZooKeeper.

A lot of this comes with Flume and Kafka. Are those required to make the connectors work? If not, it might be good to exclude them, to prevent conflicts for users who actually depend on those components.

Yes, you are right, Kafka and Flume are the heavy ones.

We always have the choice to take them out of the package, maybe keep a separate repo for all the different connectors, and only keep the 1-2 most important ones here. I don't think there's much else to do, because we don't use the packages you mentioned ourselves; they get pulled in by the Kafka and Flume dependencies.

On Mon, Sep 29, 2014 at 6:24 PM, Stephan Ewen <[hidden email]> wrote:
> The streaming connectors currently pull a massive amount of dependencies.
>
> For example, we transitively get the scala compiler/reflection/etc and
> ZooKeeper.
>
> A lot of stuff comes with flume and kafka. Are those required to make the
> connectors work? Otherwise, it might be good to exclude them, to prevent
> conflicts for users that actually depend on those components.

You may be able to solve this with careful exclusions.
It seems Kafka is monolithic, with no separation between connector and engine. If you know, for example, that ZooKeeper is not required by the connector (you have to be sure), you can exclude it as a dependency. We have done this for Hadoop1, where we only use the HDFS client functionality.

On Mon, Sep 29, 2014 at 6:40 PM, Gyula Fóra <[hidden email]> wrote:
> Yes, you are right, kafka and flume are the heavy ones.
>
> We always have the choice to take out them from the package and maybe have
> a separate repo for all the different connectors and only keep 1-2 most
> important ones. I don't think there's much else to do because we don't use
> the packages you mentioned, but they get pulled by the kafka and flume
> dependencies.

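A minimal sketch of the exclusion approach suggested above, as it might look in a connector module's pom.xml. The Kafka coordinates, the version, and the choice of excluded artifacts are illustrative assumptions, not taken from the actual Flink build. An artifact should only be excluded once it is certain the connector never loads it at runtime.

```xml
<!-- Hypothetical fragment for the <dependencies> section of a connector pom.xml. -->
<dependency>
  <!-- Assumed coordinates and version, for illustration only. -->
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.8.0</version>
  <exclusions>
    <!-- Assumption: the connector never needs the Scala compiler at runtime. -->
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-compiler</artifactId>
    </exclusion>
    <!-- Assumption: the reflection library is not needed either. -->
    <exclusion>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-reflect</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` before and after such a change shows which transitive artifacts are still pulled in. Because Kafka itself is written in Scala, the Scala runtime library is deliberately not excluded in this sketch.
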
Thanks, I will look into this and try to figure it out. As you can see, I am not a Maven pro :)

On 29 Sep 2014, at 18:44, Stephan Ewen <[hidden email]> wrote:
> You may be able to solve this with careful exclusions.
>
> It seems kafka is monolithic, having no separation between connector and
> engine. If you know for example that zookeeper is not required by the
> connector (you have to be sure), you can exclude it as the dependency. We
> have done this for Hadoop1, where we only use the HDFS client functionality.

Shipping the connectors with the job jars would thin out the dependencies,
but would make it more cumbersome to assemble a job jar.

On Mon, Sep 29, 2014 at 6:47 PM, Gyula Fora <[hidden email]> wrote:
> Thanks, I will look into this and try to figure it out, as you can see I
> am not a maven pro :)

Good catch. Give me some time to deal with my fresh jet lag and we will
figure it out with Gyula. :)

On Sep 29, 2014 12:50 PM, "Stephan Ewen" <[hidden email]> wrote:
> Shipping the connectors with the job jars would thin out the dependencies,
> but make it more cumbersome to assemble a job jar.

Have a look at this PR and maybe build on top of it:
https://github.com/apache/incubator-flink/pull/133

On Mon, Sep 29, 2014 at 10:45 PM, Márton Balassi <[hidden email]> wrote:
> Good catch. Give me some time to deal with my fresh jet lag and we will
> figure it out with Gyula. :)

Gabor excluded the unnecessary transitive dependencies:
https://github.com/mbalassi/incubator-flink/commit/9caece6bef610cbebaeb538f5a358ce363c055b7#diff-127d25c59a9bb45f12aab41520d65d42R103

Scala, for example, was eliminated. We could not get rid of ZooKeeper, by the way.

On Tue, Sep 30, 2014 at 11:16 AM, Stephan Ewen <[hidden email]> wrote:
> Have a look at this PR and maybe build on top of it:
> https://github.com/apache/incubator-flink/pull/133
