Clean up dependencies in streaming connectors

Stephan Ewen
The streaming connectors currently pull in a massive number of dependencies.

For example, we transitively get the Scala compiler, Scala reflection, etc.,
as well as ZooKeeper.

Much of this comes with Flume and Kafka. Are those dependencies required to
make the connectors work? If not, it might be good to exclude them to prevent
conflicts for users who actually depend on those components.
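To make "transitively" concrete, here is a toy Java model. It assumes a made-up, simplified dependency graph: the `streaming-connectors` root and every edge below are hypothetical, not taken from the real Kafka or Flume POMs. It walks the graph breadth-first and collects everything reachable from the connector module, which is roughly what a build tool does before version mediation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of transitive dependency resolution over a hypothetical graph. */
public class TransitiveDeps {
    // Hypothetical adjacency list: artifact -> its direct dependencies.
    // These edges are illustrative only, not copied from real POM files.
    static final Map<String, List<String>> GRAPH = Map.of(
            "streaming-connectors", List.of("kafka", "flume-ng-core"),
            "kafka", List.of("scala-library", "scala-compiler", "zookeeper"),
            "scala-compiler", List.of("scala-reflect"),
            "scala-reflect", List.of("scala-library"),
            "zookeeper", List.of("log4j"),
            "flume-ng-core", List.of("guava", "log4j"));

    /** Returns every artifact reachable from root (root itself excluded), sorted by name. */
    static Set<String> resolve(String root) {
        Set<String> seen = new TreeSet<>();
        // Breadth-first walk: take from the head, append new dependencies at the tail.
        Deque<String> queue = new ArrayDeque<>(GRAPH.getOrDefault(root, List.of()));
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (seen.add(artifact)) {
                queue.addAll(GRAPH.getOrDefault(artifact, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // The module declares only kafka and flume-ng-core, yet resolves to much more.
        // prints [flume-ng-core, guava, kafka, log4j, scala-compiler, scala-library, scala-reflect, zookeeper]
        System.out.println(resolve("streaming-connectors"));
    }
}
```

In this toy graph, the single declared dependency on `kafka` is enough to bring in both Scala artifacts and ZooKeeper, even though the connector code itself never references them.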

Re: Clean up dependencies in streaming connectors

Gyula Fóra
Yes, you are right; Kafka and Flume are the heavy ones.

We always have the option of taking them out of the package, perhaps moving
all the different connectors to a separate repo and keeping only the one or
two most important ones here. I don't think there's much else we can do: we
don't use the packages you mentioned ourselves; they get pulled in by the
Kafka and Flume dependencies.

(In reply to Stephan Ewen, Mon, Sep 29, 2014 at 6:24 PM)

Re: Clean up dependencies in streaming connectors

Stephan Ewen
You may be able to solve this with careful exclusions.

It seems Kafka is monolithic, with no separation between connector and
engine. If you know, for example, that ZooKeeper is not required by the
connector (you have to be sure), you can exclude it as a dependency. We
have done this for Hadoop1, where we only use the HDFS client functionality.
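A minimal sketch of what an exclusion does, again as a toy Java model over a hypothetical graph (artifact names and edges are illustrative assumptions, not the real Kafka POM). In an actual `pom.xml` this corresponds to an `<exclusions>` block inside the `<dependency>` declaration; the model only mimics the basic rule that an excluded artifact, and anything reachable solely through it, is dropped from that dependency's subtree. It ignores version mediation and scopes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of Maven-style exclusions over a hypothetical dependency graph. */
public class ExclusionSketch {
    // Hypothetical edges for illustration; not the real Kafka POM.
    static final Map<String, List<String>> GRAPH = Map.of(
            "kafka", List.of("scala-library", "scala-compiler", "zookeeper", "zkclient"),
            "scala-compiler", List.of("scala-reflect"),
            "zkclient", List.of("zookeeper"),
            "zookeeper", List.of("log4j"));

    /**
     * Resolves the subtree of one declared dependency. Any artifact named in
     * exclusions is skipped wherever it appears below that dependency, so its
     * own dependencies are never visited through this path.
     */
    static Set<String> resolve(String dependency, Set<String> exclusions) {
        Set<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>(GRAPH.getOrDefault(dependency, List.of()));
        while (!queue.isEmpty()) {
            String artifact = queue.poll();
            if (exclusions.contains(artifact)) {
                continue; // pruned together with everything only reachable through it
            }
            if (seen.add(artifact)) {
                queue.addAll(GRAPH.getOrDefault(artifact, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // prints [log4j, scala-compiler, scala-library, scala-reflect, zkclient, zookeeper]
        System.out.println(resolve("kafka", Set.of()));
        // prints [scala-library, zkclient]
        System.out.println(resolve("kafka", Set.of("scala-compiler", "zookeeper")));
    }
}
```

Excluding `scala-compiler` also removes `scala-reflect`, and excluding `zookeeper` removes it even though `zkclient` asks for it too. That second effect is why you have to be sure: if the connector actually calls into ZooKeeper at runtime, the build may still succeed while the job fails later with missing-class errors.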

(In reply to Gyula Fóra, Mon, Sep 29, 2014 at 6:40 PM)

Re: Clean up dependencies in streaming connectors

Gyula Fóra
Thanks, I will look into this and try to figure it out. As you can see, I am not a Maven pro :)

(In reply to Stephan Ewen, 29 Sep 2014 at 18:44)

Re: Clean up dependencies in streaming connectors

Stephan Ewen
Shipping the connectors with the job jars would thin out the dependencies,
but make it more cumbersome to assemble a job jar.

(In reply to Gyula Fóra, Mon, Sep 29, 2014 at 6:47 PM)

Re: Clean up dependencies in streaming connectors

Márton Balassi
Good catch. Give me some time to deal with my fresh jet lag, and then Gyula
and I will figure it out. :)
(In reply to Stephan Ewen, Sep 29, 2014 at 12:50 PM)

Re: Clean up dependencies in streaming connectors

Stephan Ewen
Have a look at this PR and maybe build on top of it:
https://github.com/apache/incubator-flink/pull/133

(In reply to Márton Balassi, Mon, Sep 29, 2014 at 10:45 PM)

Re: Clean up dependencies in streaming connectors

Márton Balassi
Gabor excluded the unnecessary transitive dependencies:

https://github.com/mbalassi/incubator-flink/commit/9caece6bef610cbebaeb538f5a358ce363c055b7#diff-127d25c59a9bb45f12aab41520d65d42R103

Scala, for example, was eliminated. We could not get rid of ZooKeeper, though.

(In reply to Stephan Ewen, Tue, Sep 30, 2014 at 11:16 AM)
