SerializableHadoopConfiguration

12 messages

SerializableHadoopConfiguration

Sivaprasanna
Hi

The flink-sequence-file module has a class named
SerializableHadoopConfiguration[1], which is nothing but a serializable
wrapper around Hadoop's Configuration. I believe this class could be moved
to a common module, since it is not tightly coupled to the sequence-file
module and could be reused by other modules, for example flink-compress.
Thoughts?

-
Sivaprasanna
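
The thread takes the wrapper's purpose for granted, so here is a minimal sketch of the usual technique. It is not the actual class from flink-sequence-file. Hadoop's `Configuration` implements Hadoop's `Writable` contract (`write(DataOutput)` / `readFields(DataInput)`) but not `java.io.Serializable`. A wrapper can therefore keep it in a `transient` field and (de)serialize it by hand in `writeObject`/`readObject`, which works because `ObjectOutputStream` and `ObjectInputStream` implement `DataOutput` and `DataInput`. To stay runnable without Hadoop on the classpath, the sketch assumes a hypothetical stand-in, `KeyValueConfig`, with the same `write`/`readFields` shape; `SerializableConfigWrapper` is likewise a hypothetical name.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for org.apache.hadoop.conf.Configuration: it is NOT
 * java.io.Serializable, but offers Writable-style write/readFields methods.
 */
class KeyValueConfig {
    private final Map<String, String> props = new TreeMap<>();

    public void set(String key, String value) {
        props.put(key, value);
    }

    public String get(String key) {
        return props.get(key);
    }

    /** Same shape as Hadoop's Writable#write(DataOutput). */
    public void write(DataOutput out) throws IOException {
        out.writeInt(props.size());
        for (Map.Entry<String, String> e : props.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    /** Same shape as Hadoop's Writable#readFields(DataInput). */
    public void readFields(DataInput in) throws IOException {
        props.clear();
        int size = in.readInt();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            props.put(key, in.readUTF());
        }
    }
}

/** Serializable wrapper: the wrapped config is transient and written by hand. */
public class SerializableConfigWrapper implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default Java serialization skips it; writeObject/readObject handle it.
    private transient KeyValueConfig config;

    public SerializableConfigWrapper(KeyValueConfig config) {
        this.config = config;
    }

    public KeyValueConfig get() {
        return config;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        config.write(out); // ObjectOutputStream implements DataOutput
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        KeyValueConfig restored = new KeyValueConfig();
        restored.readFields(in); // ObjectInputStream implements DataInput
        this.config = restored;
    }
}
```

With Hadoop available, `KeyValueConfig` would be replaced by `org.apache.hadoop.conf.Configuration`, constructing a fresh `Configuration` in `readObject` before calling `readFields`. Nothing in the wrapper depends on the sequence-file format, which is the argument for moving it.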

Re: SerializableHadoopConfiguration

Arvid Heise-3
Hi Sivaprasanna,

we actually want to remove Hadoop from all core modules, so we cannot place
it in a very common module like flink-core.

But I think the module flink-hadoop-fs could be a fitting place.

On Tue, Mar 3, 2020 at 11:25 AM Sivaprasanna <[hidden email]>
wrote:

> Hi
>
> The flink-sequence-file module has a class named
> SerializableHadoopConfiguration[1] which is nothing but a wrapper class for
> Hadoop Configuration. I believe this class can be moved to a common module
> since this is not necessarily tightly coupled with sequence-file module,
> and also because it can be used by many other modules, for ex.
> flink-compress. Thoughts?
>
> -
> Sivaprasanna
>

Re: SerializableHadoopConfiguration

Sivaprasanna
Hi Arvid,

Thanks for the quick reply. Yes, it makes sense to keep Hadoop dependencies
out of Flink's core modules, but I also wonder whether it would be overkill
to add flink-hadoop-fs as a dependency just to use a utility class from
that module.

-
Sivaprasanna

On Tue, Mar 3, 2020 at 4:17 PM Arvid Heise <[hidden email]> wrote:

> Hi Sivaprasanna,
>
> we actually want to remove Hadoop from all core modules, so we could not
> place it in some very common place like flink-core.
>
> But I think the module flink-hadoop-fs could be a fitting place.
>

Re: SerializableHadoopConfiguration

Sivaprasanna
BTW, can we leverage flink-shaded-hadoop-2? The reason I ask is that any
Flink module that uses Hadoop in any way will most probably include
flink-shaded-hadoop-2 as a dependency.
However, flink-shaded modules don't contain any source files. Is that a
strict convention the community follows?

-
Sivaprasanna

On Tue, Mar 3, 2020 at 10:48 PM Sivaprasanna <[hidden email]>
wrote:

> Hi Arvid,
>
> Thanks for the quick reply. Yes, it actually makes sense to avoid Hadoop
> dependencies from getting into Flink's core modules but I also wonder if it
> will be an overkill to add flink-hadoop-fs as a dependency just because we
> want to use a utility class from that module.
>
> -
> Sivaprasanna
>

Re: SerializableHadoopConfiguration

Till Rohrmann
Hi Sivaprasanna,

we don't upload the source jars for the flink-shaded modules. However, you
can build and install them yourself by cloning the flink-shaded repository
[1] and then running `mvn package -Dshade-sources`.

[1] https://github.com/apache/flink-shaded

Cheers,
Till

On Tue, Mar 3, 2020 at 6:29 PM Sivaprasanna <[hidden email]>
wrote:

> BTW, can we leverage flink-shaded-hadoop-2? Reason why I ask, if any Flink
> module is going to use Hadoop in any way, it will most probably include
> flink-shaded-hadoop-2 as a dependency.
> However, flink-shaded modules don't have any source files. Is that a strict
> convention that the community follows?
>
> -
> Sivaprasanna
>

Re: SerializableHadoopConfiguration

Stephan Ewen
Do we have more cases of "common Hadoop Utils"?

If yes, does it make sense to create a "flink-hadoop-utils" module with
exactly such classes? It would have an optional dependency on
"flink-shaded-hadoop".

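A module whose Hadoop dependency is marked optional in Maven cannot assume Hadoop is present at runtime, because optional dependencies are not passed on transitively to the modules that depend on it. One common defensive pattern is to probe the classpath before touching any Hadoop class. The sketch below is an illustration under that assumption: `HadoopClasspathCheck` is a hypothetical name, not an existing Flink utility, and `org.apache.hadoop.conf.Configuration` is simply used as the probe class.

```java
/**
 * Hypothetical helper for a module whose Hadoop dependency is optional:
 * it probes the classpath instead of assuming Hadoop is present.
 */
public final class HadoopClasspathCheck {

    // Probe class from hadoop-common; if it loads, Hadoop is available.
    static final String HADOOP_CONFIGURATION_CLASS = "org.apache.hadoop.conf.Configuration";

    private HadoopClasspathCheck() {
        // static utility, no instances
    }

    /** Returns true if Hadoop's Configuration class can be loaded by this class's loader. */
    public static boolean isHadoopOnClasspath() {
        return isLoadable(HADOOP_CONFIGURATION_CLASS, HadoopClasspathCheck.class.getClassLoader());
    }

    /**
     * Tries to load the class without initializing it, so no static
     * initializers of the probed class run.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or present but with unresolvable dependencies.
            return false;
        }
    }
}
```

A caller could then fail fast, for example by throwing an `IllegalStateException` that asks the user to add a Hadoop dependency when `isHadoopOnClasspath()` returns `false`, rather than hitting a `NoClassDefFoundError` deep inside a job.
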
On Wed, Mar 4, 2020 at 9:12 AM Till Rohrmann <[hidden email]> wrote:

> Hi Sivaprasanna,
>
> we don't upload the source jars for the flink-shaded modules. However you
> can build them yourself and install by cloning the flink-shaded repository
> [1] and then call `mvn package -Dshade-sources`.
>
> [1] https://github.com/apache/flink-shaded
>
> Cheers,
> Till
>

Re: SerializableHadoopConfiguration

Sivaprasanna
Hi Stephan,

I think having something like 'flink-hadoop-utils' is a valid idea.
Maybe we could start a [DISCUSS] thread to see what the community
thinks?

On Thu, Mar 5, 2020 at 4:22 PM Stephan Ewen <[hidden email]> wrote:

> Do we have more cases of "common Hadoop Utils"?
>
> If yes, does it make sense to create a "flink-hadoop-utils" module with
> exactly such classes? It would have an optional dependency on
> "flink-shaded-hadoop".
>

Re: SerializableHadoopConfiguration

João Boto
Could we merge the two modules into one?
Sequence files are another way of compressing files.


On 2020/03/05 13:02:46, Sivaprasanna <[hidden email]> wrote:

> Hi Stephen,
>
> I guess it is a valid point to have something like 'flink-hadoop-utils'.
> Maybe a [DISCUSS] thread can be started to understand what the community
> thinks?
>

Re: SerializableHadoopConfiguration

Sivaprasanna
That also makes sense, but I believe it would be a breaking/major
change. If we are okay with merging them, we could name the result
something like "flink-hadoop-compress", since SequenceFile is also a Hadoop
format and the existing "flink-compress" module currently deals with
Hadoop-based compression.

On Fri, Mar 6, 2020 at 1:33 AM João Boto <[hidden email]> wrote:

> We could merge the two modules into one?
> sequence-files its another way of compressing files..
>

Re: SerializableHadoopConfiguration

Till Rohrmann
Hi Sivaprasanna,

do you want to collect the set of Hadoop utility classes that could be
moved to a flink-hadoop-utils module and start a discuss thread about it? I
think this could be a good first step towards cleaning up the module
structure a bit.

Cheers,
Till

On Fri, Mar 6, 2020 at 7:27 AM Sivaprasanna <[hidden email]>
wrote:

> That also makes sense but that, I believe, would be a breaking/major
> change. If we are okay with merging them together, we can name something
> like "flink-hadoop-compress" since SequenceFile is also a Hadoop format and
> the existing "flink-compress" module, as of now, deals with Hadoop based
> compression.
>

Re: SerializableHadoopConfiguration

Sivaprasanna
Hi Till,

Sure. I'll take a look and start a discuss thread soon.

Thanks,
Sivaprasanna

On Mon, Mar 16, 2020 at 4:01 PM Till Rohrmann <[hidden email]> wrote:

> Hi Sivaprasanna,
>
> do you want to collect the set of Hadoop utility classes which could be
> moved to a flink-hadoop-utils module and start a discuss thread about it? I
> think this could be a first good step into cleaning up the module structure
> a bit.
>
> Cheers,
> Till
>

Re: SerializableHadoopConfiguration

Sivaprasanna
Till, Stephan, and others,

I created a discuss thread a few days ago; the link is below. I'd
appreciate it if you could take a look.
https://lists.apache.org/thread.html/rf885987160bede5911a7f61923307a6d5ae07f850da0a90555728e5f%40%3Cdev.flink.apache.org%3E

Please let me know if you would like me to improve or edit the content.

Thanks,
Sivaprasanna

On Tue, Mar 17, 2020 at 8:22 PM Sivaprasanna <[hidden email]>
wrote:

> Hi Till,
>
> Sure. I'll take a look and start a discuss thread soon.
>
> Thanks,
> Sivaprasanna
>