Classes naming conflict with Hadoop file system classes

Henry Saputra
While reviewing Robert's patch for generalizing the Hadoop-compatible FS, I
noticed that some class names are exactly the same as in Hadoop, such as
FSDataInputStream or Configuration, which makes programming a bit awkward
when trying to use both in one class.
Hence we see a lot of fully qualified class names used, such as
org.apache.hadoop.fs.FSDataInputStream.
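
For illustration, the same kind of clash can be reproduced with JDK classes
alone. The sketch below uses java.util.Date and java.sql.Date as stand-ins for
the Flink and Hadoop classes (an assumption, made so the example compiles
without either project on the classpath). NameClashDemo and toSqlDate are
hypothetical names, not part of any patch.

```java
import java.util.Date; // only one class named Date can be imported by simple name

public class NameClashDemo {
    // Converts a java.util.Date into a java.sql.Date. The second type has to
    // be written with its full package name because the simple name Date is
    // already bound to java.util.Date by the import above.
    public static java.sql.Date toSqlDate(Date utilDate) {
        return new java.sql.Date(utilDate.getTime());
    }

    public static void main(String[] args) {
        Date utilDate = new Date(0L);
        java.sql.Date sqlDate = toSqlDate(utilDate);
        // Both objects carry the same timestamp; only the type names clash.
        System.out.println(sqlDate.getTime() == utilDate.getTime()); // prints "true"
    }
}
```

Importing both classes by their simple name in the same file is a compile
error, so one of the two always has to be fully qualified, as in the
org.apache.hadoop.fs.FSDataInputStream case above.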

Are the name conflicts intentional, or just a naming convention?
We could add a "Flink" prefix to all the classes that conflict with
Java IO or Hadoop. So instead of FSDataInputStream we could have
FlinkDataInputStream.

Thoughts?

- Henry

Re: Classes naming conflict with Hadoop file system classes

Robert Metzger
Hi,

it is indeed a bit annoying to work in parts of the system where you need
classes from both systems. But I think that's "only" the case in the YARN,
Hadoop Compat and FileSystem code.
I'm not sure it's a good idea to rename classes such as "Configuration"
or "Path", which are used everywhere in the system. Having a
"FlinkConfiguration" or a "FlinkPath" is just annoying to type (you have to
write at least "FlinkCo" instead of "Co" for the autocomplete to recognize
it).

A cleaner approach would actually be to use the Hadoop classes themselves in
our system (FileSystem, Path, FSDataInputStream etc.). However, after my
experience with the YARN client I'm against this. Hadoop had some annoying
bugs where some methods caused OutOfBoundsExceptions etc., and I had to
reimplement them manually (it was just a utility method).

Scala users can rename the classes at import time.

To sum it up: I'm against changing this right now. We have two 10k+ changes
pending, and the cases where the names conflict are too rare.
But I'm open to changing my mind if somebody has more arguments.



In reply to Henry Saputra's message of Tue, Dec 16, 2014 at 7:28 AM.