[jira] [Created] (FLINK-2580) HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

Shang Yuanchun (Jira)
Arnaud Linz created FLINK-2580:
----------------------------------

             Summary: HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
                 Key: FLINK-2580
                 URL: https://issues.apache.org/jira/browse/FLINK-2580
             Project: Flink
          Issue Type: Improvement
          Components: Hadoop Compatibility
            Reporter: Arnaud Linz
            Priority: Minor


I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write to an HDFS file, the call to org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a HadoopDataOutputStream that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
 
However, FSDataOutputStream exposes many methods such as flush() and getPos(), but HadoopDataOutputStream only wraps write() and close().
 
For instance, flush() calls the default, empty implementation of OutputStream instead of the Hadoop one, which is confusing. Moreover, because of the restrictive OutputStream interface, hsync() and hflush() are not exposed to Flink.
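
The flush() behaviour described above can be reproduced with java.io alone. The sketch below is not Flink code: PartialWrapper is a hypothetical stand-in for a wrapper that forwards only write() and close(), and a BufferedOutputStream plays the role of the wrapped Hadoop stream.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class FlushDemo {

    /** Hypothetical stand-in: forwards only write() and close(), like the wrapper described above. */
    static class PartialWrapper extends OutputStream {
        private final OutputStream wrapped;

        PartialWrapper(OutputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }

        // flush() is not overridden, so OutputStream.flush() (a no-op) is used
        // and the wrapped stream never sees the flush request.
    }

    /** Writes one byte, calls flush() on the wrapper, and reports how many bytes reached the sink. */
    static int bytesVisibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            OutputStream out = new PartialWrapper(new BufferedOutputStream(sink));
            out.write(42);
            out.flush(); // silently does nothing; the byte stays in the inner buffer
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes visible after flush: " + bytesVisibleAfterFlush());
    }
}
```

The byte only reaches the sink once close() is called, because close() is forwarded and BufferedOutputStream flushes on close.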

I see two options:

- complete the class to wrap all methods of OutputStream and add a getWrappedStream() to access other stuff like hsync().

- get rid of the Hadoop wrapping and directly use Hadoop file system objects.
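
A minimal sketch of the first option, written against java.io only so that it runs without Hadoop on the classpath. ForwardingOutputStream and its type parameter are hypothetical names, not the actual Flink class. In Flink the wrapped type would be org.apache.hadoop.fs.FSDataOutputStream, so callers could reach hsync() and hflush() through getWrappedStream().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical wrapper that forwards every OutputStream method and exposes the wrapped stream. */
class ForwardingOutputStream<T extends OutputStream> extends OutputStream {
    private final T wrapped;

    ForwardingOutputStream(T wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void write(byte[] b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        wrapped.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        wrapped.flush(); // forwarded instead of falling back to the empty default
    }

    @Override
    public void close() throws IOException {
        wrapped.close();
    }

    /** Gives access to methods that OutputStream does not declare. */
    public T getWrappedStream() {
        return wrapped;
    }

    /** Writes one byte, flushes through the wrapper, and reports how many bytes reached the sink. */
    static int bytesVisibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            ForwardingOutputStream<BufferedOutputStream> out =
                    new ForwardingOutputStream<>(new BufferedOutputStream(sink));
            out.write(42);
            out.flush(); // now reaches the BufferedOutputStream
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning the concrete type T from getWrappedStream() avoids a cast at the call site, at the cost of leaking the Hadoop type into code that uses the wrapper.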



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)