[jira] [Created] (FLINK-13852) Support storing in-progress/pending files in different directories (StreamingFileSink)

Shang Yuanchun (Jira)
Gyula Fora created FLINK-13852:
----------------------------------

             Summary: Support storing in-progress/pending files in different directories (StreamingFileSink)
                 Key: FLINK-13852
                 URL: https://issues.apache.org/jira/browse/FLINK-13852
             Project: Flink
          Issue Type: New Feature
          Components: Connectors / FileSystem
            Reporter: Gyula Fora


Currently, in-progress and pending files are stored in the same directory as the final output files. This can be problematic depending on how the final output files are consumed. One example is loading the data into Hive, where we can only load all files in a given directory.

I suggest we allow specifying a pending/in-progress base path, under which we create the same bucketing structure as for the final files, to store only the non-final files.

To support this, we need to extend the RecoverableWriter interface with a new open method, for example:

RecoverableFsDataOutputStream open(Path path, Path tmpPath) throws IOException;
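
As a rough illustration of the mirrored bucketing structure proposed above, the sketch below derives the temporary location of a file from its final location. It is only a sketch under assumptions: the class name PendingPathSketch, the helper toTmpPath, and the example directories are hypothetical, and it uses java.nio.file.Path as a stand-in for Flink's own Path type so that it runs without Flink on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: none of these names exist in Flink.
public class PendingPathSketch {

    /**
     * Maps a final output file to the location of its in-progress/pending
     * counterpart by re-rooting it under tmpBase while keeping the bucket
     * sub-directories unchanged.
     */
    static Path toTmpPath(Path outputBase, Path tmpBase, Path finalFile) {
        if (!finalFile.startsWith(outputBase)) {
            throw new IllegalArgumentException(
                    finalFile + " is not located under " + outputBase);
        }
        // Bucket directories plus file name, relative to the output base path.
        Path relative = outputBase.relativize(finalFile);
        return tmpBase.resolve(relative);
    }

    public static void main(String[] args) {
        // Example directories are assumptions chosen for illustration only.
        Path outputBase = Paths.get("/data/out");
        Path tmpBase = Paths.get("/data/tmp");
        Path finalFile = Paths.get("/data/out/2019-08-26--12/part-0-0");

        // The result keeps the bucket "2019-08-26--12" but lives under tmpBase.
        System.out.println(toTmpPath(outputBase, tmpBase, finalFile));
    }
}
```

With such a mapping, the sink could pass the derived path as the tmpPath argument of the proposed open method, so the final directory only ever contains finished files.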



--
This message was sent by Atlassian Jira
(v8.3.2#803003)