[jira] [Created] (FLINK-11868) [filesystems] Introduce listStatusIterator API to file system

Shang Yuanchun (Jira)
Yun Tang created FLINK-11868:
--------------------------------

             Summary: [filesystems] Introduce listStatusIterator API to file system
                 Key: FLINK-11868
                 URL: https://issues.apache.org/jira/browse/FLINK-11868
             Project: Flink
          Issue Type: Improvement
          Components: FileSystems
            Reporter: Yun Tang
            Assignee: Yun Tang
             Fix For: 1.9.0


From existing experience, we know that {{listStatus}} is expensive for many distributed file systems, especially when a folder contains many files. The method not only blocks the calling thread until the whole result is returned, but can also cause an OOM because the returned {{FileStatus}} array can be very large. I think we should learn from FLINK-7266 and FLINK-8540.

However, listing the file statuses under a path is still useful in many situations. Fortunately, many distributed file systems have recognized this and provide APIs such as {{[listStatusIterator|https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#listStatusIterator(org.apache.hadoop.fs.Path)]}}, which fetch entries from the file system on demand as the caller iterates.
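A minimal sketch of what "on demand" listing means, assuming a backend that returns directory entries in bounded batches (as object-store list calls typically do). {{RemoteIterator}} below is a local stand-in that mirrors the shape of Hadoop's {{org.apache.hadoop.fs.RemoteIterator}} ({{hasNext()}}/{{next()}} that may throw {{IOException}}). {{ListingBackend}}, {{PagedStatusIterator}} and {{InMemoryBackend}} are hypothetical names used only for illustration, and entries are plain names rather than {{FileStatus}} objects. Only one batch is held in memory at a time.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class PagedListingSketch {

    /** Local stand-in with the same shape as Hadoop's RemoteIterator. */
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical backend: returns at most maxKeys names sorted after startAfter. */
    interface ListingBackend {
        List<String> listBatch(String dir, String startAfter, int maxKeys) throws IOException;
    }

    /** Fetches one batch per remote call; memory is bounded by the batch size. */
    static final class PagedStatusIterator implements RemoteIterator<String> {
        private final ListingBackend backend;
        private final String dir;
        private final int batchSize;
        private Iterator<String> current = Collections.emptyIterator();
        private String lastSeen = null;   // resume marker for the next batch
        private boolean exhausted = false;
        int remoteCalls = 0;              // counts backend round trips, for illustration

        PagedStatusIterator(ListingBackend backend, String dir, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.backend = backend;
            this.dir = dir;
            this.batchSize = batchSize;
        }

        @Override
        public boolean hasNext() throws IOException {
            // Only contact the backend when the current batch is used up.
            while (!current.hasNext() && !exhausted) {
                List<String> batch = backend.listBatch(dir, lastSeen, batchSize);
                remoteCalls++;
                if (batch.size() < batchSize) {
                    exhausted = true;     // a short batch means no more entries
                }
                if (!batch.isEmpty()) {
                    lastSeen = batch.get(batch.size() - 1);
                }
                current = batch.iterator();
            }
            return current.hasNext();
        }

        @Override
        public String next() throws IOException {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Hypothetical in-memory backend over a sorted set of file names. */
    static final class InMemoryBackend implements ListingBackend {
        private final TreeSet<String> names = new TreeSet<>();

        InMemoryBackend(int count) {
            for (int i = 0; i < count; i++) {
                names.add(String.format("part-%05d", i));
            }
        }

        @Override
        public List<String> listBatch(String dir, String startAfter, int maxKeys) {
            List<String> out = new ArrayList<>();
            Iterable<String> tail = startAfter == null ? names : names.tailSet(startAfter, false);
            for (String n : tail) {
                if (out.size() == maxKeys) {
                    break;
                }
                out.add(n);
            }
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        PagedStatusIterator it = new PagedStatusIterator(new InMemoryBackend(2500), "/data", 1000);
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen + " entries in " + it.remoteCalls + " calls");
    }
}
```

With 2,500 entries and a batch size of 1,000, the loop in {{main}} issues three backend calls and prints {{2500 entries in 3 calls}}; the caller never holds more than one batch.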

 

We should introduce a similar API as well and replace the current implementations that still use {{listStatus}}.
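One possible shape for this proposal, sketched under stated assumptions: {{SketchFileSystem}} is a hypothetical, simplified base class (it is not Flink's actual {{FileSystem}} API), entries are plain path strings instead of {{FileStatus}} objects, and {{RemoteIterator}} again mirrors the shape of Hadoop's interface. The new method gets a default implementation that wraps {{listStatus}}, so existing file systems keep working unchanged, while file systems with a native paging API can override it. {{countMatching}} is a hypothetical caller migrated from the array-based call.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class ListStatusIteratorSketch {

    /** Local stand-in with the same shape as Hadoop's RemoteIterator. */
    interface RemoteIterator<E> {
        boolean hasNext() throws IOException;
        E next() throws IOException;
    }

    /** Hypothetical, simplified file system base class (not Flink's actual API). */
    abstract static class SketchFileSystem {

        /** Existing eager API: materializes every entry at once. */
        abstract String[] listStatus(String dir) throws IOException;

        /**
         * Proposed lazy API. The default falls back to listStatus so existing
         * implementations keep working; file systems with native paging override it.
         */
        RemoteIterator<String> listStatusIterator(String dir) throws IOException {
            final Iterator<String> it = Arrays.asList(listStatus(dir)).iterator();
            return new RemoteIterator<String>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public String next() {
                    if (!it.hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return it.next();
                }
            };
        }
    }

    /**
     * A caller migrated from listStatus: it consumes entries one at a time, so its
     * memory stays bounded whenever the file system overrides listStatusIterator.
     */
    static long countMatching(SketchFileSystem fs, String dir, String suffix) throws IOException {
        long count = 0;
        RemoteIterator<String> it = fs.listStatusIterator(dir);
        while (it.hasNext()) {
            if (it.next().endsWith(suffix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // A legacy file system that only implements the eager listStatus.
        SketchFileSystem legacy = new SketchFileSystem() {
            @Override
            String[] listStatus(String dir) {
                return new String[] {dir + "/part-0.csv", dir + "/part-1.csv", dir + "/_SUCCESS"};
            }
        };
        System.out.println(countMatching(legacy, "/data", ".csv"));
    }
}
```

Running {{main}} prints {{2}}. The default fallback still materializes the full array, so the memory benefit only appears for file systems that override {{listStatusIterator}}; the point of the default is that callers can be migrated first without breaking any existing implementation.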



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)