Yun Tang created FLINK-11868:
--------------------------------
Summary: [filesystems] Introduce listStatusIterator API to file system
Key: FLINK-11868
URL: https://issues.apache.org/jira/browse/FLINK-11868
Project: Flink
Issue Type: Improvement
Components: FileSystems
Reporter: Yun Tang
Assignee: Yun Tang
Fix For: 1.9.0
From past experience, we know that {{listStatus}} is expensive on many distributed file systems, especially when the folder contains a very large number of files. The method not only blocks the calling thread until the result is returned, but can also cause an OOM because the returned array of {{FileStatus}} can be very large. I think we should learn from FLINK-7266 and FLINK-8540.
However, listing the file statuses under a path is really helpful in many situations. Thankfully, many distributed file systems have recognized this and provide APIs such as {{[listStatusIterator|https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileSystem.html#listStatusIterator(org.apache.hadoop.fs.Path)]}} to query the file system on demand.
We should also introduce this API in Flink's file system abstraction and replace the current implementations that rely on {{listStatus}}.
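To illustrate the difference, here is a rough sketch of on-demand listing. It is not Flink or Hadoop code: {{PagedLister}}, {{LazyListing}} and {{FakeLister}} are hypothetical names made up for this example, and the page-based remote call is an assumption about how a backend might serve listings. The point is only that an iterator holds at most one page of entries in memory and contacts the file system when the caller actually asks for more.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; none of these types are Flink or Hadoop API. */
class ListStatusIteratorSketch {

    /** Assumed stand-in for a remote file system that serves listings in pages. */
    interface PagedLister {
        /** Returns up to pageSize entries starting at offset; fewer means the listing ended. */
        List<String> listPage(String dir, int offset, int pageSize);
    }

    /** Fetches one page at a time, so at most pageSize entries are buffered. */
    static final class LazyListing implements Iterator<String> {
        private final PagedLister lister;
        private final String dir;
        private final int pageSize;
        private List<String> page = new ArrayList<>();
        private int posInPage = 0;
        private int nextOffset = 0;
        private boolean exhausted = false;

        LazyListing(PagedLister lister, String dir, int pageSize) {
            this.lister = lister;
            this.dir = dir;
            this.pageSize = pageSize;
        }

        @Override
        public boolean hasNext() {
            if (posInPage < page.size()) {
                return true; // still entries left in the buffered page
            }
            if (exhausted) {
                return false; // the last page was already fetched and consumed
            }
            // Contact the (fake) file system only when the buffer runs dry.
            page = lister.listPage(dir, nextOffset, pageSize);
            posInPage = 0;
            nextOffset += page.size();
            if (page.size() < pageSize) {
                exhausted = true; // a short page signals the end of the listing
            }
            return posInPage < page.size();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(posInPage++);
        }
    }

    /** In-memory fake with a fixed number of entries that counts round trips. */
    static final class FakeLister implements PagedLister {
        final int total;
        int calls = 0;

        FakeLister(int total) {
            this.total = total;
        }

        @Override
        public List<String> listPage(String dir, int offset, int pageSize) {
            calls++;
            List<String> out = new ArrayList<>();
            int end = Math.min(total, offset + pageSize);
            for (int i = offset; i < end; i++) {
                out.add(dir + "/file-" + i);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        FakeLister fs = new FakeLister(2500);
        Iterator<String> it = new LazyListing(fs, "/checkpoints", 1000);
        System.out.println(it.next());                          // /checkpoints/file-0
        System.out.println("round trips so far: " + fs.calls);  // round trips so far: 1
    }
}
```

With an array-returning {{listStatus}}, all 2500 entries would be materialized before the caller sees the first one. In the sketch, a caller that stops after the first entry costs a single round trip, and a full scan never buffers more than one page.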