Angel Barragán created FLINK-16544:
--------------------------------------
Summary: Flink FileSystem for web.uploadDir
Key: FLINK-16544
URL: https://issues.apache.org/jira/browse/FLINK-16544
Project: Flink
Issue Type: Improvement
Components: API / Core
Affects Versions: 1.10.0
Reporter: Angel Barragán
Currently the configuration properties "web.tmpdir" and "web.upload.dir" only support paths on the local filesystem. When Flink is deployed in another cluster environment such as YARN, it would be more useful to be able to place those directories on HDFS. Managing their size and maintenance there is easier than finding out on which node YARN has launched the JobManager and managing the upload directory on that node.
In my concrete case, I ran into this management disadvantage when creating an AWS EMR cluster with Flink. The default configuration creates this directory under /tmp on the local filesystem of the CORE node where YARN deploys the JobManager. We found that the EMR cluster is also configured to fully empty /tmp on a monthly basis, which removes Flink's upload directory and causes Flink to fail when you try to submit a new job. We had to recreate the directory manually.
The first solution I tried was to point the above configuration properties at HDFS, as we did with the configuration property "state.checkpoints.dir", but we found that this doesn't work in a YARN environment. I then checked the Flink code to see how this configuration is used and found that it is resolved against the local file system.
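As an illustration only, the attempt described above would correspond to a flink-conf.yaml fragment along these lines. The HDFS paths are hypothetical placeholders.

```yaml
# Works: checkpoint storage accepts a path on a shared filesystem.
# (hypothetical path)
state.checkpoints.dir: hdfs:///flink/checkpoints

# Does not work in 1.10.0 according to this report: the value is
# treated as a local filesystem path. (hypothetical path)
web.upload.dir: hdfs:///flink/web-upload
```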
I think this change would improve the management of Flink when running in another cluster environment where shared storage such as HDFS or S3 is available.