[jira] [Created] (FLINK-11895) Allow FileSystem Configs to be altered at Runtime


Shang Yuanchun (Jira)
Luka Jurukovski created FLINK-11895:
---------------------------------------

             Summary: Allow FileSystem Configs to be altered at Runtime
                 Key: FLINK-11895
                 URL: https://issues.apache.org/jira/browse/FLINK-11895
             Project: Flink
          Issue Type: Improvement
            Reporter: Luka Jurukovski


This stems from a need to pass S3 auth keys at runtime so that users can specify the keys they want to use. Based on the documentation, it seems that S3 keys currently need to be part of the Flink cluster configuration, in a Hadoop configuration file (which the cluster needs to be pointed to), or in JVM args.


This only seems to apply to the streaming API. Also, feel free to correct the following if I am wrong, as there may be pieces I have not run across or parts of the code I have misunderstood.


Currently it seems that FileSystems are inferred based on the extension type and a set of cached FileSystems that are generated in the background. These seem to use the config as defined at the time they are stood up. Unfortunately, there is no way to tap into this control mechanism or override this behavior, as many places in the code pull from this cache. This is particularly painful in the sink instance, as some of the places where the cache is used are not accessible outside the package in which they are implemented.
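
To make the behavior described above concrete, here is a minimal sketch in plain Java. It does not use Flink's actual classes: SketchFileSystem, SketchFileSystemCache and the "s3.access-key" entry are hypothetical stand-ins, and keying the cache by URI scheme is an assumption about what "extension type" means here.

```java
import java.net.URI;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a file system; NOT Flink's FileSystem class. */
final class SketchFileSystem {
    private final Map<String, String> config;

    SketchFileSystem(Map<String, String> config) {
        // Copy the config: the instance keeps the values it was created with.
        this.config = new HashMap<>(config);
    }

    String accessKey() {
        return config.get("s3.access-key"); // illustrative key name
    }
}

/** Hypothetical cache: one file system per URI scheme, built from the cluster config. */
final class SketchFileSystemCache {
    private final Map<String, String> clusterConfig;
    private final Map<String, SketchFileSystem> cache = new ConcurrentHashMap<>();

    SketchFileSystemCache(Map<String, String> clusterConfig) {
        this.clusterConfig = new HashMap<>(clusterConfig);
    }

    /** Callers only supply a URI; there is no parameter for per-call credentials. */
    SketchFileSystem get(URI uri) {
        return cache.computeIfAbsent(
                uri.getScheme(), scheme -> new SketchFileSystem(clusterConfig));
    }
}

final class CacheSketchDemo {
    public static void main(String[] args) {
        SketchFileSystemCache cache = new SketchFileSystemCache(
                Collections.singletonMap("s3.access-key", "cluster-key"));

        SketchFileSystem tenantA = cache.get(URI.create("s3://tenant-a/data"));
        SketchFileSystem tenantB = cache.get(URI.create("s3://tenant-b/data"));

        // Both tenants share one cached instance and therefore one set of keys.
        System.out.println(tenantA == tenantB);
        System.out.println(tenantB.accessKey());
    }
}
```

In this sketch every caller that resolves an s3:// URI gets the same instance, configured once from the cluster-wide settings, which is the limitation this issue is about.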

Through a pretty hacky mechanism I have proved out that this is a self-imposed limitation: I was able to change the code to pass in a FileSystem from the top level and have it read and write to S3 using keys I set at runtime.
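
The general shape of that change can be sketched as follows. This is not the actual patch and does not use Flink's sink classes: TenantFileSystem, RecordingFileSystem and InjectedFsSink are hypothetical names, and the in-memory "file system" only records which key each write would have used.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal file system interface; NOT Flink's FileSystem. */
interface TenantFileSystem {
    void write(String path, String data);
}

/** Hypothetical in-memory implementation that records the key used for each write. */
final class RecordingFileSystem implements TenantFileSystem {
    private final String accessKey;
    final List<String> log = new ArrayList<>();

    RecordingFileSystem(String accessKey) {
        this.accessKey = accessKey;
    }

    @Override
    public void write(String path, String data) {
        log.add(accessKey + " -> " + path);
    }
}

/** Hypothetical sink that is handed its file system instead of looking one up in a cache. */
final class InjectedFsSink {
    private final TenantFileSystem fs;
    private final String basePath;

    InjectedFsSink(TenantFileSystem fs, String basePath) {
        this.fs = fs;
        this.basePath = basePath;
    }

    void invoke(String record) {
        fs.write(basePath + "/part-0", record);
    }
}

final class InjectedFsDemo {
    public static void main(String[] args) {
        RecordingFileSystem fsA = new RecordingFileSystem("key-a");
        RecordingFileSystem fsB = new RecordingFileSystem("key-b");

        // Each sink writes with the credentials of the file system it was given.
        new InjectedFsSink(fsA, "s3://tenant-a/out").invoke("record-1");
        new InjectedFsSink(fsB, "s3://tenant-b/out").invoke("record-2");

        System.out.println(fsA.log);
        System.out.println(fsB.log);
    }
}
```

Two sinks in the same process can then use different keys. In a distributed job the sink would also have to be shipped to the workers, so whatever is injected must survive serialization.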

The current methodology is convenient; however, there should be finer-grained controls for cases where the cluster runs in a multitenant environment.

As a final note, it seems that neither the FileSystem nor the FileSystemFactory class is Serializable. I can see why this would be the case for the former, but I am not clear as to why a factory class would not be Serializable (as is the case with BucketFactory). If the factory can be made serializable, this should make for a much cleaner process.
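
As a rough illustration of why a serializable factory would help, the sketch below round-trips a factory through standard Java serialization and only then creates the (non-serializable) client. S3LikeClient, KeyedClientFactory and FactoryRoundTrip are hypothetical names for this example, not Flink classes, and the round trip merely imitates shipping an object to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical stand-in for a file system client that holds live resources. */
final class S3LikeClient {
    private final String accessKey;

    S3LikeClient(String accessKey) {
        this.accessKey = accessKey;
    }

    String describe() {
        return "client[" + accessKey + "]";
    }
}

/** Hypothetical factory: it carries only plain settings, so it can be Serializable. */
final class KeyedClientFactory implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String accessKey;

    KeyedClientFactory(String accessKey) {
        this.accessKey = accessKey;
    }

    /** Creates the client lazily, after the factory has been deserialized. */
    S3LikeClient create() {
        return new S3LikeClient(accessKey);
    }
}

final class FactoryRoundTrip {
    /** Serializes and deserializes a value in memory, imitating shipment to a worker. */
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        KeyedClientFactory shipped = roundTrip(new KeyedClientFactory("key-a"));
        System.out.println(shipped.create().describe());
    }
}
```

The factory travels with the user function and the client is built where it is used, so the runtime-supplied keys never need to be in the cluster configuration. Note that the keys are then part of the serialized job, which has its own security implications.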



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)