Ruben Laguna created FLINK-19903:
------------------------------------
Summary: Implement equivalent of Spark's f.input_file_name()
Key: FLINK-19903
URL: https://issues.apache.org/jira/browse/FLINK-19903
Project: Flink
Issue Type: Improvement
Components: API / Core
Reporter: Ruben Laguna
Use case:
I have a dataset (200k files) with some information embedded in the
filenames, and I need to extract it as a new column.
In Spark I could do
`.withColumn("id", f.split(f.reverse(f.split(f.input_file_name(), '/'))[0], '\.')[0])`
but I don't see how I can do the same with Flink.
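To make the intent concrete: the Spark expression keeps the last segment of the file path and cuts it at the first dot. Here is a minimal plain-Python sketch of that logic. The function name `id_from_path` and the sample path are made up for illustration; they are not part of Spark or Flink.

```python
def id_from_path(path: str) -> str:
    """Return the file name up to its first dot (the id embedded in the name)."""
    # Last path segment; equivalent to reverse(split(path, '/'))[0] in the Spark snippet.
    file_name = path.split("/")[-1]
    # Everything before the first dot; equivalent to split(file_name, '\.')[0].
    return file_name.split(".")[0]


if __name__ == "__main__":
    # Hypothetical input path, for illustration only.
    print(id_from_path("hdfs:///data/input/abc123.csv"))
```

In Flink I would need the file path available as a column (or via a function) before I could apply the same string handling.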
Apparently there is [FLIP-107|https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors], which would allow SQL connectors and formats to expose metadata.
So it would be great for the Filesystem SQL connector to expose the path.
Ideally, the path would be exposed via a function that reads the metadata, so that I could write something akin to `SELECT input_file_name(), * FROM table1`
[1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors
[2]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html
--
This message was sent by Atlassian Jira
(v8.3.4#803005)