[jira] [Created] (FLINK-19903) Implement equivalent of Spark's f.input_file_name()


Shang Yuanchun (Jira)
Ruben Laguna created FLINK-19903:
------------------------------------

             Summary: Implement equivalent of Spark's f.input_file_name()
                 Key: FLINK-19903
                 URL: https://issues.apache.org/jira/browse/FLINK-19903
             Project: Flink
          Issue Type: Improvement
          Components: API / Core
            Reporter: Ruben Laguna


Use case: 

I have a dataset of about 200k files in which some information is embedded in
the filenames, and I need to extract that information as a new column.

In Spark I could write
`.withColumn("id",f.split(f.reverse(f.split(f.input_file_name(),'/'))[0],'\.')[0])`
but I don't see how I can do the same with Flink.
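
To make the intent of that expression concrete, here is a minimal plain-Python sketch of the per-row computation: take the last path segment, then keep everything before the first dot. The helper name `id_from_path` and the sample path are made up for illustration and are not part of Spark or Flink.

```python
def id_from_path(path: str) -> str:
    """Return the file name without its directories and without anything after the first dot."""
    # Last path segment; equivalent to reverse(split(path, '/'))[0] in the Spark expression.
    file_name = path.split("/")[-1]
    # Part before the first dot; equivalent to split(file_name, '\.')[0].
    return file_name.split(".")[0]


# Hypothetical sample path, not taken from the real dataset.
print(id_from_path("hdfs://data/input/abc123.csv.gz"))  # prints "abc123"
```

In Spark the path comes from `f.input_file_name()` for each row; the missing piece in Flink is an equivalent source for that value.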

 

Apparently there is FLIP-107 [1], which would allow SQL connectors and formats to expose metadata.

 

So it would be great if the Filesystem SQL connector exposed the file path.

Ideally, for me, the path would be exposed via a function that reads the metadata, so that I could write something akin to `SELECT input_file_name(), * FROM table1`.

 

 

[1]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Handling+of+metadata+in+SQL+connectors

[2]: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-I-get-the-filename-as-a-column-td39096.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)