[jira] [Created] (FLINK-12003) Revert the config option about mapreduce.output.basename in HadoopOutputFormatBase


Shang Yuanchun (Jira)
vinoyang created FLINK-12003:
--------------------------------

             Summary: Revert the config option about mapreduce.output.basename in HadoopOutputFormatBase
                 Key: FLINK-12003
                 URL: https://issues.apache.org/jira/browse/FLINK-12003
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hadoop Compatibility
            Reporter: vinoyang
            Assignee: vinoyang


In the {{open}} method of {{HadoopOutputFormatBase}}, the config option {{mapreduce.output.basename}} was changed to "tmp", and no documentation mentions this change.

By default, Hadoop output files on HDFS are named using the format "part-x-yyyyy", where:
 * {{x}} is either 'm' or 'r', depending on whether the job was a map-only job or had a reduce phase
 * {{yyyyy}} is the mapper or reducer task number (zero-based)
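
To make the naming scheme above concrete, here is a minimal sketch in plain Java that builds a name of the form "basename-x-yyyyy". It does not call Hadoop; the class {{PartFileName}} and its {{format}} method are hypothetical names introduced for illustration, and the five-digit zero padding is assumed from the "yyyyy" pattern described above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Hadoop or Flink: it only illustrates
// the "basename-x-yyyyy" naming pattern described in this issue.
public final class PartFileName {
    private PartFileName() {}

    // basename:   value of mapreduce.output.basename ("part" by default)
    // mapOnly:    true for a map-only job ('m'), false for a job with reducers ('r')
    // taskNumber: zero-based mapper or reducer task number
    static String format(String basename, boolean mapOnly, int taskNumber) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s-%s-%05d",
                basename, mapOnly ? "m" : "r", taskNumber);
    }

    public static void main(String[] args) {
        System.out.println(format("part", true, 0));   // default basename, map-only job
        System.out.println(format("part", false, 12)); // default basename, reducer 12
        System.out.println(format("tmp", false, 12));  // basename overridden to "tmp"
    }
}
```

With the basename overridden to "tmp", the same task produces a name that no longer starts with "part".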

 

The keyword "part" is used in many places in users' business logic to match HDFS file names. So I suggest either reverting this config option or documenting the change.
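
As an illustration of the kind of user logic that could be affected, the following sketch filters a directory listing by the default "part" naming pattern. It is an assumption about what such business logic might look like, not code taken from Flink or from any user; {{PartFileFilter}} and {{selectPartFiles}} are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical downstream logic: it keeps only files that follow the
// default "part-x-yyyyy" naming pattern and ignores everything else.
public final class PartFileFilter {
    private PartFileFilter() {}

    // 'm' or 'r' followed by a five-digit task number, as described in this issue.
    private static final Pattern PART_FILE = Pattern.compile("part-[mr]-\\d{5}");

    static List<String> selectPartFiles(List<String> fileNames) {
        return fileNames.stream()
                .filter(name -> PART_FILE.matcher(name).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Output written with the default basename is picked up.
        System.out.println(selectPartFiles(
                Arrays.asList("part-r-00000", "part-r-00001", "_SUCCESS")));
        // Output written with the basename "tmp" is silently skipped.
        System.out.println(selectPartFiles(
                Arrays.asList("tmp-r-00000", "tmp-r-00001", "_SUCCESS")));
    }
}
```

A filter like this returns an empty list for output written with the basename "tmp", which is why an undocumented change to the basename can break existing jobs without raising any error.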

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)