[DISCUSS] Add Bucket File System Connector

zhangjun
Hi everyone,
In the current Flink system, using Flink SQL to read data and then write it to a file system in various formats is not supported, and the current File System Connector is only experimental [1]. So I have developed a new File System Connector.
Thanks to suggestions from Kurt and Fabian, I carefully studied the FLIP-63 design document, redesigned this feature, enriched the functionality of the existing File System Connector, and added partition support. Users can add this File System Connector through code or DDL, and then use Flink SQL to write data to the file system.
We can treat it as a sub-task of FLIP-63. I have written a design document and put it in Google Docs [2].
I hope everyone can give me more suggestions. Thank you very much.
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html#file-system-connector
[2] https://docs.google.com/document/d/1R5K_tKgy1MhqhQmolGD_hKnEAKglfeHRDa2f4tB-xew/edit?usp=sharing

Re: [DISCUSS] Add Bucket File System Connector

JingsongLee-2
Hi Jun,

Sorry for the late reply. I shared my thoughts on StreamingFileSink in the FLIP-63 discussion [1], and I don't recommend using StreamingFileSink to support partitioning in Table.
1. The bucket concept of StreamingFileSink and SQL's bucket concept are in serious conflict [2].
2. In Table, we need to support single-partition writing, grouped multi-partition writing, and non-grouped multi-partition writing.
3. We need a global role to commit files to the metastore.
4. We need an abstraction that supports both streaming and batch mode.
5. Table partitioning is simpler than StreamingFileSink: we only support partitions that reference fields, rather than being as flexible as StreamingFileSink's runtime bucketing.
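Points 2 and 3 above can be made concrete with a small sketch. The Python below is purely illustrative and is not Flink code. `GroupedWriter`, `NonGroupedWriter`, and `GlobalCommitter` are hypothetical names, in-memory lists stand in for files, and a dict stands in for the metastore. It assumes that "grouped" means the input arrives clustered by partition key, so one open file per writer is enough. Non-grouped input keeps one file open per partition it has seen. A single global role then registers partitions after all parallel writers finish.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Row = Tuple[object, ...]
PartitionKey = Tuple[str, ...]
FinishedFiles = List[Tuple[PartitionKey, List[Row]]]


class GroupedWriter:
    """Assumes input is grouped (clustered) by partition key, so at most
    one in-progress "file" is open at any time."""

    def __init__(self) -> None:
        self.finished: FinishedFiles = []
        self.current_key: Optional[PartitionKey] = None
        self.current_file: List[Row] = []
        self.max_open = 0

    def write(self, key: PartitionKey, row: Row) -> None:
        if key != self.current_key:
            # A new partition starts, so the previous one is complete.
            self._close_current()
            self.current_key = key
        self.current_file.append(row)
        self.max_open = max(self.max_open, 1)

    def _close_current(self) -> None:
        if self.current_key is not None:
            self.finished.append((self.current_key, self.current_file))
        self.current_key, self.current_file = None, []

    def close(self) -> FinishedFiles:
        self._close_current()
        return self.finished


class NonGroupedWriter:
    """Input may interleave partitions, so one in-progress "file" per
    partition must stay open until the writer is closed."""

    def __init__(self) -> None:
        self.open_files: Dict[PartitionKey, List[Row]] = {}
        self.max_open = 0

    def write(self, key: PartitionKey, row: Row) -> None:
        self.open_files.setdefault(key, []).append(row)
        self.max_open = max(self.max_open, len(self.open_files))

    def close(self) -> FinishedFiles:
        finished = list(self.open_files.items())
        self.open_files = {}
        return finished


class GlobalCommitter:
    """The single global role: gathers finished files from all parallel
    writers and updates the stand-in metastore once per partition."""

    def __init__(self) -> None:
        # partition key -> number of committed files
        self.metastore: Dict[PartitionKey, int] = {}

    def commit(self, per_task_files: Iterable[FinishedFiles]) -> None:
        file_counts: Dict[PartitionKey, int] = defaultdict(int)
        for files in per_task_files:
            for key, _rows in files:
                file_counts[key] += 1
        for key, count in file_counts.items():
            self.metastore[key] = self.metastore.get(key, 0) + count


if __name__ == "__main__":
    # Two parallel subtasks, each receiving part of the input.
    subtask_inputs = [
        [(("2019-09-22", "CN"), (1, "x", 1.0)),
         (("2019-09-22", "US"), (2, "y", 2.0)),
         (("2019-09-22", "CN"), (3, "z", 3.0))],
        [(("2019-09-22", "US"), (4, "w", 4.0))],
    ]
    writers = [NonGroupedWriter() for _ in subtask_inputs]
    for writer, rows in zip(writers, subtask_inputs):
        for key, row in rows:
            writer.write(key, row)
    # The same rows as subtask 0, pre-sorted by partition key.
    grouped = GroupedWriter()
    for key, row in sorted(subtask_inputs[0], key=lambda kv: kv[0]):
        grouped.write(key, row)
    print([w.max_open for w in writers], grouped.max_open)  # [2, 1] 1

    committer = GlobalCommitter()
    committer.commit([w.close() for w in writers])
    print(committer.metastore)
    # {('2019-09-22', 'CN'): 1, ('2019-09-22', 'US'): 2}
```

In this sketch, the non-grouped writer's open-file count grows with the number of distinct partitions a subtask sees. That memory cost is one reason a table sink may want to treat the grouped and non-grouped cases differently.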

The DDL could look like this:
CREATE TABLE USER_T (
  a INT,
  b STRING,
  c DOUBLE
) PARTITIONED BY (date STRING, country STRING)
WITH (
  'connector.type' = 'filesystem',
  'connector.path' = 'hdfs:///tmp/xxx',
  'format.type' = 'csv',
  'update-mode' = 'append',
  'partition-support' = 'true'
)
In the SQL world, we can only support row inputs.
The only difference from the previous FileSystem connector is that the 'partition-support' property is required. We can use this property to indicate that the new connector supports partitioning, without changing the previous connector.
All other properties can stay completely consistent. We can add Parquet, ORC, and other formats incrementally later.
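The following minimal Python sketch shows how the 'partition-support' flag could route a table to the new connector while leaving existing tables on the old one. It also shows how a row's partition values might map to a directory under 'connector.path'. The functions `select_connector` and `partition_dir` are hypothetical. The `key=value` directory layout is an assumption borrowed from Hive, not something this thread specifies.

```python
from typing import Dict, Sequence


def select_connector(props: Dict[str, str]) -> str:
    """Hypothetical dispatch: only tables that opt in with
    'partition-support' = 'true' use the new partitioned connector,
    so existing filesystem tables keep their current behavior."""
    if props.get("connector.type") != "filesystem":
        raise ValueError("not a filesystem table")
    if props.get("partition-support", "false").lower() == "true":
        return "partitioned-filesystem"
    return "legacy-filesystem"


def partition_dir(base: str, partition_keys: Sequence[str],
                  values: Sequence[str]) -> str:
    """Assumed Hive-style layout <base>/<k1>=<v1>/<k2>=<v2>.
    No escaping of special characters is done in this sketch."""
    if len(partition_keys) != len(values):
        raise ValueError("exactly one value is required per partition key")
    parts = [f"{k}={v}" for k, v in zip(partition_keys, values)]
    return "/".join([base.rstrip("/")] + parts)


if __name__ == "__main__":
    # Properties taken from the WITH clause of the USER_T example.
    props = {
        "connector.type": "filesystem",
        "connector.path": "hdfs:///tmp/xxx",
        "format.type": "csv",
        "update-mode": "append",
        "partition-support": "true",
    }
    print(select_connector(props))  # partitioned-filesystem
    print(partition_dir(props["connector.path"],
                        ["date", "country"], ["2019-09-22", "CN"]))
    # hdfs:///tmp/xxx/date=2019-09-22/country=CN
```

Because the flag defaults to false in this sketch, any table declared without it resolves to the existing connector unchanged.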

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-63-Rework-table-partition-support-td32770.html
[2] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL+BucketedTables


------------------------------------------------------------------
From: Jun Zhang <[hidden email]>
Send Time: Sunday, September 22, 2019 23:18
To: dev <[hidden email]>
Cc: Kurt Young <[hidden email]>; [hidden email] <[hidden email]>
Subject: [DISCUSS] Add Bucket File System Connector