[jira] [Created] (FLINK-13027) StreamingFileSink bulk-encoded writer supports file rolling upon customized events


Shang Yuanchun (Jira)
Ying Xu created FLINK-13027:
-------------------------------

             Summary: StreamingFileSink bulk-encoded writer supports file rolling upon customized events
                 Key: FLINK-13027
                 URL: https://issues.apache.org/jira/browse/FLINK-13027
             Project: Flink
          Issue Type: New Feature
          Components: API / DataStream
            Reporter: Ying Xu


When writing in a bulk-encoded format such as Parquet, StreamingFileSink supports only OnCheckpointRollingPolicy, which rolls files at checkpoint time.

In many scenarios, it is beneficial for the sink to roll files upon certain events, for example, when the file size reaches a limit. Such a rolling policy could also alleviate some side effects of OnCheckpointRollingPolicy, e.g., most of the heavy lifting, including file uploading, happens at checkpoint time.

Specifically, this Jira calls for a new rolling policy that rolls the file:
 # whenever a customized event happens, e.g., the file size reaches a certain limit.
 # whenever a checkpoint happens. This is needed to provide exactly-once guarantees when writing bulk-encoded files.
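
The two conditions above could be sketched roughly as follows. This is only an illustration, not the proposed implementation: the nested PartFileInfo and RollingPolicy interfaces are simplified stand-ins loosely modeled on Flink's interfaces of the same names (the real ones carry more methods and type parameters), and SizeOrCheckpointRollingPolicy and maxPartSizeBytes are hypothetical names.

```java
// Self-contained sketch. PartFileInfo and RollingPolicy below are simplified
// stand-ins (assumptions), not the real Flink classes.
public class SizeOrCheckpointRollingSketch {

    /** Stand-in for the part-file metadata a policy can inspect. */
    interface PartFileInfo {
        long getSize();
    }

    /** Stand-in policy interface with the two hooks this issue is about. */
    interface RollingPolicy<IN> {
        boolean shouldRollOnCheckpoint(PartFileInfo partFile);

        boolean shouldRollOnEvent(PartFileInfo partFile, IN element);
    }

    /** Hypothetical policy: roll on every checkpoint and when a size limit is reached. */
    static final class SizeOrCheckpointRollingPolicy<IN> implements RollingPolicy<IN> {
        private final long maxPartSizeBytes;

        SizeOrCheckpointRollingPolicy(long maxPartSizeBytes) {
            this.maxPartSizeBytes = maxPartSizeBytes;
        }

        @Override
        public boolean shouldRollOnCheckpoint(PartFileInfo partFile) {
            // Condition 2: always roll at a checkpoint, as required for
            // exactly-once guarantees with bulk-encoded files.
            return true;
        }

        @Override
        public boolean shouldRollOnEvent(PartFileInfo partFile, IN element) {
            // Condition 1: the customized event, here a size threshold.
            return partFile.getSize() >= maxPartSizeBytes;
        }
    }

    public static void main(String[] args) {
        RollingPolicy<String> policy = new SizeOrCheckpointRollingPolicy<>(128L * 1024 * 1024);
        PartFileInfo smallFile = () -> 1024L;
        System.out.println(policy.shouldRollOnCheckpoint(smallFile));      // true
        System.out.println(policy.shouldRollOnEvent(smallFile, "record")); // false
    }
}
```

The size check is just one possible customized event; any predicate over the part file or the incoming element would fit the same hook.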

Users of this rolling policy need to be aware that a customized event and the next checkpoint may occur close to each other, which in the worst case yields a tiny file per checkpoint.
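
To make the tiny-file caveat concrete, here is a toy simulation under stated assumptions: records have a fixed size, a checkpoint fires after every fixed number of records, and the size check runs after each write. None of this mirrors the actual sink internals; TinyFileSimulation and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (assumption): fixed-size records, a checkpoint every N records,
// and a size check applied after each write. Not the real sink logic.
public class TinyFileSimulation {

    /** Returns the sizes of the part files produced, in the order they were rolled. */
    static List<Long> simulate(int records, long recordBytes, long maxBytes, int checkpointEvery) {
        List<Long> files = new ArrayList<>();
        long current = 0;
        for (int i = 1; i <= records; i++) {
            current += recordBytes;
            if (current >= maxBytes) {
                // Customized event: the size limit was reached.
                files.add(current);
                current = 0;
            }
            if (i % checkpointEvery == 0 && current > 0) {
                // Checkpoint: roll whatever has been written so far.
                files.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            files.add(current); // trailing in-progress data
        }
        return files;
    }

    public static void main(String[] args) {
        // The size limit is hit at record 10 and the checkpoint fires at record 11,
        // so each checkpoint rolls a file holding a single record.
        System.out.println(simulate(22, 10L, 100L, 11)); // prints [100, 10, 100, 10]
    }
}
```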

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)