Jun Qin created FLINK-17288:
-------------------------------
Summary: Speedup loading from savepoints into RocksDB by bulk load
Key: FLINK-17288
URL: https://issues.apache.org/jira/browse/FLINK-17288
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Reporter: Jun Qin
When resources are constrained, loading a large savepoint into RocksDB may take significant time. This may also impact the job recovery time when a savepoint is used for recovery.
Bulk loading from the savepoint should help in this regard. Here is an excerpt from the RocksDB FAQ:
{quote}*Q: What's the fastest way to load data into RocksDB?*
A: A fast way to direct insert data to the DB:
# using single writer thread and insert in sorted order
# batch hundreds of keys into one write batch
# use vector memtable
# make sure options.max_background_flushes is at least 4
# before inserting the data, disable automatic compaction, set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large. After inserting all the data, issue a manual compaction.
Steps 3-5 will be done automatically if you call Options::PrepareForBulkLoad() on your options.
If you can pre-process the data offline before inserting, there is a faster way: sort the data, generate SST files with non-overlapping key ranges in parallel, and bulk load the SST files. See [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
{quote}
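For illustration only, here is a minimal sketch of how points 1-5 above could translate into RocksJava (the Java binding used by Flink's RocksDB state backend). The class and method names (BulkLoadSketch, bulkLoad) are hypothetical; only the org.rocksdb calls are from the library, and the batch size is an arbitrary example value:
{code:java}
import java.util.Map;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.WriteBatch;
import org.rocksdb.WriteOptions;

public class BulkLoadSketch {

    /**
     * Loads entries that are already sorted by key, from a single writer thread.
     * prepareForBulkLoad() covers FAQ points 3-5: vector memtable, enough
     * background flushes, and automatic compaction effectively disabled.
     */
    public static void bulkLoad(String dbPath, Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        RocksDB.loadLibrary();

        try (Options options = new Options()
                .setCreateIfMissing(true)
                .prepareForBulkLoad();
             RocksDB db = RocksDB.open(options, dbPath);
             WriteOptions writeOptions = new WriteOptions()) {

            // FAQ points 1-2: single writer, sorted keys, a few hundred keys per write batch.
            WriteBatch batch = new WriteBatch();
            int inBatch = 0;
            for (Map.Entry<byte[], byte[]> e : sortedEntries) {
                batch.put(e.getKey(), e.getValue());
                if (++inBatch == 500) {
                    db.write(writeOptions, batch);
                    batch.close();
                    batch = new WriteBatch();
                    inBatch = 0;
                }
            }
            if (inBatch > 0) {
                db.write(writeOptions, batch);
            }
            batch.close();

            // After all data is inserted, issue a manual compaction as the FAQ suggests.
            db.compactRange();
        }
    }
}
{code}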
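The offline SST-file approach from the last paragraph of the quote could look roughly like the following, again as a hedged sketch with hypothetical wrapper names; SstFileWriter and ingestExternalFile are the relevant RocksJava APIs:
{code:java}
import java.util.List;
import java.util.Map;
import org.rocksdb.EnvOptions;
import org.rocksdb.IngestExternalFileOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.SstFileWriter;

public class SstIngestSketch {

    /** Writes one SST file from entries that are already sorted by key. */
    public static void writeSstFile(String sstPath, Iterable<Map.Entry<byte[], byte[]>> sortedEntries)
            throws RocksDBException {
        try (Options options = new Options();
             EnvOptions envOptions = new EnvOptions();
             SstFileWriter writer = new SstFileWriter(envOptions, options)) {
            writer.open(sstPath);
            for (Map.Entry<byte[], byte[]> e : sortedEntries) {
                writer.put(e.getKey(), e.getValue());
            }
            writer.finish();
        }
    }

    /** Ingests SST files with non-overlapping key ranges into an open RocksDB instance. */
    public static void ingest(RocksDB db, List<String> sstPaths) throws RocksDBException {
        try (IngestExternalFileOptions ingestOptions =
                new IngestExternalFileOptions().setMoveFiles(true)) {
            db.ingestExternalFile(sstPaths, ingestOptions);
        }
    }
}
{code}
The SST files could be generated in parallel (e.g. one per key-group range) since the key ranges do not overlap, and then ingested in a single call.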