(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-17288) Speedup loading from savepoints into RocksDB by bulk load

Shang Yuanchun (Jira)

Jun Qin created FLINK-17288:
-------------------------------

Summary: Speedup loading from savepoints into RocksDB by bulk load
Key: FLINK-17288
URL: https://issues.apache.org/jira/browse/FLINK-17288
Project: Flink
Issue Type: Improvement
Components: Runtime / State Backends
Reporter: Jun Qin

When resources are constrained, loading a large savepoint into RocksDB can take a long time. This also lengthens job recovery time when the savepoint is used for recovery.

Bulk loading from the savepoint should help in this regard. Here is an excerpt from the RocksDB FAQ:
{quote}*Q: What's the fastest way to load data into RocksDB?*

A: A fast way to direct insert data to the DB:
# using single writer thread and insert in sorted order
# batch hundreds of keys into one write batch
# use vector memtable
# make sure options.max_background_flushes is at least 4
# before inserting the data, disable automatic compaction, set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large. After inserting all the data, issue a manual compaction.

3-5 will be automatically done if you call Options::PrepareForBulkLoad() to your option

If you can pre-process the data offline before inserting. There is a faster way: you can sort the data, generate SST files with non-overlapping ranges in parallel and bulkload the SST files. See [https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]
{quote}
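
The pre-processing idea in the last paragraph of the FAQ excerpt (sort the data, then cut it into non-overlapping key ranges) can be sketched in plain Java. This is only an illustration under stated assumptions, not a proposed Flink implementation: the class name SortedRangePartitioner and the method partition are hypothetical, the whole input is assumed to fit in memory, and keys are ordered by unsigned bytewise comparison, which is the order RocksDB's default comparator uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration; not part of Flink or RocksDB. */
public class SortedRangePartitioner {

    /**
     * Sorts the entries by unsigned bytewise key order and cuts them into
     * at most {@code parts} consecutive chunks. Because the chunks are
     * consecutive slices of one sorted sequence, their key ranges cannot
     * overlap. Duplicate keys keep the value seen last. Null keys and
     * null values are not supported.
     */
    public static List<List<Map.Entry<byte[], byte[]>>> partition(
            List<Map.Entry<byte[], byte[]>> entries, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }

        // Unsigned lexicographic order over byte arrays.
        Comparator<byte[]> bytewise = Arrays::compareUnsigned;
        TreeMap<byte[], byte[]> sorted = new TreeMap<>(bytewise);
        for (Map.Entry<byte[], byte[]> e : entries) {
            sorted.put(e.getKey(), e.getValue());
        }

        List<List<Map.Entry<byte[], byte[]>>> chunks = new ArrayList<>();
        if (sorted.isEmpty()) {
            return chunks;
        }

        // Ceiling division, so no more than `parts` chunks are produced.
        int chunkSize = (sorted.size() + parts - 1) / parts;
        List<Map.Entry<byte[], byte[]>> current = new ArrayList<>();
        for (Map.Entry<byte[], byte[]> e : sorted.entrySet()) {
            current.add(Map.entry(e.getKey(), e.getValue()));
            if (current.size() == chunkSize) {
                chunks.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

Each chunk could then be handed to its own worker that writes one SST file (RocksDB's Java API offers SstFileWriter for this) and the resulting files loaded with ingestExternalFile. That part is left out here because it depends on how the savepoint data is read.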
