[jira] [Created] (FLINK-20496) RocksDB partitioned index filter option

Shang Yuanchun (Jira)
YufeiLiu created FLINK-20496:
--------------------------------

             Summary: RocksDB partitioned index filter option
                 Key: FLINK-20496
                 URL: https://issues.apache.org/jira/browse/FLINK-20496
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / State Backends
            Reporter: YufeiLiu


When using RocksDBStateBackend with {{state.backend.rocksdb.memory.managed}} and {{state.backend.rocksdb.memory.fixed-per-slot}} enabled, Flink strictly limits RocksDB memory usage, which covers the "write buffer" and the "block cache". With these options RocksDB stores index and filter blocks in the block cache, because with the default options the memory used by index/filters can grow without bound.
But this leads to another issue: if the high-priority cache (configured by {{state.backend.rocksdb.memory.high-prio-pool-ratio}}) cannot fit all index/filter blocks, the metadata is loaded from disk on every cache miss, and the program becomes extremely slow. According to [Partitioned Index Filters|https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters][1], enabling the two-level index gives acceptable performance when index/filter blocks miss the cache.
Enabling these options made my case over 10x faster[2]. I think we can add an option {{state.backend.rocksdb.partitioned-index-filters}}, with a default value of false, so that this feature is easy to use.
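
For illustration, a configuration using the proposed option might look like the sketch below. The key {{state.backend.rocksdb.partitioned-index-filters}} is the one proposed in this issue and does not exist yet; the 256m budget is taken from [2], and the remaining values are assumptions chosen for the example, not recommendations.

```yaml
# flink-conf.yaml (sketch under the assumptions stated above)
state.backend: rocksdb

# Let Flink bound RocksDB memory (write buffer + block cache).
state.backend.rocksdb.memory.managed: true
# Fixed RocksDB memory budget per slot, as in the measurement in [2].
state.backend.rocksdb.memory.fixed-per-slot: 256m
# Share of the block cache reserved for index/filter blocks
# (0.1 is an illustrative value, not a tuned one).
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.1

# PROPOSED in FLINK-20496, hypothetical until implemented:
# use a two-level (partitioned) index and partitioned filters.
state.backend.rocksdb.partitioned-index-filters: true
```

With the proposed default of false, existing jobs would keep their current behavior unless the option is set explicitly.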


[1] Partitioned Index Filters: https://github.com/facebook/rocksdb/wiki/Partitioned-Index-Filters
[2] Deduplication scenario, state.backend.rocksdb.memory.fixed-per-slot=256M, SSD; elapsed time dropped from 4.91 ms to 0.33 ms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)