Hi,
Activating bloom filter in the RocksDB state backend improves read performance. Currently activating bloom filter can only be done by implementing a custom ConfigurableRocksDBOptionsFactory. I think we should provide an option to activate bloom filter via Flink configuration. What do you think? If so, what about the following configuration?

state.backend.rocksdb.bloom-filter.enabled: false (default)
state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
state.backend.rocksdb.bloom-filter.block-based: true (default)

Thanks
Jun

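As a rough illustration of how the three proposed keys and their defaults might be read, here is a minimal Java sketch. The class `BloomFilterSettings` and its `fromConfig` method are hypothetical and not part of Flink, a plain `Map<String, String>` stands in for the Flink configuration, and parsing bits-per-key as a `double` is an assumption. Only the key names and default values come from the proposal above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for the three proposed options; not a Flink class. */
final class BloomFilterSettings {
    // Key names as proposed in this thread.
    static final String ENABLED = "state.backend.rocksdb.bloom-filter.enabled";
    static final String BITS_PER_KEY = "state.backend.rocksdb.bloom-filter.bits-per-key";
    static final String BLOCK_BASED = "state.backend.rocksdb.bloom-filter.block-based";

    final boolean enabled;
    final double bitsPerKey;
    final boolean blockBased;

    private BloomFilterSettings(boolean enabled, double bitsPerKey, boolean blockBased) {
        this.enabled = enabled;
        this.bitsPerKey = bitsPerKey;
        this.blockBased = blockBased;
    }

    /** Reads the three keys, falling back to the proposed defaults when a key is absent. */
    static BloomFilterSettings fromConfig(Map<String, String> conf) {
        boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED, "false"));
        double bitsPerKey = Double.parseDouble(conf.getOrDefault(BITS_PER_KEY, "10"));
        boolean blockBased = Boolean.parseBoolean(conf.getOrDefault(BLOCK_BASED, "true"));
        if (bitsPerKey <= 0) {
            throw new IllegalArgumentException(BITS_PER_KEY + " must be positive");
        }
        return new BloomFilterSettings(enabled, bitsPerKey, blockBased);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ENABLED, "true"); // only override the switch, keep the other defaults
        BloomFilterSettings s = fromConfig(conf);
        System.out.println(s.enabled + " " + s.bitsPerKey + " " + s.blockBased);
    }
}
```

With this shape, a job that sets nothing keeps the current behaviour (filter disabled), and the other two keys only matter once the filter is switched on.
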
Hi Jun,
Making things easier to use and configure is a good idea. Hence, +1 for this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a RocksDB expert.

Cheers,
Till

On Sun, Feb 7, 2021 at 7:23 PM Jun Qin <[hidden email]> wrote:

> Hi,
>
> Activating bloom filter in the RocksDB state backend improves read
> performance. Currently activating bloom filter can only be done by
> implementing a custom ConfigurableRocksDBOptionsFactory. I think we should
> provide an option to activate bloom filter via Flink configuration. What
> do you think? If so, what about the following configuration?
>
> state.backend.rocksdb.bloom-filter.enabled: false (default)
> state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
> state.backend.rocksdb.bloom-filter.block-based: true (default)
>
> Thanks
> Jun

Hi Jun,
Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.

As for the bloom filter default values, I slightly prefer the full format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.

[1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
[2] https://issues.apache.org/jira/browse/FLINK-20496

Best
Yun Tang

________________________________
From: Till Rohrmann <[hidden email]>
Sent: Monday, February 8, 2021 17:06
To: dev <[hidden email]>
Subject: Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Hi Jun,

Making things easier to use and configure is a good idea. Hence, +1 for this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a RocksDB expert.

Cheers,
Till

Thanks Till and Yun Tang.
I’ve created https://issues.apache.org/jira/browse/FLINK-21336 and I will work on it.

Thanks
Jun

> On Feb 9, 2021, at 7:52 AM, Yun Tang <[hidden email]> wrote:
>
> Hi Jun,
>
> Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.
>
> As for the bloom filter default values, I slightly prefer the full format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.
>
> [1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
> [2] https://issues.apache.org/jira/browse/FLINK-20496
>
> Best
> Yun Tang

Hi Jun Qin,
Do you have an example OptionsFactory for the bloom filter? I ran an experiment and changed the predefined options from FLASH_SSD_OPTIMIZED to SPINNING_DISK_OPTIMIZED_HIGH_MEM. This gave me 2x better performance on an NVMe disk. I think the reason is that I'm doing a lot of reads, and SPINNING_DISK_OPTIMIZED_HIGH_MEM is the only one with the bloom filter enabled by default.

Regards,
Maciek

--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
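
To illustrate the reasoning above about read-heavy workloads, here is a toy bloom filter in plain Java. It is only a sketch of the general technique: `ToyBloomFilter`, its hash function, and the key names are made up for illustration, it is not RocksDB's implementation, and it is not an OptionsFactory. The point it shows is that a lookup for a key that was never written can usually be answered from the filter alone, without reading the underlying data.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy bloom filter for illustration; RocksDB's real filter is implemented differently. */
final class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int expectedKeys, int bitsPerKey) {
        this.numBits = Math.max(64, expectedKeys * bitsPerKey);
        this.bits = new BitSet(numBits);
        // Textbook choice: about bitsPerKey * ln(2) probes per key.
        this.numHashes = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    /** Records a key by setting numHashes bit positions derived from its hash. */
    void add(String key) {
        long h = hash64(key);
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h, i));
        }
    }

    /** Returns false only if the key was definitely never added. */
    boolean mightContain(String key) {
        long h = hash64(key);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: probe i uses (h1 + i * h2) mod numBits.
    private int index(long h, int i) {
        int h1 = (int) h;
        int h2 = (int) (h >>> 32);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // FNV-1a over the UTF-8 bytes, followed by a 64-bit mixing step.
    private static long hash64(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1000, 10);
        for (int i = 0; i < 1000; i++) {
            filter.add("key-" + i);
        }
        int skipped = 0;
        for (int i = 0; i < 1000; i++) {
            if (!filter.mightContain("absent-" + i)) {
                skipped++; // the filter alone proves the key is absent
            }
        }
        System.out.println("absent lookups answered by the filter: " + skipped + " of 1000");
    }
}
```

A filter like this never reports a stored key as missing. It only occasionally reports a missing key as possibly present, in which case the read falls through to the data as it would without a filter.
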