Activate bloom filter in RocksDB State Backend via Flink configuration

Jun Qin
Hi,

Activating the bloom filter in the RocksDB state backend improves read performance. Currently, the bloom filter can only be activated by implementing a custom ConfigurableRocksDBOptionsFactory. I think we should provide an option to activate it via Flink configuration. What do you think? If you agree, what about the following configuration?

state.backend.rocksdb.bloom-filter.enabled: false (default)
state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
state.backend.rocksdb.bloom-filter.block-based: true (default)
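
To give a feel for what a bits-per-key value of 10 buys, here is a minimal sketch that evaluates the textbook Bloom filter estimate p = (1 - e^(-k/b))^k, where b is the bits per key and k the number of hash probes. The class and method names are made up for illustration, and the formula is the idealized model, not RocksDB's actual filter implementation, which may behave differently. In this model, 10 bits per key gives an estimated false-positive rate of roughly 0.8%.

```java
public class BloomFilterSizing {

    /** Probe count that minimizes the estimate: k = round(ln 2 * bitsPerKey), at least 1. */
    public static int optimalProbes(double bitsPerKey) {
        return Math.max(1, (int) Math.round(Math.log(2) * bitsPerKey));
    }

    /** Idealized false-positive estimate: p = (1 - e^(-k / bitsPerKey))^k. */
    public static double falsePositiveRate(double bitsPerKey, int probes) {
        return Math.pow(1.0 - Math.exp(-probes / bitsPerKey), probes);
    }

    public static void main(String[] args) {
        // Compare a few candidate values for the proposed bits-per-key option.
        for (double bits : new double[] {5, 10, 15, 20}) {
            int k = optimalProbes(bits);
            System.out.printf(
                    "bits-per-key=%.0f probes=%d estimated FPR=%.4f%n",
                    bits, k, falsePositiveRate(bits, k));
        }
    }
}
```

The trade-off is memory: every additional bit per key lowers the estimated false-positive rate but makes the filter larger.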


Thanks
Jun

Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Till Rohrmann
Hi Jun,

Making things easier to use and configure is a good idea. Hence, +1 for
this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a
RocksDB expert.

Cheers,
Till

On Sun, Feb 7, 2021 at 7:23 PM Jun Qin <[hidden email]> wrote:

> Hi,
>
> Activating bloom filter in the RocksDB state backend improves read
> performance. Currently activating bloom filter can only be done by
> implementing a custom ConfigurableRocksDBOptionsFactory. I think we should
> provide an option to activate bloom filter via Flink configuration.  What
> do you think? If so, what about the following configuration?
>
> state.backend.rocksdb.bloom-filter.enabled: false (default)
> state.backend.rocksdb.bloom-filter.bits-per-key: 10 (default)
> state.backend.rocksdb.bloom-filter.block-based: true (default)
>
>
> Thanks
> Jun

Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Yun Tang
Hi Jun,

Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.

As for the bloom filter default values, I slightly prefer the full filter format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.

[1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
[2] https://issues.apache.org/jira/browse/FLINK-20496

Best
Yun Tang
________________________________
From: Till Rohrmann <[hidden email]>
Sent: Monday, February 8, 2021 17:06
To: dev <[hidden email]>
Subject: Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Hi Jun,

Making things easier to use and configure is a good idea. Hence, +1 for
this proposal. Maybe create a JIRA ticket for it.

For the concrete default values it would be nice to hear the opinion of a
RocksDB expert.

Cheers,
Till

Re: Activate bloom filter in RocksDB State Backend via Flink configuration

Jun Qin
Thanks Till and Yun Tang.

I’ve created https://issues.apache.org/jira/browse/FLINK-21336 and I will work on it.

Thanks
Jun

> On Feb 9, 2021, at 7:52 AM, Yun Tang <[hidden email]> wrote:
>
> Hi Jun,
>
> Some predefined options also activate bloom filters, e.g. PredefinedOptions#SPINNING_DISK_OPTIMIZED_HIGH_MEM, but I think offering a configurable option is a good idea. +1 for this.
>
> As for the bloom filter default values, I slightly prefer the full filter format [1] over the old block-based format. This is related to FLINK-20496 [2], which tries to add an option to enable partitioned index & filter.
>
> [1] https://github.com/facebook/rocksdb/wiki/RocksDB-Bloom-Filter#full-filters-new-format
> [2] https://issues.apache.org/jira/browse/FLINK-20496
>
> Best
> Yun Tang


Re: Activate bloom filter in RocksDB State Backend via Flink configuration

maver1ck
In reply to this post by Jun Qin
Hi Jun Qin,
Do you have an example OptionsFactory that enables the bloom filter?

I ran an experiment and changed the predefined options from
FLASH_SSD_OPTIMIZED to SPINNING_DISK_OPTIMIZED_HIGH_MEM.
This gave me 2x better performance on an NVMe disk.
I think the reason is that I'm doing a lot of reads, and
SPINNING_DISK_OPTIMIZED_HIGH_MEM is the only predefined option with the
bloom filter enabled by default.
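
The reasoning above, that a read-heavy job benefits because the filter answers "definitely absent" without touching storage, can be illustrated with a toy Bloom filter. This is only a sketch of the general technique: the class ToyBloomFilter, its hashing scheme, and the key names are invented for this example and are not how RocksDB or Flink implement their filters.

```java
import java.util.BitSet;

public class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int probes;

    public ToyBloomFilter(int expectedKeys, int bitsPerKey, int probes) {
        this.numBits = Math.max(64, expectedKeys * bitsPerKey);
        this.bits = new BitSet(numBits);
        this.probes = probes;
    }

    /** Scrambles an int hash (MurmurHash3 finalizer) so similar keys spread out. */
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Double hashing: probe i lands at (h1 + i * h2) mod numBits. */
    private int position(String key, int i) {
        int h1 = mix(key.hashCode());
        int h2 = mix(h1 ^ 0x9e3779b9) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void add(String key) {
        for (int i = 0; i < probes; i++) {
            bits.set(position(key, i));
        }
    }

    /** False means "definitely absent"; true means "possibly present". */
    public boolean mightContain(String key) {
        for (int i = 0; i < probes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    /** Counts lookups of never-inserted keys that would still fall through to storage. */
    public static int countFalsePositives(ToyBloomFilter filter, int lookups) {
        int hits = 0;
        for (int i = 0; i < lookups; i++) {
            if (filter.mightContain("absent-" + i)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyBloomFilter filter = new ToyBloomFilter(1000, 10, 7);
        for (int i = 0; i < 1000; i++) {
            filter.add("key-" + i);
        }
        int lookups = 10_000;
        int wasted = countFalsePositives(filter, lookups);
        System.out.println(
                wasted + " of " + lookups + " lookups for absent keys would still read from storage");
    }
}
```

A filter never reports an inserted key as absent, so it only saves work on lookups for keys that are not in a given file; lookups for keys that are present still have to read the data.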

Regards,
Maciek
