[DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Stephan Ewen
Hi all!

I would suggest a change of the current default for timers. A bit of
background:

  - Timers (for windows, process functions, etc.) are state that is managed
and checkpointed as well.
  - When using the MemoryStateBackend and the FsStateBackend, timers are
kept on the JVM heap, like regular state.
  - When using the RocksDBStateBackend, timers can be kept in RocksDB (like
other state) or on the JVM heap. The JVM heap is the default though!

I find this a bit unintuitive and would propose changing this so that the
RocksDBStateBackend stores all state in RocksDB by default.
The rationale is that if there is a tradeoff (like here), safe and scalable
should be the default, and unsafe performance should be an explicit choice.

This sentiment seems to be shared by various users as well, see
https://twitter.com/StephanEwen/status/1214590846168903680 and
https://twitter.com/StephanEwen/status/1214594273565388801
We would of course keep the switch and mention in the performance tuning
section that this is an option.
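
For reference, a minimal sketch of how that switch could look in flink-conf.yaml. It assumes the option name `state.backend.rocksdb.timer-service.factory` with the values `HEAP` and `ROCKSDB`, as listed in the Flink configuration docs from around the time of this thread; please verify both against the docs for your Flink version.

```yaml
# Assumed option name and values; check the configuration docs of your Flink version.
state.backend: rocksdb

# ROCKSDB: timers are kept in RocksDB together with the rest of the state.
# HEAP:    timers are kept on the JVM heap (faster, but not memory safe).
state.backend.rocksdb.timer-service.factory: ROCKSDB
```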

# RocksDB State Backend Timers on Heap
  - Pro: faster
  - Con: not memory safe, GC overhead, longer synchronous checkpoint time,
no incremental checkpoints

# RocksDB State Backend Timers in RocksDB
  - Pro: safe and scalable, asynchronously and incrementally checkpointed
  - Con: performance overhead.
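
To make the trade-off above a bit more concrete, here is a toy model in plain Java. It is only a sketch and not Flink's implementation: a `PriorityQueue` stands in for heap timers, and a `TreeMap` over byte arrays stands in for a sorted key-value store such as RocksDB. The class and method names (`TimerStoreSketch`, `drainHeap`, `drainSortedStore`, `encode`, `decode`) are hypothetical, and the keys contain only the timestamp. It needs Java 9 or later for `Arrays.compareUnsigned`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeMap;

public class TimerStoreSketch {

    /** Heap variant: timers are plain JVM objects ordered by timestamp. */
    static List<Long> drainHeap(long[] timestamps) {
        PriorityQueue<Long> queue = new PriorityQueue<>();
        for (long ts : timestamps) {
            queue.add(ts);
        }
        List<Long> fired = new ArrayList<>();
        while (!queue.isEmpty()) {
            fired.add(queue.poll());
        }
        return fired;
    }

    /** Encodes a timestamp so that unsigned byte order equals numeric order. */
    static byte[] encode(long ts) {
        // Flipping the sign bit maps signed order onto unsigned big-endian order.
        return ByteBuffer.allocate(Long.BYTES).putLong(ts ^ Long.MIN_VALUE).array();
    }

    /** Reverses encode(). */
    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong() ^ Long.MIN_VALUE;
    }

    /** Sorted-store variant: every write and read pays for (de)serialization. */
    static List<Long> drainSortedStore(long[] timestamps) {
        Comparator<byte[]> byUnsignedBytes = Arrays::compareUnsigned;
        TreeMap<byte[], Boolean> store = new TreeMap<>(byUnsignedBytes);
        for (long ts : timestamps) {
            store.put(encode(ts), Boolean.TRUE);
        }
        List<Long> fired = new ArrayList<>();
        while (!store.isEmpty()) {
            fired.add(decode(store.pollFirstEntry().getKey()));
        }
        return fired;
    }

    public static void main(String[] args) {
        long[] timestamps = {30L, -5L, 10L, 20L};
        System.out.println(drainHeap(timestamps));
        System.out.println(drainSortedStore(timestamps));
    }
}
```

Both variants fire the timers in the same order. The difference is where the data lives and what each access costs: object references on the JVM heap in the first case, serialized keys in an ordered store in the second.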

Please chime in and let me know what you think.

Best,
Stephan

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Andrey Zagrebin-4
Hi Stephan,

Thanks for starting this discussion. I am +1 for this change.
In general, the number of timer state keys can be of the same order as the
number of main state keys.
So if RocksDB is used for the main state for scalability, it makes sense to
have timers there as well,
unless timers are used for only a very limited subset of keys that fits into
memory.
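
A back-of-the-envelope sketch of this argument in Java. The per-timer size and the heap budget below are hypothetical placeholders, not measurements; the point is only that with roughly one timer per key, the heap footprint of timers grows linearly with the key space. The names `TimerHeapEstimate`, `estimateHeapBytes` and `exceedsBudget` are made up for this example.

```java
public class TimerHeapEstimate {

    /** Rough heap footprint of heap-based timers: one entry per timer. */
    static long estimateHeapBytes(long timerCount, long bytesPerTimer) {
        return Math.multiplyExact(timerCount, bytesPerTimer);
    }

    /** True if the estimated timer footprint alone exceeds the given heap budget. */
    static boolean exceedsBudget(long timerCount, long bytesPerTimer, long heapBudgetBytes) {
        return estimateHeapBytes(timerCount, bytesPerTimer) > heapBudgetBytes;
    }

    public static void main(String[] args) {
        long bytesPerTimer = 100L;                  // hypothetical placeholder, not a measurement
        long heapBudget = 4L * 1024 * 1024 * 1024;  // hypothetical 4 GiB heap budget

        // One timer per key: the footprint tracks the size of the key space.
        for (long keys : new long[] {1_000_000L, 100_000_000L, 1_000_000_000L}) {
            long bytes = estimateHeapBytes(keys, bytesPerTimer);
            System.out.printf("%d keys -> ~%d MiB of timers, exceeds budget: %b%n",
                    keys, bytes / (1024 * 1024), exceedsBudget(keys, bytesPerTimer, heapBudget));
        }
    }
}
```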

Best,
Andrey

In reply to Stephan Ewen <[hidden email]>, Thu, Jan 16, 2020 at 4:27 PM.

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Yun Tang
Hi Stephan,

I am +1 for the change which stores timers in RocksDB by default.

Some users want checkpoints to complete as fast as possible, which also requires timers to be stored in RocksDB so that they do not affect the synchronous part of the checkpoint.

Best
Yun Tang

In reply to Andrey Zagrebin <[hidden email]>, Friday, January 17, 2020 0:07 (Cc: dev, user).

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Jingsong Li
Hi Stephan,

Thanks for starting this discussion.
+1 for storing timers in RocksDB by default.
In the past, when Flink didn't store timers in RocksDB, it gave me
headaches: I always had to adjust parameters carefully to make sure there was
no risk of running out of memory.

Just curious: how large is the performance difference between heap and
RocksDB timers?
- If there is no order-of-magnitude difference between heap and RocksDB,
there is no problem in using RocksDB.
- If there is, maybe we should improve our documentation to let users know
about this option. (It looks like a lot of users don't know about it.)

Best,
Jingsong Lee

In reply to Yun Tang <[hidden email]>, Fri, Jan 17, 2020 at 3:18 AM.

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Biao Liu
+1

I think that's how it should be. Timers should align with other regular
state.

If a user wants better performance and has no memory concerns, the memory or
FS state backend might be considered. Or maybe we could optimize
performance by introducing a dedicated column family for timers, which could
have its own tuned options.

Thanks,
Biao /'bɪ.aʊ/



In reply to Jingsong Li <[hidden email]>, Fri, 17 Jan 2020 at 10:11.

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Congxian Qiu
+1 to storing timers in RocksDB by default.

Storing timers on the heap can run into OOM problems and make checkpoints
much slower; storing timers in RocksDB gets rid of both.

Best,
Congxian


In reply to Biao Liu <[hidden email]>, Fri, Jan 17, 2020 at 3:10 PM.

Re: [DISCUSS] Change default for RocksDB timers: Java Heap => in RocksDB

Piotr Nowojski-3
+1 for making it consistent. When using state backend X, timers should be stored in X by default.

Also, I think any configuration option controlling this needs to be well documented in a performance tuning section of the docs.

Piotrek

In reply to Congxian Qiu <[hidden email]>, 17 Jan 2020, at 09:16.