[PROPOSAL] Improving Flink’s timer management for large state

Stefan Richter
Hi,

I am currently planning how to improve Flink’s timer management for large state. In particular, I would like to introduce timer state that is managed in RocksDB and also to improve the capabilities of the heap-based timer service, e.g. support for asynchronous checkpoints. You can find a short outline of my planned approach in this document:

https://docs.google.com/document/d/1XbhJRbig5c5Ftd77d0mKND1bePyTC26Pz04EvxdA7Jc/edit?usp=sharing

As always, your questions, feedback, and comments are highly appreciated.

Best,
Stefan
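To make "support for asynchronous checkpoints" in a heap-based timer service more concrete, here is a minimal, hypothetical Java sketch. It is not Flink's code and not the design in the linked document. It assumes a timer is simply a (key, timestamp) pair, keeps timers in a min-heap ordered by timestamp, deduplicates registrations with a set, and splits a snapshot into a cheap synchronous in-memory copy that a background thread could then serialize. All names (`HeapTimerQueueSketch`, `Timer`, `register`, `advanceWatermark`, `snapshot`) are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.PriorityQueue;
import java.util.Set;

// Hypothetical sketch, not Flink's implementation: a heap-based timer queue
// with deduplication and a cheap synchronous copy for asynchronous snapshots.
class HeapTimerQueueSketch {

    /** A timer for one key that fires at one timestamp (hypothetical type). */
    static final class Timer {
        final String key;
        final long timestamp;

        Timer(String key, long timestamp) {
            this.key = key;
            this.timestamp = timestamp;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Timer)) {
                return false;
            }
            Timer other = (Timer) o;
            return timestamp == other.timestamp && key.equals(other.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, timestamp);
        }

        @Override
        public String toString() {
            return key + "@" + timestamp;
        }
    }

    // Min-heap ordered by timestamp: the earliest timer is always at the head.
    private final PriorityQueue<Timer> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.timestamp, b.timestamp));
    // Registering the same (key, timestamp) twice must not fire it twice.
    private final Set<Timer> registered = new HashSet<>();

    /** Registers a timer; returns false if it was already registered. */
    boolean register(String key, long timestamp) {
        Timer t = new Timer(key, timestamp);
        if (!registered.add(t)) {
            return false;
        }
        queue.add(t);
        return true;
    }

    /** Removes and returns all timers due at the given watermark, earliest first. */
    List<Timer> advanceWatermark(long watermark) {
        List<Timer> fired = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().timestamp <= watermark) {
            Timer t = queue.poll();
            registered.remove(t);
            fired.add(t);
        }
        return fired;
    }

    /**
     * Synchronous part of a snapshot: an in-memory copy of the current timers.
     * Timer objects are immutable, so this shallow copy is a consistent view that
     * a background thread could serialize while record processing continues.
     */
    List<Timer> snapshot() {
        return new ArrayList<>(registered);
    }

    public static void main(String[] args) {
        HeapTimerQueueSketch timers = new HeapTimerQueueSketch();
        timers.register("user-1", 30);
        timers.register("user-2", 10);
        timers.register("user-2", 10); // duplicate registration, ignored
        List<Timer> snap = timers.snapshot();
        System.out.println(timers.advanceWatermark(20));
        System.out.println(snap.size());
    }
}
```

Running `main` prints `[user-2@10]` and then `2`: only the timer at 10 is due at watermark 20, and the snapshot taken earlier still holds both timers. Keeping a `HashSet` next to the `PriorityQueue` makes duplicate checks O(1); a real service would also need efficient deletion of arbitrary timers, which `java.util.PriorityQueue.remove(Object)` only supports in linear time.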

Re: [PROPOSAL] Improving Flink’s timer management for large state

bowen.li
+1 LGTM. The RocksDB timer service is one of the most highly anticipated
features among Flink users, and it's finally coming, officially. I would
also love to see timers brought closer to the state backend, for the sake
of easier development and maintenance of the code.

On Fri, May 25, 2018 at 7:13 AM, Stefan Richter <[hidden email]> wrote:

> [...]

Re: [PROPOSAL] Improving Flink’s timer management for large state

Ted Yu
+1
-------- Original message --------
From: Bowen Li <[hidden email]>
Date: 5/27/18 12:31 AM (GMT-08:00)
To: [hidden email]
Subject: Re: [PROPOSAL] Improving Flink’s timer management for large state
[...]

Re: [PROPOSAL] Improving Flink’s timer management for large state

sihua zhou
In reply to this post by bowen.li


I also +1 this very good proposal!
In general, the design is good, especially the part related to timers on the heap. Regarding timers on RocksDB, I think there is still room for improvement; I left my comments in the doc.


Best, Sihua




On 05/27/2018 15:31, Bowen Li <[hidden email]> wrote:
[...]

Re: [PROPOSAL] Improving Flink’s timer management for large state

Stefan Richter
Thanks for the positive feedback so far!

@Sihua: I totally agree with your comments about performance improvements for the existing RocksDB timer code. In fact, that is why I phrased it as “implementation that is loosely based on some ideas”: to point out that a solution can look roughly like the existing code but probably should *not* be a simple copy-paste. As for your comment about supporting heap timers with RocksDB state, I think nothing speaks fundamentally against it, and the design can be shaped to support that. It just makes the configuration more complex, and we need a slight “special case” in incremental checkpoints. I was already wondering whether heap timers with RocksDB state could become a byproduct of a stepwise implementation, i.e. when the first step of the plan is pushing timer state into the backends and no RocksDB timer state exists yet.
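As background for the RocksDB timer discussion: RocksDB iterates keys in the order of its comparator, and the default comparator orders keys as unsigned bytes. One common way to make such a store return timers earliest-first is to place an order-preserving encoding of the timestamp at the front of the key. The minimal Java sketch below assumes exactly that layout (timestamp first, then the key, with no other prefix) purely for illustration; it is not the existing RocksDB timer code or the proposed design. It uses a `TreeMap` with `Arrays.compareUnsigned` (Java 9+) as a stand-in for RocksDB, and the names `TimerKeyEncodingSketch`, `encode`, and `decodeTimestamp` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an order-preserving byte encoding for (timestamp, key)
// timers, so a store that compares keys as unsigned bytes returns timers in time order.
class TimerKeyEncodingSketch {

    /** Encodes the timestamp big-endian with the sign bit flipped, followed by the key bytes. */
    static byte[] encode(long timestamp, String key) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Long.BYTES + keyBytes.length)
                // Flipping the sign bit makes negative timestamps sort before
                // positive ones under unsigned byte comparison.
                .putLong(timestamp ^ Long.MIN_VALUE)
                .put(keyBytes)
                .array();
    }

    /** Decodes the timestamp from the first 8 bytes of an encoded key. */
    static long decodeTimestamp(byte[] encoded) {
        return ByteBuffer.wrap(encoded, 0, Long.BYTES).getLong() ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        // Unsigned lexicographic comparison stands in for RocksDB's default comparator.
        TreeMap<byte[], String> store = new TreeMap<byte[], String>(Arrays::compareUnsigned);
        store.put(encode(300L, "c"), "c");
        store.put(encode(-5L, "a"), "a");
        store.put(encode(42L, "b"), "b");

        // Iterating in key order yields timers earliest-first; firing timers up to a
        // watermark becomes a scan from the first key until the timestamp exceeds it.
        for (Map.Entry<byte[], String> e : store.entrySet()) {
            System.out.println(decodeTimestamp(e.getKey()) + " -> " + e.getValue());
        }
    }
}
```

Running `main` prints `-5 -> a`, `42 -> b`, and `300 -> c`, in that order, even though the timers were inserted out of order. Timers with equal timestamps fall back to ordering by key bytes.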

> On 27.05.2018 at 10:03, sihua zhou <[hidden email]> wrote:
> [...]

Re: [PROPOSAL] Improving Flink’s timer management for large state

Till Rohrmann
Thanks for the great design document, Stefan. Unifying how Flink handles
state such that all checkpointable state is maintained by the StateBackend
makes a lot of sense to me. Making the timer service scalable and adding
support for asynchronous checkpoints is also really important for many
Flink users. Consequently, +1 for this improvement.

Cheers,
Till

On Mon, May 28, 2018 at 9:45 AM, Stefan Richter <[hidden email]> wrote:

> [...]

Re: [PROPOSAL] Improving Flink’s timer management for large state

Aljoscha Krettek-2
+1

> On 29 May 2018, at 09:34, Till Rohrmann <[hidden email]> wrote:
> [...]