[DISCUSS] FLIP-158: Generalized incremental checkpoints

[DISCUSS] FLIP-158: Generalized incremental checkpoints

Khachatryan Roman
Hi devs,

I'd like to start a discussion of FLIP-158: Generalized incremental
checkpoints [1]

FLIP motivation:
Low end-to-end latency is a much-demanded property in many Flink setups.
With exactly-once, this latency depends on the checkpoint interval/duration,
which in turn is defined by the slowest node (usually the one doing a full,
non-incremental snapshot). In large setups with many nodes, the probability
of at least one node being slow gets higher, making almost every checkpoint
slow.

This FLIP proposes a mechanism to deal with this by materializing and
uploading state continuously, and only uploading the changed part during the
checkpoint itself. It differs from other approaches in that 1) checkpoints
are always incremental and 2) it works for any state backend.
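
To make the idea a bit more concrete, here is a rough sketch in plain Java
(purely illustrative; the names below, such as StateChangeLogger and
ChangeLoggingValueStore, are made up for this email and are not the proposed
API): every state update is applied to the regular backend as usual and is
additionally appended to a changelog that is uploaded continuously, so at
checkpoint time only the not-yet-uploaded tail has to be persisted.

import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical illustration of the FLIP-158 idea, not actual Flink code.

/** Minimal stand-in for a keyed value state of any backend (heap, RocksDB, ...). */
interface ValueStore<V> {
    V get() throws IOException;
    void put(V value) throws IOException;
}

/** Appends serialized state changes and uploads them continuously in the background. */
interface StateChangeLogger {
    /** Record one state change; implementations buffer and upload to DFS/S3 continuously. */
    void append(byte[] serializedChange) throws IOException;

    /** At checkpoint time: make everything appended so far durable and return a handle.
     *  Only the tail not yet uploaded by the background process has to be written now,
     *  which is what keeps every checkpoint incremental. */
    CompletableFuture<byte[]> persist() throws IOException;
}

/** Wraps the state of any backend: applies each update locally and logs the change. */
final class ChangeLoggingValueStore<V> implements ValueStore<V> {
    private final ValueStore<V> delegate;
    private final StateChangeLogger changelog;
    private final Function<V, byte[]> serializer;

    ChangeLoggingValueStore(ValueStore<V> delegate,
                            StateChangeLogger changelog,
                            Function<V, byte[]> serializer) {
        this.delegate = delegate;
        this.changelog = changelog;
        this.serializer = serializer;
    }

    @Override
    public void put(V value) throws IOException {
        delegate.put(value);                        // normal state update, unchanged
        changelog.append(serializer.apply(value));  // plus one changelog record
    }

    @Override
    public V get() throws IOException {
        return delegate.get();                      // reads are served by the backend only
    }
}

During recovery, the backend would be rebuilt from the last materialized
state plus a replay of the remaining changelog records; that part is
intentionally left out of the sketch.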

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-158%3A+Generalized+incremental+checkpoints

Any feedback highly appreciated!

Regards,
Roman

Re: [DISCUSS] FLIP-158: Generalized incremental checkpoints

Piotr Nowojski-5
Hi Roman,

+1 from my side on this proposal. Also a big +1 for the recent changes to
the motivation and high-level overview sections of this FLIP.

For me there are still quite a few open questions around how to actually
implement the proposed changes, and especially how to integrate them with the
state backends and checkpointing, but maybe we can address those in follow-up
design docs, discuss them in the tickets, or even in a PoC.

Piotrek


Re: [DISCUSS] FLIP-158: Generalized incremental checkpoints

Stephan Ewen
+1 to this FLIP in general.

I like the general idea very much (full disclosure: I have been involved in
the discussions and drafting of the design for a while, so I am not a
new/neutral reviewer here).

One thing I would like to see us do here is reach out to users early with
this and validate the approach. It is a very fundamental change that also
shifts certain tradeoffs, like "cost during execution" vs. "cost on
recovery", and it will increase the data write rate to S3/HDFS/...
So before we build every bit of the complex implementation, let's try to
validate/test the critical bits with the users.

In my assessment, the most critical bit is the continuous log writing,
which adds overhead during execution time. Recovery is less critical:
there will be no overhead or additional load, so recovery should be strictly
better than it is currently.
I would hence propose that we focus on the implementation of the logging
first (ignoring recovery, focusing on one target FileSystem/Object store),
test-run this with a few users, and see whether it works well and whether
they like the new characteristics.
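
Just to illustrate how small that first step could be kept (a hypothetical
sketch only, not a design proposal; the class name FsChangelogWriter and the
roll-over threshold are made up, and the local file system stands in for the
target DFS/object store), the logging-only part could be little more than
buffering serialized changes and rolling them into new files on a size
threshold:

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical sketch of the "logging only" first step: buffer state changes and roll
// them into files on the target storage (the local file system stands in for DFS/S3 here).
final class FsChangelogWriter {
    private final Path baseDir;       // e.g. a per-task directory under the checkpoint location
    private final int rollOverBytes;  // upload granularity, a tuning knob to experiment with
    private OutputStream current;
    private long written;

    FsChangelogWriter(Path baseDir, int rollOverBytes) throws IOException {
        this.baseDir = Files.createDirectories(baseDir);
        this.rollOverBytes = rollOverBytes;
    }

    /** Called on every state change during normal processing (the overhead to measure). */
    synchronized void append(byte[] change) throws IOException {
        if (current == null) {
            current = Files.newOutputStream(baseDir.resolve("changelog-" + UUID.randomUUID()));
            written = 0;
        }
        current.write(change);
        written += change.length;
        if (written >= rollOverBytes) {
            // Continuous upload: close this segment long before any checkpoint is triggered.
            rollOver();
        }
    }

    /** Called at checkpoint time: only the currently open (small) segment is left to flush. */
    synchronized void persist() throws IOException {
        rollOver();
    }

    private void rollOver() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}

The interesting numbers to collect from such a test run with users would be
the extra write rate to the object store and the added latency per state
update.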

I am also trying to contribute some adjustments to the FLIP text, like more
illustrations/explanations, to make it easier to share this FLIP with a
wider audience, so we can get the above-mentioned user input and validation.

Best,
Stephan





Re: [DISCUSS] FLIP-158: Generalized incremental checkpoints

curcur
Big +1 to this FLIP!

Great to see this stepping forward, since the idea has been discussed for
quite a while. :-)

1. I totally agree that the critical part is the overhead added during
normal state updates (forwarding additional state updates to the log, in
addition to applying the state updates themselves). Once we have this part
ready and assessed, we will have a better understanding of, and more
confidence in, the impact that the additional log writes have on normal
processing.

2. Besides that, it would also be helpful to understand how this idea fits
into the overall picture of the fault-tolerance story. But that question
might be out of the scope of this FLIP, I guess.

Best
Yuan


Re: [DISCUSS] FLIP-158: Generalized incremental checkpoints

Khachatryan Roman
Thanks a lot for your replies!

Yes, feedback is very much appreciated, especially regarding the approach
in general.

I think it's a good idea to discuss some details in the follow-up docs or
tickets (but I'd be happy to discuss them here as well).

As for the PoC, I hope we'll publish it soon (logging only), so all
interested users will be able to try it out.

Regards,
Roman

