Hi all,
I would like to start the vote for FLIP-76 [1], which was discussed and reached consensus in the discussion thread [2].

The vote will be open until March 13th (72h), unless there is an objection or not enough votes.

Thanks,
Arvid

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-76-Unaligned-checkpoints-td33651.html
|
+1
Thanks for putting this together, looking forward to the experimental support in the next release.

One clarification: since the MVP won't support rescaling, does it imply that savepoints will always use aligned checkpointing? If so, would this still block the user from taking a savepoint and resuming with increased parallelism to resolve a prolonged/permanent backpressure condition?

Thanks,
Thomas
|
+1 on the overall design and thanks for the efforts!
I totally agree with the plan of implementing the MVP first. However, since the FLIP covers the whole feature rather than only the MVP, how about adding a *Roadmap* or *Future Work* section to write down plans including (but not limited to):

* Dynamic switching between unaligned and aligned checkpoints
* Supporting local recovery

What do you think?

What's more, the existing PoC result for e2e checkpoint duration and throughput looks great, but the recovery/restore time is not mentioned. I am not sure whether we also aim to provide a recovery speed comparable to aligned checkpoints in the MVP implementation. Hopefully we could (smile).

Best Regards,
Yu
|
Hi Thomas,
It's like you said. The first version will not support rescaling and mostly addresses the concerns about making little to no progress because of frequent crashes.

The main reason is that we currently cannot guarantee the ordering of non-keyed data (and even keyed data in some weird cases) when rescaling. We have a general concept to address that, which would also enable dynamic rescaling in the future, but that would make the changes even bigger and we would not have any version ready for 1.11.

The current plan, of course, is to continue improving unaligned checkpoints immediately after the release, so that we have the full feature set for 1.12. Potentially, unaligned checkpoints (with timeouts) would even become the default option.
|
+1 I like where this is headed.
One question: during restore, it could happen that a new task manager is configured with fewer or smaller buffers than was previously the case. How will this be handled?

David
|
+1 (binding).
Piotrek
|
+1 (binding).
As for David's concern about smaller buffers after recovery, I once wrote a draft design [1] to solve this issue. You can take a look and leave comments if you still have concerns. :)

[1] https://docs.google.com/document/d/16_MOQymzxrKvUHXh6QFr2AAXIKt_2vPUf8vzKy4H_tU/edit

Best,
Zhijiang
|
+1 (non-binding)
Regarding Yu's suggestion about a *Roadmap* or *Future Work* section, I think it's a good idea. Currently, some MVP limitations are mentioned at the end of the document, so we can extract and expand them. As for the recovery speed, it's not a priority currently, but we could also mention it in this section.

--
Regards,
Roman
|
+1 (non-binding)
I think the PoC result has shown the effect on reducing checkpoint time when back-pressure occurs, and I totally agree that the feature could be implemented in steps.
|
I added a roadmap section to the FLIP as suggested by Yu and Roman.
Unless someone objects, I'd still consider the voting period to end tomorrow. For me, the roadmap is only a clarification of already written and discussed points.

We already have enough binding votes, but there may be concerns popping up until tomorrow.
|
+1 (non-binding)
The checkpoint timeout in cases of backpressure is hard to tune. Our users and I have spent a lot of time on that. It is great to have this feature.
|
+1 (binding)
The updated FLIP doc LGTM. Thanks for addressing the comments Arvid and Roman.

Best Regards,
Yu
|
Voting period is now over even with the roadmap changes (forgot to close on
Friday because of all the Coronavirus chaos). We have 4 binding votes (Thomas, Yu, Piotr, Zhijiang) and no objections, so FLIP-76 passed. Thank you very much for your feedback. On Fri, Mar 13, 2020 at 11:08 AM Yu Li <[hidden email]> wrote: > +1 (binding) > > The updated FLIP doc LGTM. Thanks for addressing the comments Arvid and > Roman. > > Best Regards, > Yu > > > On Fri, 13 Mar 2020 at 03:48, Arvid Heise <[hidden email]> wrote: > > > I added a roadmap section to the FLIP as suggested by Yu and Roman. > > > > Unless someone objects, I'd still consider the voting period to end > > tomorrow. For me, the roadmap is only a clarification of already written > > and discussed points. > > > > We already have enough binding votes, but there may be concerns popping > up > > until tomorrow. > > > > On Thu, Mar 12, 2020 at 5:00 PM Yun Gao <[hidden email]> > > wrote: > > > > > +1 (non-binding) > > > I think the PoC result has shown the effect on reducing checkpoint > > > time when back-pressure occurs, and I totally agree with that the > feature > > > could be implemented in steps. > > > > > > > > > ------------------------------------------------------------------ > > > From:Roman Khachatryan <[hidden email]> > > > Send Time:2020 Mar. 12 (Thu.) 01:33 > > > To:dev <[hidden email]>; Zhijiang <[hidden email]> > > > Subject:Re: [VOTE] [FLIP-76] Unaligned checkpoints > > > > > > +1 (non-binding) > > > > > > Regarding Yu's suggestion about *Roadmap* or *Future Work* section, I > > think > > > it's a good idea. > > > Currently, some MVP limitations are mentioned at the end of the > document, > > > so we can extract and expand it. > > > As for the recovery speed it's not a priority currently, but we could > > also > > > mention it in this section. > > > > > > > > > On Wed, Mar 11, 2020 at 4:11 PM Zhijiang <[hidden email] > > > .invalid> > > > wrote: > > > > > > > +1 (binding). 
> > > >
> > > > As for David's concern of smaller buffers after recovery, I ever had a draft design [1] to solve this issue.
> > > > You can take a look and leave comments if still have concerns. :)
> > > >
> > > > [1] https://docs.google.com/document/d/16_MOQymzxrKvUHXh6QFr2AAXIKt_2vPUf8vzKy4H_tU/edit
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Piotr Nowojski <[hidden email]>
> > > > Send Time:2020 Mar. 11 (Wed.) 21:19
> > > > To:dev <[hidden email]>
> > > > Subject:Re: [VOTE] [FLIP-76] Unaligned checkpoints
> > > >
> > > > +1 (binding).
> > > >
> > > > Piotrek
> > > >
> > > > > On 11 Mar 2020, at 09:19, David Anderson <[hidden email]> wrote:
> > > > >
> > > > > +1 I like where this is headed.
> > > > >
> > > > > One question: during restore, it could happen that a new task manager is configured with fewer or smaller buffers than was previously the case. How will this be handled?
> > > > >
> > > > > David
> > > > >
> > > > > On Wed, Mar 11, 2020 at 8:31 AM Arvid Heise <[hidden email]> wrote:
> > > > >
> > > > >> Hi Thomas,
> > > > >>
> > > > >> it's like you said. The first version will not support rescaling and mostly addresses the concerns about making little to no progress because of frequent crashes.
> > > > >>
> > > > >> The main reason is that we cannot guarantee the ordering of non-keyed data (and even keyed data in some weird cases) when rescaling currently. We have a general concept to address that, which would also enable dynamic rescaling in the future, but that would make the changes even bigger and we would not have any version ready for 1.11.
> > > > >>
> > > > >> The current plan, of course, is to continue improving unaligned checkpoints immediately after release, such that we have the full feature set for 1.12. Potentially, unaligned checkpoints (with timeouts) would even become the default option.
> > > > >>
> > > > >> On Tue, Mar 10, 2020 at 11:14 PM Thomas Weise <[hidden email]> wrote:
> > > > >>
> > > > >>> +1
> > > > >>>
> > > > >>> Thanks for putting this together, looking forward to the experimental support in the next release.
> > > > >>>
> > > > >>> One clarification: since the MVP won't support rescaling, does it imply that savepoints will always use aligned checkpointing? If so, this would still block the user from taking a savepoint and resume with increased parallelism to resolve a prolonged/permanent backpressure condition?
> > > > >>>
> > > > >>> Thanks,
> > > > >>> Thomas
> > > > >>>
> > > > >>> On Tue, Mar 10, 2020 at 6:33 AM Arvid Heise <[hidden email]> wrote:
> > > > >>>
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> I would like to start the vote for FLIP-76 [1], which is discussed and reached a consensus in the discussion thread [2].
> > > > >>>>
> > > > >>>> The vote will be open until March. 13th (72h), unless there is an objection or not enough votes.
> > > > >>>>
> > > > >>>> Thanks,
> > > > >>>> Arvid
> > > > >>>>
> > > > >>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-76%3A+Unaligned+Checkpoints
> > > > >>>> [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-76-Unaligned-checkpoints-td33651.html
> > >
> > > --
> > > Regards,
> > > Roman