Dear users,
Unfortunately, the bug in unaligned checkpoints that we fixed in 1.12.1 still occurs under certain circumstances, so we recommend not using unaligned checkpoints in production until 1.12.2. Normal processing is not affected by this bug, but recovery from a corrupted checkpoint will not succeed.

If you have used unaligned checkpoints, you can switch back to aligned checkpoints when starting from an uncorrupted unaligned checkpoint. There is no easy way to check whether a checkpoint is corrupted. However, the rare corruption most likely happens when you have short checkpointing intervals (<1s), high backpressure, and the previous checkpoint was declined for some reason. So to be safe, before switching back, make sure that the last handful of checkpoints all succeeded.

We have already prepared a fix that we will merge into the release branch today, but the discussion on when to release 1.12.2 has not started yet.

Best,

Arvid
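The advice above to confirm that the last handful of checkpoints all succeeded can be sketched as a small script. This is only an illustration, not part of the announcement: it assumes the job's checkpoint statistics are read from the Flink REST endpoint `/jobs/<jobid>/checkpoints` and that each entry under `history` carries an `id` and a `status` field. The function names, the REST address, and the choice of five checkpoints are hypothetical. A passing check does not prove that a checkpoint is uncorrupted; it only rules out the "previous checkpoint was declined" condition mentioned above.

```python
import json
import urllib.request


def fetch_checkpoint_stats(rest_url, job_id):
    """Fetch checkpoint statistics for a job (assumed endpoint layout)."""
    with urllib.request.urlopen(f"{rest_url}/jobs/{job_id}/checkpoints") as resp:
        return json.load(resp)


def last_checkpoints_succeeded(checkpoint_stats, n=5):
    """Return True only if the n most recent finished checkpoints all completed.

    Checkpoints still in progress are skipped. If fewer than n finished
    checkpoints are available, the answer is False, to stay on the safe side.
    """
    history = checkpoint_stats.get("history", [])
    finished = [c for c in history if c.get("status") != "IN_PROGRESS"]
    # Sort newest first by checkpoint id instead of relying on response order.
    finished.sort(key=lambda c: c["id"], reverse=True)
    recent = finished[:n]
    return len(recent) == n and all(c["status"] == "COMPLETED" for c in recent)


if __name__ == "__main__":
    # Hypothetical REST address and job id; replace with your own.
    stats = fetch_checkpoint_stats("http://localhost:8081", "<jobid>")
    print(last_checkpoints_succeeded(stats, n=5))
```

If the check returns False, it seems safer to wait for further successful checkpoints before restarting with aligned checkpoints.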
Hi Arvid,
Thanks for the announcement.

I think we'd better also update the 1.12 release notes[1] and the 1.12.1 release blog post[2]. Would you have time to help prepare the warning messages?

Thank you~

Xintong Song

[1] https://github.com/apache/flink/blob/master/docs/release-notes/flink-1.12.md
[2] https://github.com/apache/flink-web/blob/asf-site/_posts/2021-01-19-release-1.12.1.md

On Fri, Jan 22, 2021 at 7:40 PM Arvid Heise <[hidden email]> wrote:
> Dear users,
> [...]
Hi Xintong,
Yes, I'm on it.

Best,

Arvid

On Fri, Jan 22, 2021 at 1:01 PM Xintong Song <[hidden email]> wrote:
> Hi Arvid,
> [...]

--
Arvid Heise | Senior Java Developer
<https://www.ververica.com/>
Follow us @VervericaData
--
Join Flink Forward <https://flink-forward.org/> - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng
Thanks for the update, Arvid. This fix warrants a quick 1.12.2 release imo.
Cheers,
Till

On Fri, Jan 22, 2021 at 1:42 PM Arvid Heise <[hidden email]> wrote:
> Hi Xintong,
> [...]
Hi Till,
I completely agree with you.

Best,

Arvid

On Fri, Jan 22, 2021 at 1:46 PM Till Rohrmann <[hidden email]> wrote:
> Thanks for the update Arvid. This fix warrants a quick 1.12.2 release imo.
> [...]