Hello,
I have a question about checkpointing in Flink and wanted to clarify my understanding. Flink uses barriers internally to keep track of the records that were processed. The documentation[1] describes the checkpoint as completing only when the barriers reach the sink. So let's consider a toy example: a `TumblingEventTimeWindow` of 5 hours and a `CheckpointInterval` of 10 minutes. If the documentation is correct, a checkpoint should only occur once the window is processed and its results reach the sink (which can take several hours), which is not true as far as I know. I must be wrong somewhere; could someone explain where the error in my logic is?

[1] https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html
When a WindowOperator receives all the barriers from upstream, it forwards the barrier to the downstream operators and performs the checkpoint asynchronously.
It doesn't have to wait for the window to trigger before sending out the barrier.

Regards,
Dian
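Dian's point can be illustrated with a toy simulation (plain Python, not the Flink API; all names here are made up for illustration): the operator snapshots whatever is currently buffered in the window and forwards the barrier immediately, long before the window fires.

```python
# Toy model of a window operator: on a barrier it snapshots its buffered
# window state and forwards the barrier at once -- it does NOT wait for
# the window to fire. Hypothetical names, not the real WindowOperator.

class ToyWindowOperator:
    def __init__(self):
        self.window_state = []   # records buffered until the window fires
        self.snapshots = {}      # checkpoint_id -> copy of window state
        self.emitted = []        # everything forwarded downstream

    def process(self, event):
        if isinstance(event, tuple) and event[0] == "barrier":
            cp_id = event[1]
            # Snapshot the in-flight window state and forward the
            # barrier right away; the window has not fired yet.
            self.snapshots[cp_id] = list(self.window_state)
            self.emitted.append(event)
        else:
            self.window_state.append(event)  # record joins the window

    def fire_window(self):
        # Happens much later (e.g. after 5 hours of event time).
        self.emitted.append(("window_result", sum(self.window_state)))
        self.window_state.clear()

op = ToyWindowOperator()
for e in [1, 2, ("barrier", 1), 3, ("barrier", 2)]:
    op.process(e)
op.fire_window()

print(op.snapshots)  # {1: [1, 2], 2: [1, 2, 3]}
print(op.emitted)    # both barriers forwarded before the window result
```

Note that both barriers travel downstream carrying a snapshot of the partially filled window, which is why a 10-minute checkpoint interval is compatible with a 5-hour window.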
Hi Dominik,
A record counts as processed once it enters the window. The checkpoint barrier therefore does not get blocked until the window containing the preceding records is triggered. A window is actually part of the state of the WindowOperator, and processing the data records builds up this state.

Thanks,
Zhu Zhu
Okay, thanks for clarifying. I have a follow-up question. If we consider Kafka offset commits, does this basically mean that the offsets committed during a checkpoint are not necessarily the offsets that were fully processed by the pipeline and written to the sink? I mean, if there is a window in the pipeline, then records the window has not yet emitted are saved in the window state, but they are considered processed and thus will not be replayed in case of a restart, because Flink treats them as processed when committing offsets. Am I correct?

Best,
Dom.
Hi Dom,
Checkpointing has a sync phase and an async phase. When an operator receives a barrier, it takes a snapshot of its state (the sync phase) and then persists it in the background (the async phase). Once the barriers have passed through all the operators, including the sinks, and the checkpoint is complete, the operators get a notification, after which they perform actions such as committing the Kafka offsets. WRT your question, the offsets are only committed once the whole checkpoint has successfully finished. For more information, you can refer to this post[1].

[1] https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html

Best,
Paul Lam
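The offset-commit timing can be sketched as follows (a toy Python model with hypothetical names, not the real FlinkKafkaConsumer API): the source records the offset in its checkpoint state when the barrier arrives, but only commits it back to Kafka once the whole checkpoint is reported complete.

```python
# Toy Kafka source: the offset is frozen into the checkpoint at barrier
# time (sync phase), and committed to Kafka only on the completion
# notification. Illustrative names only, not Flink's actual API.

class ToyKafkaSource:
    def __init__(self):
        self.current_offset = 0
        self.pending = {}            # checkpoint_id -> offset at barrier time
        self.committed_offset = None

    def read_record(self):
        self.current_offset += 1

    def snapshot_state(self, checkpoint_id):
        # Barrier arrives: remember the offset inside the checkpoint.
        self.pending[checkpoint_id] = self.current_offset

    def notify_checkpoint_complete(self, checkpoint_id):
        # Only now is the offset committed back to Kafka.
        self.committed_offset = self.pending.pop(checkpoint_id)

src = ToyKafkaSource()
for _ in range(5):
    src.read_record()
src.snapshot_state(checkpoint_id=1)
for _ in range(3):
    src.read_record()               # keeps reading past the barrier

assert src.committed_offset is None  # nothing committed yet
src.notify_checkpoint_complete(1)
print(src.committed_offset)          # 5 -- the offset frozen at barrier time
```

The committed value is the offset as of the barrier, not the latest position the source has reached, which is the crux of the exactly-once guarantee described in the linked post.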
Yes, you are correct, Dominik. The committed Kafka offsets tell you what the program has read as input from the Kafka topic. But depending on the actual program logic, this does not mean that the results of processing these input events have been output up to that point. As you said, there are Flink operations, such as window calculations, which need to buffer events for a certain period before they can emit results.

Cheers,
Till