Checkpointing clarification


Checkpointing clarification

Dominik Wosiński
Hello,
I have a slight doubt about checkpointing in Flink and wanted to clarify my
understanding. Flink uses barriers internally to keep track of the records
that have been processed. The documentation[1] describes the checkpoint as
happening only when the barriers have been transferred to the sink. So let's
consider a toy example with a `TumblingEventTimeWindow` of 5 hours and a
`CheckpointInterval` of 10 minutes. If the documentation is correct, a
checkpoint should occur only when the window is processed and its results
reach the sink (which can take several hours), which is not true as far as I
know. I am surely wrong somewhere; could someone explain where the error in
my logic is?


[1]
https://ci.apache.org/projects/flink/flink-docs-stable/internals/stream_checkpointing.html

Re: Checkpointing clarification

Dian Fu-2
When a WindowOperator receives all the barriers from its upstream operators, it forwards the barrier to the downstream operators and performs the checkpoint asynchronously.
It doesn't have to wait for the window to trigger before sending out the barrier.

Regards,
Dian
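To illustrate the point above, here is a toy Python simulation (not Flink code; all class and method names are made up for illustration) of a window-style operator that snapshots its buffered state and forwards the barrier immediately, without waiting for the long event-time window to fire:

```python
# Toy model of barrier handling in a window-style operator (not Flink code):
# the operator snapshots its buffered state and forwards the barrier at once,
# so checkpoints complete long before the window itself triggers.

class ToyWindowOperator:
    def __init__(self):
        self.window_buffer = []    # records waiting for the window to trigger
        self.snapshots = {}        # checkpoint_id -> copy of buffered state
        self.emitted_downstream = []

    def process(self, record):
        # Records are simply buffered; the window fires much later.
        self.window_buffer.append(record)

    def on_barrier(self, checkpoint_id):
        # Snapshot the current state and forward the barrier immediately;
        # no window results need to have reached the sink for this.
        self.snapshots[checkpoint_id] = list(self.window_buffer)
        self.emitted_downstream.append(("barrier", checkpoint_id))

op = ToyWindowOperator()
op.process("r1")
op.process("r2")
op.on_barrier(1)   # checkpoint 1 happens while the window is still open
op.process("r3")
op.on_barrier(2)   # checkpoint 2 likewise; the window has never fired
```

In this sketch, both checkpoints capture the buffered records even though no window output has been produced, matching the behavior described above.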


Re: Checkpointing clarification

Zhu Zhu
Hi Dominik,

A record counts as processed once it enters the window. Thus the
checkpoint barrier is not blocked until the window containing the
preceding records is triggered.
A window is actually part of the state of the WindowOperator, and
processing the data records builds up this state.

Thanks,
Zhu Zhu
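A toy sketch of this idea (made-up names, not the Flink API): the window buffer is part of the operator state, so a checkpoint captures the records sitting in not-yet-fired windows, and restoring the checkpoint brings them back:

```python
# Toy sketch (not Flink code): per-window buffers are operator state, so a
# snapshot captures records in windows that have not fired yet.

def snapshot(window_state, checkpoint_id, store):
    # Deep-copy the per-window buffers so later processing cannot mutate
    # what was captured in the checkpoint.
    store[checkpoint_id] = {w: list(recs) for w, recs in window_state.items()}

def restore(checkpoint_id, store):
    return {w: list(recs) for w, recs in store[checkpoint_id].items()}

store = {}
window_state = {"window[0h,5h)": ["a", "b"]}   # records buffered, window open
snapshot(window_state, 7, store)
window_state["window[0h,5h)"].append("c")      # more records arrive afterwards

recovered = restore(7, store)
```

On restart from checkpoint 7, the operator gets back exactly the records that were buffered at snapshot time; nothing buffered in an open window is lost.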


Re: Checkpointing clarification

Dominik Wosiński
Okay, thanks for clarifying. I have a follow-up question here. If we
consider Kafka offset commits, this basically means that
the offsets committed during a checkpoint are not necessarily the
offsets that have really been processed by the whole pipeline and written
to the sink? I mean, if there is a window in the pipeline, then the records
are saved in the window state if the window has not been emitted yet, but
they are considered processed and thus will not be replayed in case of a
restart, because Flink considers them processed when committing the
offsets. Am I correct?

Best,
Dom.

Re: Checkpointing clarification

PaulLam
Hi Dom,

There are a sync phase and an async phase in checkpointing. When an operator receives a barrier, it performs a snapshot, i.e. the sync phase. And when the barriers have passed through all the operators, including the sinks, the operators get a notification, after which they do the async part, like committing the Kafka offsets. WRT your question, the offsets are only committed once the whole checkpoint has successfully finished. For more information, you can refer to this post[1].

[1] https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html

Best,
Paul Lam
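A toy two-phase sketch of this (made-up names, loosely modeled on the notify-on-complete pattern, not the actual Flink API): the source stores its offset as part of the checkpoint, but only commits it back to "Kafka" once it is notified that the whole checkpoint has completed:

```python
# Toy two-phase sketch (not Flink code): the offset is snapshotted with the
# checkpoint, but only committed after the whole checkpoint has succeeded.

class ToyKafkaSource:
    def __init__(self):
        self.current_offset = 0
        self.pending = {}             # checkpoint_id -> offset in that snapshot
        self.committed_offset = None  # what "Kafka" would see as committed

    def read(self, n):
        self.current_offset += n

    def snapshot_state(self, checkpoint_id):
        # Sync phase: record the current offset as part of the checkpoint.
        self.pending[checkpoint_id] = self.current_offset

    def notify_checkpoint_complete(self, checkpoint_id):
        # Async phase: commit only once the whole checkpoint has finished.
        self.committed_offset = self.pending.pop(checkpoint_id)

src = ToyKafkaSource()
src.read(100)
src.snapshot_state(1)              # offset 100 is stored in the checkpoint
src.read(50)                       # reading continues; nothing committed yet
src.notify_checkpoint_complete(1)  # now offset 100 is committed
```

The committed offset thus lags behind the read position and always corresponds to a successfully completed checkpoint.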


Re: Checkpointing clarification

Till Rohrmann
Yes, you are correct, Dominik. The committed Kafka offsets tell you what the
program has read as input from the Kafka topic. But depending on the actual
program logic, this does not mean that the results of processing these input
events have been output up to this point. As you said, there are
Flink operations, such as window calculations, which need to buffer events
for a certain period before they can emit their results.

Cheers,
Till
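This read-versus-emitted gap can be seen in a small toy illustration (not Flink code; a simple count window stands in for the event-time window): the offset advances with every record read, while only full windows reach the sink:

```python
# Toy illustration (not Flink code): with a buffering window in the pipeline,
# the committed offset reflects what was read, not what reached the sink.

def run_pipeline(records, window_size):
    """Feed records through a toy count window that emits full batches only."""
    read = 0
    buffered, sink = [], []
    for r in records:
        read += 1                  # offset advances as soon as a record is read
        buffered.append(r)
        if len(buffered) == window_size:
            sink.extend(buffered)  # window fires, results reach the sink
            buffered = []
    return read, sink, buffered

read, sink, buffered = run_pipeline(list(range(10)), window_size=4)
# All 10 records have been read (and their offsets would be committed),
# although only 8 have actually been written to the sink; 2 are still
# buffered in the open window and would be restored from the checkpoint.
```

So the committed offsets and the sink output are consistent with each other only because the buffered records are themselves part of the checkpointed state, as discussed earlier in the thread.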
