Checkpointing Event Time Watermarks

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Checkpointing Event Time Watermarks

vijay kansal
Hi All

Is there a way to checkpoint event time watermarks in Flink ?

I tries searching for this, but could not figure out...


Vijay Kansal
Software Development Engineer
LimeRoad
Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing Event Time Watermarks

xingcanc
Hi Vijay,

normally, maybe there’s no need to checkpoint the event times / watermarks since they are automatically generated based on the records. What’s your intention?

Best,
Xingcan

> On 27 Feb 2018, at 8:50 PM, vijay kansal <[hidden email]> wrote:
>
> Hi All
>
> Is there a way to checkpoint event time watermarks in Flink ?
>
> I tries searching for this, but could not figure out...
>
>
> Vijay Kansal
> Software Development Engineer
> LimeRoad

Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing Event Time Watermarks

vijay kansal
Hi Xingcan

We are receiving events from a no. of independent data sources and hence,
data arriving into our Flink topology (via Kafka) would be out of order.

We are creating 1-min event time windows in our Flink topology and
generating event time watermarks as (current event time - some threshold
(30 seconds)) at the source operator.

In case a few events arrive after the set threshold, those events are
simply ignored (which is ok in our case, because most of the events
belonging to that minute would have already arrived and got processed in
the corresponding window).

Now, the problem is that in case the program crashes (for whatever reason)
and is then resumed again from the last successful checkpoint, out of order
arriving events would trigger execution of past (already processed) windows
(with only a minuscule of events in that window) overriding results of
prev. computation of that window.

In case Flink had checkpointed event time watermarks, this problem would
not have occurred.

So, I am wondering if there is a way to enforce event time watermarks'
checkpointing in Flink...











Vijay Kansal
Software Development Engineer
LimeRoad

On Tue, Feb 27, 2018 at 6:54 PM, Xingcan Cui <[hidden email]> wrote:

> Hi Vijay,
>
> normally, maybe there’s no need to checkpoint the event times / watermarks
> since they are automatically generated based on the records. What’s your
> intention?
>
> Best,
> Xingcan
>
> > On 27 Feb 2018, at 8:50 PM, vijay kansal <[hidden email]> wrote:
> >
> > Hi All
> >
> > Is there a way to checkpoint event time watermarks in Flink ?
> >
> > I tries searching for this, but could not figure out...
> >
> >
> > Vijay Kansal
> > Software Development Engineer
> > LimeRoad
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Checkpointing Event Time Watermarks

Fabian Hueske-2
Hi Vijay,

I think the easiest solution would be to inject a ProcessFunction after the
window operator.
The ProcessFunction has access to the current watermark via its Context
object and can store it in union operator state.
In case of a failure, the ProcessFunction recovers the watermark from its
state and filters all records with smaller timestamps than the watermark
(timestamps are also accessible via the Context obejct).

Best, Fabain


2018-03-02 9:06 GMT+01:00 vijay kansal <[hidden email]>:

> Hi Xingcan
>
> We are receiving events from a no. of independent data sources and hence,
> data arriving into our Flink topology (via Kafka) would be out of order.
>
> We are creating 1-min event time windows in our Flink topology and
> generating event time watermarks as (current event time - some threshold
> (30 seconds)) at the source operator.
>
> In case a few events arrive after the set threshold, those events are
> simply ignored (which is ok in our case, because most of the events
> belonging to that minute would have already arrived and got processed in
> the corresponding window).
>
> Now, the problem is that in case the program crashes (for whatever reason)
> and is then resumed again from the last successful checkpoint, out of order
> arriving events would trigger execution of past (already processed) windows
> (with only a minuscule of events in that window) overriding results of
> prev. computation of that window.
>
> In case Flink had checkpointed event time watermarks, this problem would
> not have occurred.
>
> So, I am wondering if there is a way to enforce event time watermarks'
> checkpointing in Flink...
>
>
>
>
>
>
>
>
>
>
>
> Vijay Kansal
> Software Development Engineer
> LimeRoad
>
> On Tue, Feb 27, 2018 at 6:54 PM, Xingcan Cui <[hidden email]> wrote:
>
> > Hi Vijay,
> >
> > normally, maybe there’s no need to checkpoint the event times /
> watermarks
> > since they are automatically generated based on the records. What’s your
> > intention?
> >
> > Best,
> > Xingcan
> >
> > > On 27 Feb 2018, at 8:50 PM, vijay kansal <[hidden email]>
> wrote:
> > >
> > > Hi All
> > >
> > > Is there a way to checkpoint event time watermarks in Flink ?
> > >
> > > I tries searching for this, but could not figure out...
> > >
> > >
> > > Vijay Kansal
> > > Software Development Engineer
> > > LimeRoad
> >
> >
>