Hi All
Is there a way to checkpoint event time watermarks in Flink? I tried searching for this, but could not figure it out...

Vijay Kansal
Software Development Engineer
LimeRoad
Hi Vijay,
Normally there should be no need to checkpoint event times or watermarks, since they are generated automatically from the records. What's your intention?

Best,
Xingcan

> On 27 Feb 2018, at 8:50 PM, vijay kansal <[hidden email]> wrote:
>
> Is there a way to checkpoint event time watermarks in Flink?
Hi Xingcan
We receive events from a number of independent data sources, so the data arriving in our Flink topology (via Kafka) is out of order.

We create 1-minute event time windows in our Flink topology and generate event time watermarks as (current event time - a threshold of 30 seconds) at the source operator.

If a few events arrive after the threshold, they are simply ignored. That is OK in our case, because most of the events belonging to that minute will already have arrived and been processed in the corresponding window.

The problem is this: if the program crashes (for whatever reason) and resumes from the last successful checkpoint, out-of-order events trigger execution of past (already processed) windows, each now containing only a handful of events, and overwrite the results of the previous computation of that window.

If Flink checkpointed event time watermarks, this problem would not occur.

So I am wondering whether there is a way to enforce checkpointing of event time watermarks in Flink...

Vijay Kansal
Software Development Engineer
LimeRoad

On Tue, Feb 27, 2018 at 6:54 PM, Xingcan Cui <[hidden email]> wrote:
> What's your intention?
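To make the scenario concrete, here is a minimal plain-Java sketch of the scheme described above. It does not use the Flink API: `WatermarkModel`, its method names, and the rule that the watermark follows the largest event time seen so far are illustrative assumptions. It shows why an event that was dropped as late before a crash can be accepted again once the in-memory watermark is lost (modelled here as a fresh instance).

```java
/** Plain-Java model (hypothetical, no Flink dependency) of the watermarking scheme. */
class WatermarkModel {
    static final long WINDOW_MS = 60_000L;    // 1-minute event time windows
    static final long THRESHOLD_MS = 30_000L; // 30-second out-of-order threshold

    // Largest event time seen so far; lost when the instance is recreated.
    private long maxEventTime = Long.MIN_VALUE;

    /** Start of the 1-minute window that contains the given timestamp. */
    static long windowStart(long eventTime) {
        return eventTime - Math.floorMod(eventTime, WINDOW_MS);
    }

    /** Watermark = largest event time seen - threshold (MIN_VALUE before any event). */
    long currentWatermark() {
        return maxEventTime == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxEventTime - THRESHOLD_MS;
    }

    /** Returns false if the event's window has already been passed by the watermark. */
    boolean accept(long eventTime) {
        long lastTimestampInWindow = windowStart(eventTime) + WINDOW_MS - 1;
        if (currentWatermark() >= lastTimestampInWindow) {
            return false; // late event: its window was already processed
        }
        maxEventTime = Math.max(maxEventTime, eventTime);
        return true;
    }

    public static void main(String[] args) {
        WatermarkModel beforeCrash = new WatermarkModel();
        beforeCrash.accept(10_000L);   // falls into window [0, 60000)
        beforeCrash.accept(105_000L);  // moves the watermark to 75000
        System.out.println(beforeCrash.accept(50_000L)); // late for window [0, 60000)

        // After a restart the watermark starts over, so the same event is accepted.
        WatermarkModel afterRestart = new WatermarkModel();
        System.out.println(afterRestart.accept(50_000L));
    }
}
```

In this model the second `accept(50_000L)` call is what would re-trigger the already processed window with only a handful of events in it.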
Hi Vijay,
I think the easiest solution would be to inject a ProcessFunction after the window operator. The ProcessFunction has access to the current watermark via its Context object and can store it in union operator state. In case of a failure, the ProcessFunction recovers the watermark from its state and filters out all records whose timestamps are smaller than the watermark (timestamps are also accessible via the Context object).

Best,
Fabian

2018-03-02 9:06 GMT+01:00 vijay kansal <[hidden email]>:
> So, I am wondering if there is a way to enforce event time watermarks' checkpointing in Flink...
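The filtering logic suggested above can be sketched in plain Java. This is a model of the idea only, not Flink code: `RestoredWatermarkFilter` and its methods are hypothetical names. In a real job the same logic would presumably live in a ProcessFunction that also implements CheckpointedFunction, reading the watermark via `ctx.timerService().currentWatermark()` and the record timestamp via `ctx.timestamp()`. Taking the maximum of the restored union-state entries is an assumption made here; the thread does not say how to combine them.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model (hypothetical) of a filter placed after the window operator. */
class RestoredWatermarkFilter {
    // Watermark observed since the (re)start of this instance.
    private long currentWatermark = Long.MIN_VALUE;
    // Watermark recovered from the last checkpoint; MIN_VALUE on a fresh start.
    private long restoredWatermark = Long.MIN_VALUE;

    /** Called whenever a new watermark is observed. */
    void onWatermark(long watermark) {
        currentWatermark = Math.max(currentWatermark, watermark);
    }

    /** Value to write into operator state when a checkpoint is taken. */
    long snapshot() {
        return Math.max(currentWatermark, restoredWatermark);
    }

    /** With union state, every instance receives the entries of all instances. */
    void restore(List<Long> unionState) {
        for (long watermark : unionState) {
            restoredWatermark = Math.max(restoredWatermark, watermark);
        }
    }

    /** Drops records whose timestamp is smaller than the restored watermark. */
    boolean shouldEmit(long recordTimestamp) {
        return recordTimestamp >= restoredWatermark;
    }

    public static void main(String[] args) {
        RestoredWatermarkFilter beforeCrash = new RestoredWatermarkFilter();
        beforeCrash.onWatermark(75_000L);
        long checkpointed = beforeCrash.snapshot();

        RestoredWatermarkFilter afterRestart = new RestoredWatermarkFilter();
        afterRestart.restore(Arrays.asList(checkpointed, 74_000L));
        System.out.println(afterRestart.shouldEmit(59_999L));  // result of an old window
        System.out.println(afterRestart.shouldEmit(119_999L)); // result of a newer window
    }
}
```

The strict "smaller than" comparison follows the wording in the reply. Whether a record whose timestamp equals the restored watermark should also be dropped depends on how the window results are timestamped, so treat that boundary as something to verify against the actual job.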