Custom Tumbling Window Assigner

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Custom Tumbling Window Assigner

dan bress
I am interesting in writing a custom tumbling window assigner to do the
following:

As soon as the first event for that keyed stream comes in, create a window
from that point going forward N milli seconds.  Is it possible to do that
using flink?  What I want to do is get all events that occurred in a 5
minute period, where the period starts with the time of the first event.
Flink's current tumbling time windows would be locked to the top of 0, 5,
10, 15 minutes, which doesn't work well for me if the first event for a key
comes in at 14 minutes, because the window will be triggered at 15 minutes,
instead of at 19 minutes.

Looking at the interface for WindowAssigner it doesn't look like I have
enough information to do what I need.

I would need to know the key, since each keyed stream could have a
different start time, and I would need to be able to store some state about
what the start time is for each key, like a Map of
StreamKey->FirstEventTime.

Do you have any suggestions for assigning a window that starts with the
time of the first event in that keyed stream, and ends N minutes after that?

Thanks!

Dan
Reply | Threaded
Open this post in threaded view
|

Re: Custom Tumbling Window Assigner

Ufuk Celebi-2
If I'm understanding correctly, you want something liked
"tumbling-session" windows. Session windows [1] do exactly what you
describe but are only evaluated after a certain period of inactivity,
but you want to evaluate the window every X minutes after the first
element for a key arrived, right? I think Gyula (cc'd) did something
similiar for [2]. Maybe he can share his code?

[1] https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html#session-windows
[2] https://techblog.king.com/rbea-scalable-real-time-analytics-king/

On Mon, Aug 15, 2016 at 7:05 PM, dan bress <[hidden email]> wrote:

> I am interesting in writing a custom tumbling window assigner to do the
> following:
>
> As soon as the first event for that keyed stream comes in, create a window
> from that point going forward N milli seconds.  Is it possible to do that
> using flink?  What I want to do is get all events that occurred in a 5
> minute period, where the period starts with the time of the first event.
> Flink's current tumbling time windows would be locked to the top of 0, 5,
> 10, 15 minutes, which doesn't work well for me if the first event for a key
> comes in at 14 minutes, because the window will be triggered at 15 minutes,
> instead of at 19 minutes.
>
> Looking at the interface for WindowAssigner it doesn't look like I have
> enough information to do what I need.
>
> I would need to know the key, since each keyed stream could have a
> different start time, and I would need to be able to store some state about
> what the start time is for each key, like a Map of
> StreamKey->FirstEventTime.
>
> Do you have any suggestions for assigning a window that starts with the
> time of the first event in that keyed stream, and ends N minutes after that?
>
> Thanks!
>
> Dan
Reply | Threaded
Open this post in threaded view
|

Re: Custom Tumbling Window Assigner

Aljoscha Krettek-2
Hi,
the problem with using session windows is that you don't know which one is
the "first element" unless you keep some sort of state in the
WindowAssigner, right? Otherwise, the first element could spawn a window
and the non-first elements could have windows of length 0 that get merged
into the first window of the length that you desired.

Cheers,
Aljoschae

On Tue, 16 Aug 2016 at 11:54 Ufuk Celebi <[hidden email]> wrote:

> If I'm understanding correctly, you want something liked
> "tumbling-session" windows. Session windows [1] do exactly what you
> describe but are only evaluated after a certain period of inactivity,
> but you want to evaluate the window every X minutes after the first
> element for a key arrived, right? I think Gyula (cc'd) did something
> similiar for [2]. Maybe he can share his code?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html#session-windows
> [2] https://techblog.king.com/rbea-scalable-real-time-analytics-king/
>
> On Mon, Aug 15, 2016 at 7:05 PM, dan bress <[hidden email]> wrote:
> > I am interesting in writing a custom tumbling window assigner to do the
> > following:
> >
> > As soon as the first event for that keyed stream comes in, create a
> window
> > from that point going forward N milli seconds.  Is it possible to do that
> > using flink?  What I want to do is get all events that occurred in a 5
> > minute period, where the period starts with the time of the first event.
> > Flink's current tumbling time windows would be locked to the top of 0, 5,
> > 10, 15 minutes, which doesn't work well for me if the first event for a
> key
> > comes in at 14 minutes, because the window will be triggered at 15
> minutes,
> > instead of at 19 minutes.
> >
> > Looking at the interface for WindowAssigner it doesn't look like I have
> > enough information to do what I need.
> >
> > I would need to know the key, since each keyed stream could have a
> > different start time, and I would need to be able to store some state
> about
> > what the start time is for each key, like a Map of
> > StreamKey->FirstEventTime.
> >
> > Do you have any suggestions for assigning a window that starts with the
> > time of the first event in that keyed stream, and ends N minutes after
> that?
> >
> > Thanks!
> >
> > Dan
>
Reply | Threaded
Open this post in threaded view
|

Re: Custom Tumbling Window Assigner

dan bress
Aljoschae, exactly.

I'm assuming I don't want to maintain state in the window assigner? What
would you do if you wanted to solve this problem?
On Tue, Aug 16, 2016 at 7:29 AM Aljoscha Krettek <[hidden email]>
wrote:

> Hi,
> the problem with using session windows is that you don't know which one is
> the "first element" unless you keep some sort of state in the
> WindowAssigner, right? Otherwise, the first element could spawn a window
> and the non-first elements could have windows of length 0 that get merged
> into the first window of the length that you desired.
>
> Cheers,
> Aljoschae
>
> On Tue, 16 Aug 2016 at 11:54 Ufuk Celebi <[hidden email]> wrote:
>
> > If I'm understanding correctly, you want something liked
> > "tumbling-session" windows. Session windows [1] do exactly what you
> > describe but are only evaluated after a certain period of inactivity,
> > but you want to evaluate the window every X minutes after the first
> > element for a key arrived, right? I think Gyula (cc'd) did something
> > similiar for [2]. Maybe he can share his code?
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/windows.html#session-windows
> > [2] https://techblog.king.com/rbea-scalable-real-time-analytics-king/
> >
> > On Mon, Aug 15, 2016 at 7:05 PM, dan bress <[hidden email]> wrote:
> > > I am interesting in writing a custom tumbling window assigner to do the
> > > following:
> > >
> > > As soon as the first event for that keyed stream comes in, create a
> > window
> > > from that point going forward N milli seconds.  Is it possible to do
> that
> > > using flink?  What I want to do is get all events that occurred in a 5
> > > minute period, where the period starts with the time of the first
> event.
> > > Flink's current tumbling time windows would be locked to the top of 0,
> 5,
> > > 10, 15 minutes, which doesn't work well for me if the first event for a
> > key
> > > comes in at 14 minutes, because the window will be triggered at 15
> > minutes,
> > > instead of at 19 minutes.
> > >
> > > Looking at the interface for WindowAssigner it doesn't look like I have
> > > enough information to do what I need.
> > >
> > > I would need to know the key, since each keyed stream could have a
> > > different start time, and I would need to be able to store some state
> > about
> > > what the start time is for each key, like a Map of
> > > StreamKey->FirstEventTime.
> > >
> > > Do you have any suggestions for assigning a window that starts with the
> > > time of the first event in that keyed stream, and ends N minutes after
> > that?
> > >
> > > Thanks!
> > >
> > > Dan
> >
>