[Discuss] FLINK-16039 Add API method to get last element in session window

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[Discuss] FLINK-16039 Add API method to get last element in session window

Manas Kale
Hi all,
I would like to start a discussion on this feature request (JIRA link).
<https://issues.apache.org/jira/browse/FLINK-16039>

Consider the events :

[1, event], [2, event]

where first element is event timestamp in seconds and second element is
event code/name.

Also consider that an Event time session window with inactivityGap = 2
seconds is acting on above stream.

When the first event arrives, a session window should be created that is
[1,1].

When the second event arrives, a new session window should be created that
is [2,2]. Since this falls within firstWindowTimestamp+inactivityGap, it
should be merged into session window [1,2] and  [2,2] should be deleted.


*This is my understanding of how session windows are created. Please
correct me if wrong.*
However, Flink does not follow such a definition of windows semantically.
If I call the  getEnd() method of the TimeWindow() class, I get back
timestamp + inactivityGap.

For the above example, after processing the first element, I would get 1 +
2 = 3 seconds as the window "end".

The actual window end should be the timestamp 1, which is the last event in
the session window.

A solution would be to change the "end" definition of all windows, but I
suppose this would be breaking and would need some debate.

Therefore, I propose an intermediate solution : add a new API method that
keeps track of the last element added in the session window.

If there is agreement on this, I would like to start drafting a change
document and implement this.
Reply | Threaded
Open this post in threaded view
|

Re: [Discuss] FLINK-16039 Add API method to get last element in session window

dwysakowicz
Hi Manas,

First of all I think your understanding of how the session windows work
is correct.

I tend to slightly disagree that the end for a session window is wrong.
It is my personal opinion though. I see it this way that a TimeWindow in
case of a session window is the session itself. The session always ends
after a period of inactivity. Take a user session on a webpage. Such a
session does not end/isn't brought down at the time of a last event. It
is closed after a period of inactivity. In such scenario I think the
behavior of the session window is correct.

Moreover you can achieve what you are describing with an aggregate[1]
function. You can easily maintain the biggest number seen for a window.

Lastly, I think the overall feeling in the community is that we are very
skeptical towards extending the Windows API. From what I've heard and
experienced the ProcessFunction[2] is a much better principle to build
custom solutions upon, as in fact its easier to control and even
understand. That said I am rather against introducing that change.

Best,

Dawid

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/windows.html#aggregatefunction

[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/process_function.html#process-function-low-level-operations

On 13/03/2020 09:46, Manas Kale wrote:

> Hi all,
> I would like to start a discussion on this feature request (JIRA link).
> <https://issues.apache.org/jira/browse/FLINK-16039>
>
> Consider the events :
>
> [1, event], [2, event]
>
> where first element is event timestamp in seconds and second element is
> event code/name.
>
> Also consider that an Event time session window with inactivityGap = 2
> seconds is acting on above stream.
>
> When the first event arrives, a session window should be created that is
> [1,1].
>
> When the second event arrives, a new session window should be created that
> is [2,2]. Since this falls within firstWindowTimestamp+inactivityGap, it
> should be merged into session window [1,2] and  [2,2] should be deleted.
>
>
> *This is my understanding of how session windows are created. Please
> correct me if wrong.*
> However, Flink does not follow such a definition of windows semantically.
> If I call the  getEnd() method of the TimeWindow() class, I get back
> timestamp + inactivityGap.
>
> For the above example, after processing the first element, I would get 1 +
> 2 = 3 seconds as the window "end".
>
> The actual window end should be the timestamp 1, which is the last event in
> the session window.
>
> A solution would be to change the "end" definition of all windows, but I
> suppose this would be breaking and would need some debate.
>
> Therefore, I propose an intermediate solution : add a new API method that
> keeps track of the last element added in the session window.
>
> If there is agreement on this, I would like to start drafting a change
> document and implement this.
>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [Discuss] FLINK-16039 Add API method to get last element in session window

Manas Kale
Hi Dawid,
Thank you for the response, I see your point. I was perhaps thinking only
from the perspective of my use case where I think such a definition makes
sense and did not account for the general case.

Regards,
Manas


On Thu, Mar 26, 2020 at 8:40 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi Manas,
>
> First of all I think your understanding of how the session windows work
> is correct.
>
> I tend to slightly disagree that the end for a session window is wrong.
> It is my personal opinion though. I see it this way that a TimeWindow in
> case of a session window is the session itself. The session always ends
> after a period of inactivity. Take a user session on a webpage. Such a
> session does not end/isn't brought down at the time of a last event. It
> is closed after a period of inactivity. In such scenario I think the
> behavior of the session window is correct.
>
> Moreover you can achieve what you are describing with an aggregate[1]
> function. You can easily maintain the biggest number seen for a window.
>
> Lastly, I think the overall feeling in the community is that we are very
> skeptical towards extending the Windows API. From what I've heard and
> experienced the ProcessFunction[2] is a much better principle to build
> custom solutions upon, as in fact its easier to control and even
> understand. That said I am rather against introducing that change.
>
> Best,
>
> Dawid
>
> [1]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/windows.html#aggregatefunction
>
> [2]
>
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/process_function.html#process-function-low-level-operations
>
> On 13/03/2020 09:46, Manas Kale wrote:
> > Hi all,
> > I would like to start a discussion on this feature request (JIRA link).
> > <https://issues.apache.org/jira/browse/FLINK-16039>
> >
> > Consider the events :
> >
> > [1, event], [2, event]
> >
> > where first element is event timestamp in seconds and second element is
> > event code/name.
> >
> > Also consider that an Event time session window with inactivityGap = 2
> > seconds is acting on above stream.
> >
> > When the first event arrives, a session window should be created that is
> > [1,1].
> >
> > When the second event arrives, a new session window should be created
> that
> > is [2,2]. Since this falls within firstWindowTimestamp+inactivityGap, it
> > should be merged into session window [1,2] and  [2,2] should be deleted.
> >
> >
> > *This is my understanding of how session windows are created. Please
> > correct me if wrong.*
> > However, Flink does not follow such a definition of windows semantically.
> > If I call the  getEnd() method of the TimeWindow() class, I get back
> > timestamp + inactivityGap.
> >
> > For the above example, after processing the first element, I would get 1
> +
> > 2 = 3 seconds as the window "end".
> >
> > The actual window end should be the timestamp 1, which is the last event
> in
> > the session window.
> >
> > A solution would be to change the "end" definition of all windows, but I
> > suppose this would be breaking and would need some debate.
> >
> > Therefore, I propose an intermediate solution : add a new API method that
> > keeps track of the last element added in the session window.
> >
> > If there is agreement on this, I would like to start drafting a change
> > document and implement this.
> >
>
>