[DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Jark Wu-2
Hi everyone,

Currently, in the Table API & SQL documentation[1], we call joins with
time conditions "Time-windowed Join". However, the same feature is
called "Interval Join" in DataStream[2]. We should align the terminology
across the Flink project.

From my point of view, "Interval Join" is more suitable, because each row is
joined with the rows of the right stream that fall within a time interval.
A "Windowed Join", in contrast, should join data that falls in the same
window, which is also how the DataStream API describes it[3].
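
To make the distinction concrete, here is a minimal plain-Java sketch of the two concepts. It does not use any Flink API; the class name, the `Event` type, the inclusive bounds, and the tumbling window of fixed size are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (no Flink dependency) contrasting the two join concepts. */
final class JoinTerminologySketch {

    /** Hypothetical event: a join key plus an event timestamp in milliseconds. */
    static final class Event {
        final String key;
        final long ts;

        Event(String key, long ts) {
            this.key = key;
            this.ts = ts;
        }
    }

    /**
     * Interval join: pair each left event with right events of the same key whose
     * timestamp lies in [left.ts + lowerBound, left.ts + upperBound].
     * Inclusive bounds are an assumption of this sketch.
     */
    static List<String> intervalJoin(List<Event> left, List<Event> right,
                                     long lowerBound, long upperBound) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean inInterval = r.ts >= l.ts + lowerBound && r.ts <= l.ts + upperBound;
                if (sameKey && inInterval) {
                    out.add(l.ts + "-" + r.ts);
                }
            }
        }
        return out;
    }

    /** Window join: pair events of the same key that fall into the same tumbling window. */
    static List<String> tumblingWindowJoin(List<Event> left, List<Event> right, long windowSize) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean sameWindow =
                        Math.floorDiv(l.ts, windowSize) == Math.floorDiv(r.ts, windowSize);
                if (sameKey && sameWindow) {
                    out.add(l.ts + "-" + r.ts);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> left = Arrays.asList(new Event("a", 9));
        List<Event> right = Arrays.asList(new Event("a", 8), new Event("a", 11));

        // The interval is relative to each left event: [9 - 2, 9 + 2] = [7, 11].
        System.out.println(intervalJoin(left, right, -2, 2));     // prints [9-8, 9-11]
        // Windows are fixed: [0, 10) and [10, 20), so 9 and 11 never meet.
        System.out.println(tumblingWindowJoin(left, right, 10));  // prints [9-8]
    }
}
```

The events at 9 and 11 are only two time units apart, yet the window join misses the pair because they sit on opposite sides of a window boundary. The interval join finds it because its bounds move with each left event.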

For Table API & SQL, the "Time-windowed Join" is the "Interval Join" in
DataStream, and the "Windowed Join" feature is missing from Table API & SQL.

I propose correcting the terminology in the docs before 1.10 is released.

What do you think?

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins
[2]:
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#interval-join
[3]:
https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#window-join

Re: [DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Jingsong Li
Thanks Jark for bringing this up.

+1 to using a unified name, "Interval Join", before 1.10 is released.

I think "Interval Join" may have come from the SQL world too, see [1].
Another candidate is "Range Join", but considering DataStream, I am
OK with "Interval".

[1] https://issues.apache.org/jira/browse/FLINK-8478

Best,
Jingsong Lee

On Mon, Dec 23, 2019 at 11:42 AM Jark Wu <[hidden email]> wrote:

> Hi everyone,
>
> Currently, in the Table API & SQL documentation[1], we call joins with
> time conditions "Time-windowed Join". However, the same feature is
> called "Interval Join" in DataStream[2]. We should align the terminology
> across the Flink project.
>
> From my point of view, "Interval Join" is more suitable, because each row is
> joined with the rows of the right stream that fall within a time interval.
> A "Windowed Join", in contrast, should join data that falls in the same
> window, which is also how the DataStream API describes it[3].
>
> For Table API & SQL, the "Time-windowed Join" is the "Interval Join" in
> DataStream, and the "Windowed Join" feature is missing from Table API & SQL.
>
> I propose correcting the terminology in the docs before 1.10 is released.
>
> What do you think?
>
> Best,
> Jark
>
> [1]:
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins
> [2]:
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#interval-join
> [3]:
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#window-join
>



Re: [DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Danny Chan
Thanks Jark for bringing up this discussion. Just looking at the API definitions,
it seems that the Flink DataStream interval join and the Table/SQL Time-windowed Join are
not equivalent in their join conditions:

The Interval Join only supports comparing the event-time columns of the
joined streams[1], while the Time-windowed Join allows columns of any type in the
equi-join part, with a required additional join condition that bounds the time on both sides.

So, judging from the limitations of the implementations, the DataStream interval join purely
joins streams by time. Its functionality is a subset of the Time-windowed Join, which first
bounds the two streams by time and then applies more complex join predicates.

I also agree that we should unify the terminology, but just curious: why not choose
"Time-windowed Join", since its functionality is more "complete"?

[1] https://github.com/apache/flink/blob/cce1cef50d993aba5060ea5ac597174525ae895e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/KeyedStream.java#L449

Best,
Danny Chan
On Dec 23, 2019 at 11:42 AM +0800, Jark Wu <[hidden email]> wrote:

> Hi everyone,
>
> Currently, in the Table API & SQL documentation[1], we call joins with
> time conditions "Time-windowed Join". However, the same feature is
> called "Interval Join" in DataStream[2]. We should align the terminology
> across the Flink project.
>
> From my point of view, "Interval Join" is more suitable, because each row is
> joined with the rows of the right stream that fall within a time interval.
> A "Windowed Join", in contrast, should join data that falls in the same
> window, which is also how the DataStream API describes it[3].
>
> For Table API & SQL, the "Time-windowed Join" is the "Interval Join" in
> DataStream, and the "Windowed Join" feature is missing from Table API & SQL.
>
> I propose correcting the terminology in the docs before 1.10 is released.
>
> What do you think?
>
> Best,
> Jark
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/tableApi.html#joins
> [2]:
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#interval-join
> [3]:
> https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/joining.html#window-join

Re: [DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Jingsong Li
Hi Danny,

> DataStream interval join and Table/SQL Time-windowed Join are
> not equivalent

In my opinion, there is no difference between Table and DataStream except
that outer join is not implemented in DataStream.
The keys of the KeyedStream already define the equi-join conditions.
Other conditions can be applied in the subsequent IntervalJoined.process.
And the DataStream interval join was implemented following the SQL
feature[1]. You can see the references in the description.
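
As a rough illustration of that argument, the plain-Java sketch below (again not Flink API; `Order`, `Shipment`, and `join` are hypothetical names, and the inclusive bounds are an assumption) shows a join where key equality and the time bounds select candidate pairs, and an arbitrary extra predicate is then evaluated on each pair, in the role a process function would play:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

/** Plain-Java sketch: equi-key + time bounds first, any other condition afterwards. */
final class IntervalJoinWithExtraCondition {

    /** Hypothetical left-side record. */
    static final class Order {
        final String id;
        final long ts;
        final int amount;

        Order(String id, long ts, int amount) {
            this.id = id;
            this.ts = ts;
            this.amount = amount;
        }
    }

    /** Hypothetical right-side record. */
    static final class Shipment {
        final String orderId;
        final long ts;
        final int weight;

        Shipment(String orderId, long ts, int weight) {
            this.orderId = orderId;
            this.ts = ts;
            this.weight = weight;
        }
    }

    static List<String> join(List<Order> orders, List<Shipment> shipments,
                             long lowerBound, long upperBound,
                             BiPredicate<Order, Shipment> extraCondition) {
        List<String> out = new ArrayList<>();
        for (Order o : orders) {
            for (Shipment s : shipments) {
                // Step 1: equi-join on the key (what keying both streams provides).
                if (!o.id.equals(s.orderId)) {
                    continue;
                }
                // Step 2: time bound relative to the left record (inclusive here).
                if (s.ts < o.ts + lowerBound || s.ts > o.ts + upperBound) {
                    continue;
                }
                // Step 3: any remaining predicate, evaluated per candidate pair.
                if (extraCondition.test(o, s)) {
                    out.add(o.id + ":" + o.ts + "->" + s.ts);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Order> orders = Arrays.asList(new Order("o1", 100, 50));
        List<Shipment> shipments = Arrays.asList(
                new Shipment("o1", 120, 40),   // matches all three steps
                new Shipment("o1", 130, 70),   // fails the extra condition
                new Shipment("o1", 500, 40),   // outside the time interval
                new Shipment("o2", 120, 40));  // different key

        System.out.println(
                join(orders, shipments, 0, 100, (o, s) -> s.weight < o.amount));
        // prints [o1:100->120]
    }
}
```

Under this reading, a non-equi predicate such as `s.weight < o.amount` does not need to be part of the join definition itself, which is why the two features can be seen as the same concept.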

> why not choose Time-windowed Join

As Jark said, there is a "Window Join" in DataStream, and we can support it in
Table too in the future. It is very easy to confuse with "Time-windowed
Join".
So, in my opinion, "Interval Join" or "Range Join" is the "complete" term
to describe this kind of join, but better not "Time-windowed Join".

[1] https://issues.apache.org/jira/browse/FLINK-8478

Best,
Jingsong Lee

Re: [DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Jark Wu-2
I agree with Jingsong: we are discussing aligning the "concepts", not
the "implementations".

As "concepts", the "Time-windowed Join" in SQL and the "Interval Join" in
DataStream are the same thing.

Best,
Jark

On Mon, 23 Dec 2019 at 15:16, Jingsong Li <[hidden email]> wrote:

> Hi Danny,
>
> > DataStream interval join and Table/SQL Time-windowed Join are
> > not equivalent
>
> In my opinion, there is no difference between Table and DataStream except
> that outer join is not implemented in DataStream.
> The keys of the KeyedStream already define the equi-join conditions.
> Other conditions can be applied in the subsequent IntervalJoined.process.
> And the DataStream interval join was implemented following the SQL
> feature[1]. You can see the references in the description.
>
> > why not choose Time-windowed Join
>
> As Jark said, there is a "Window Join" in DataStream, and we can support it in
> Table too in the future. It is very easy to confuse with "Time-windowed
> Join".
> So, in my opinion, "Interval Join" or "Range Join" is the "complete" term
> to describe this kind of join, but better not "Time-windowed Join".
>
> [1] https://issues.apache.org/jira/browse/FLINK-8478
>
> Best,
> Jingsong Lee
>

Re: [DISCUSS] Correct the terminology of "Time-windowed Join" to "Interval Join" in Table API & SQL

Jark Wu-2
Thanks all for the feedback. I will start a VOTE soon.

Best,
Jark

On Mon, 23 Dec 2019 at 15:45, Jark Wu <[hidden email]> wrote:

> I agree with Jingsong: we are discussing aligning the "concepts", not
> the "implementations".
>
> As "concepts", the "Time-windowed Join" in SQL and the "Interval Join" in
> DataStream are the same thing.
>
> Best,
> Jark