[DISCUSS] Make Temporal Join syntax easier to use

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Make Temporal Join syntax easier to use

cshuo
Hi community,

Temporal table join in Flink is a powerful feature, especially processing
time temporal table join, which is very commonly used in production.
However, the syntax of temporal table join only supports 'FOR SYSTEM_TIME
AS OF' left table's time attribute field currently. e.g.,

```
SELECT *
FROM Orders AS o
JOIN Products FOR SYSTEM_TIME AS OF o.proctime AS p
ON o.productId = p.productId
```

That means the users have to explicitly define a the proctime field either
in the left table DDL or the left temporary view, which is inconvenient to
use.

>> Preferred solution
I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to
avoid conflict) to the table by default, just like the pseudo-column of
classic database, e.g., `ROWID` in Oracle. In this way, we can use Temporal
table join very conveniently.

>> Less elegant solution
Actually, we have another simpler solution, i.e., enrich the syntax for
temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is
also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF
PROCTIME()' is ambiguous, because it cannot tell where the version time of
temporal table comes from, left table or right table? The former is what we
want. So I think this solution is not preferred.

Looking forward to your feedback~

Best,
Shuo
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Make Temporal Join syntax easier to use

Danny Chan-2
Thanks Shuo Cheng for driving this discussion ~

Generally i have the same feeling that our temporal table syntax is a
little verbose compared to KSQL or other RDBMS. (they just use the normal
join syntax for the temporal join).

>I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to
avoid conflict

I don't think add a pseudo-column is a good solution because of these
reasons:

- The normal pseudo-column or system column like _rowID_ has a
underneath storage, user can select the column from a table [1] , and each
row has a deterministic value bind to it for the pseudo-column (although it
may change when the row is deleted and inserted again), but the PROCTIME
for Flink behaves more like a row level runtime attribute which is
different for different queries and even different SQL contexts.

- The pseudo-column make the table schema more complex but they are only
useful when we want to use the time-attributes.

> Actually, we have another simpler solution, i.e., enrich the syntax for
temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'.

Maybe our biggest concern is that the syntax does not make it clear where
the who triggers the PROCTIME() computation, similar with
`current_timestamp`, from my understanding, the `PROCTIME()` is computed in
row level by the system, when a record from the LHS is used to join the RHS
table. So generally i'm +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'.

BTW, in the long term, we should think how to simplifies the `FOR
SYSTEM_TIME AS OF ` syntax, because the proc time temporal table join is
the most common case and we should make the temporal table join default to
be 'PROCTIME'. Ideally a normal `A JOIN B` can describe a PROCTIME temporal
table join. The solution to add pseudo-column seems deviate further and
further from this path.

[1]
https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm

Shuo Cheng <[hidden email]> 于2020年12月21日周一 上午10:16写道:

> Hi community,
>
> Temporal table join in Flink is a powerful feature, especially processing
> time temporal table join, which is very commonly used in production.
> However, the syntax of temporal table join only supports 'FOR SYSTEM_TIME
> AS OF' left table's time attribute field currently. e.g.,
>
> ```
> SELECT *
> FROM Orders AS o
> JOIN Products FOR SYSTEM_TIME AS OF o.proctime AS p
> ON o.productId = p.productId
> ```
>
> That means the users have to explicitly define a the proctime field either
> in the left table DDL or the left temporary view, which is inconvenient to
> use.
>
> >> Preferred solution
> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to
> avoid conflict) to the table by default, just like the pseudo-column of
> classic database, e.g., `ROWID` in Oracle. In this way, we can use Temporal
> table join very conveniently.
>
> >> Less elegant solution
> Actually, we have another simpler solution, i.e., enrich the syntax for
> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is
> also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF
> PROCTIME()' is ambiguous, because it cannot tell where the version time of
> temporal table comes from, left table or right table? The former is what we
> want. So I think this solution is not preferred.
>
> Looking forward to your feedback~
>
> Best,
> Shuo
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Make Temporal Join syntax easier to use

Leonard Xu
Hi, Shuo Cheng

Thanks for bringing this topic, I think it’s a good idea to simplify current temporal join syntax.  
But I think the most valuable thing is to simplify the long keyword FOR SYSTEM_TIME AS OF  lying in FOR SYSTEM_TIME AS OF L.PROCTIME  syntax  rather than simplify the  L.PROCTIME reference to a function PROCTIME() or  pseudo-column _PROCTIME.

> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid conflict) to the table by default, just like the pseudo-column of classic database
-1 for pseudo-column, I didn’t see any benefit for importing a  pretty pseudo-column concept and the motivation only comes from a minor syntax simplification purpose.  

For PROCTIME(), I’ve same concern it’s not clear who keeps the function() ? LHS table or RHS table or event JOIN ? I think Danny’s explanation the PROCTIME() computation should happen in row level makes sense, but I think the computation should be triggered when the LHS table’s record correlates the version of RHS table. And for event time temporal join, the syntax  FOR SYSTEM_TIME AS OF l.rowtime also follows similar semantic that use the rowtime in LHS table’s record to correlates the version of RHS table.

Thus I tend to keep current temporal join syntax, I won’t say +1 or -1 for  'FOR SYSTEM_TIME AS OF PROCTIME()’ simplification.


Best,
Leonard

 

> I don't think add a pseudo-column is a good solution because of these
> reasons:
>
> - The normal pseudo-column or system column like _rowID_ has a
> underneath storage, user can select the column from a table [1] , and each
> row has a deterministic value bind to it for the pseudo-column (although it
> may change when the row is deleted and inserted again), but the PROCTIME
> for Flink behaves more like a row level runtime attribute which is
> different for different queries and even different SQL contexts.
>
> - The pseudo-column make the table schema more complex but they are only
> useful when we want to use the time-attributes.
>
>> Actually, we have another simpler solution, i.e., enrich the syntax for
> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'.
>
> Maybe our biggest concern is that the syntax does not make it clear where
> the who triggers the PROCTIME() computation, similar with
> `current_timestamp`, from my understanding, the `PROCTIME()` is computed in
> row level by the system, when a record from the LHS is used to join the RHS
> table. So generally i'm +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'.
>
> BTW, in the long term, we should think how to simplifies the `FOR
> SYSTEM_TIME AS OF ` syntax, because the proc time temporal table join is
> the most common case and we should make the temporal table join default to
> be 'PROCTIME'. Ideally a normal `A JOIN B` can describe a PROCTIME temporal
> table join. The solution to add pseudo-column seems deviate further and
> further from this path.
>
> [1]
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm
>
> Shuo Cheng <[hidden email]> 于2020年12月21日周一 上午10:16写道:
>>  

>>>>  
>> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to
>> avoid conflict) to the table by default, just like the pseudo-column of
>> classic database, e.g., `ROWID` in Oracle.




>>>> Less elegant solution
>> Actually, we have another simpler solution, i.e., enrich the syntax for
>> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is
>> also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF
>> PROCTIME()' is ambiguous, because it cannot tell where the version time of
>> temporal table comes from, left table or right table? The former is what we
>> want. So I think this solution is not preferred.
>>
>> Looking forward to your feedback~
>>
>> Best,
>> Shuo
>>