Hi community,
Temporal table join in Flink is a powerful feature, especially processing time temporal table join, which is very commonly used in production. However, the syntax of temporal table join only supports 'FOR SYSTEM_TIME AS OF' left table's time attribute field currently. e.g., ``` SELECT * FROM Orders AS o JOIN Products FOR SYSTEM_TIME AS OF o.proctime AS p ON o.productId = p.productId ``` That means the users have to explicitly define a the proctime field either in the left table DDL or the left temporary view, which is inconvenient to use. >> Preferred solution I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid conflict) to the table by default, just like the pseudo-column of classic database, e.g., `ROWID` in Oracle. In this way, we can use Temporal table join very conveniently. >> Less elegant solution Actually, we have another simpler solution, i.e., enrich the syntax for temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF PROCTIME()' is ambiguous, because it cannot tell where the version time of temporal table comes from, left table or right table? The former is what we want. So I think this solution is not preferred. Looking forward to your feedback~ Best, Shuo |
Thanks Shuo Cheng for driving this discussion ~
Generally i have the same feeling that our temporal table syntax is a little verbose compared to KSQL or other RDBMS. (they just use the normal join syntax for the temporal join). >I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid conflict I don't think add a pseudo-column is a good solution because of these reasons: - The normal pseudo-column or system column like _rowID_ has a underneath storage, user can select the column from a table [1] , and each row has a deterministic value bind to it for the pseudo-column (although it may change when the row is deleted and inserted again), but the PROCTIME for Flink behaves more like a row level runtime attribute which is different for different queries and even different SQL contexts. - The pseudo-column make the table schema more complex but they are only useful when we want to use the time-attributes. > Actually, we have another simpler solution, i.e., enrich the syntax for temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. Maybe our biggest concern is that the syntax does not make it clear where the who triggers the PROCTIME() computation, similar with `current_timestamp`, from my understanding, the `PROCTIME()` is computed in row level by the system, when a record from the LHS is used to join the RHS table. So generally i'm +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'. BTW, in the long term, we should think how to simplifies the `FOR SYSTEM_TIME AS OF ` syntax, because the proc time temporal table join is the most common case and we should make the temporal table join default to be 'PROCTIME'. Ideally a normal `A JOIN B` can describe a PROCTIME temporal table join. The solution to add pseudo-column seems deviate further and further from this path. [1] https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm Shuo Cheng <[hidden email]> 于2020年12月21日周一 上午10:16写道: > Hi community, > > Temporal table join in Flink is a powerful feature, especially processing > time temporal table join, which is very commonly used in production. > However, the syntax of temporal table join only supports 'FOR SYSTEM_TIME > AS OF' left table's time attribute field currently. e.g., > > ``` > SELECT * > FROM Orders AS o > JOIN Products FOR SYSTEM_TIME AS OF o.proctime AS p > ON o.productId = p.productId > ``` > > That means the users have to explicitly define a the proctime field either > in the left table DDL or the left temporary view, which is inconvenient to > use. > > >> Preferred solution > I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to > avoid conflict) to the table by default, just like the pseudo-column of > classic database, e.g., `ROWID` in Oracle. In this way, we can use Temporal > table join very conveniently. > > >> Less elegant solution > Actually, we have another simpler solution, i.e., enrich the syntax for > temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is > also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF > PROCTIME()' is ambiguous, because it cannot tell where the version time of > temporal table comes from, left table or right table? The former is what we > want. So I think this solution is not preferred. > > Looking forward to your feedback~ > > Best, > Shuo > |
Hi, Shuo Cheng
Thanks for bringing this topic, I think it’s a good idea to simplify current temporal join syntax. But I think the most valuable thing is to simplify the long keyword FOR SYSTEM_TIME AS OF lying in FOR SYSTEM_TIME AS OF L.PROCTIME syntax rather than simplify the L.PROCTIME reference to a function PROCTIME() or pseudo-column _PROCTIME. > I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid conflict) to the table by default, just like the pseudo-column of classic database -1 for pseudo-column, I didn’t see any benefit for importing a pretty pseudo-column concept and the motivation only comes from a minor syntax simplification purpose. For PROCTIME(), I’ve same concern it’s not clear who keeps the function() ? LHS table or RHS table or event JOIN ? I think Danny’s explanation the PROCTIME() computation should happen in row level makes sense, but I think the computation should be triggered when the LHS table’s record correlates the version of RHS table. And for event time temporal join, the syntax FOR SYSTEM_TIME AS OF l.rowtime also follows similar semantic that use the rowtime in LHS table’s record to correlates the version of RHS table. Thus I tend to keep current temporal join syntax, I won’t say +1 or -1 for 'FOR SYSTEM_TIME AS OF PROCTIME()’ simplification. Best, Leonard > I don't think add a pseudo-column is a good solution because of these > reasons: > > - The normal pseudo-column or system column like _rowID_ has a > underneath storage, user can select the column from a table [1] , and each > row has a deterministic value bind to it for the pseudo-column (although it > may change when the row is deleted and inserted again), but the PROCTIME > for Flink behaves more like a row level runtime attribute which is > different for different queries and even different SQL contexts. > > - The pseudo-column make the table schema more complex but they are only > useful when we want to use the time-attributes. > >> Actually, we have another simpler solution, i.e., enrich the syntax for > temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. > > Maybe our biggest concern is that the syntax does not make it clear where > the who triggers the PROCTIME() computation, similar with > `current_timestamp`, from my understanding, the `PROCTIME()` is computed in > row level by the system, when a record from the LHS is used to join the RHS > table. So generally i'm +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'. > > BTW, in the long term, we should think how to simplifies the `FOR > SYSTEM_TIME AS OF ` syntax, because the proc time temporal table join is > the most common case and we should make the temporal table join default to > be 'PROCTIME'. Ideally a normal `A JOIN B` can describe a PROCTIME temporal > table join. The solution to add pseudo-column seems deviate further and > further from this path. > > [1] > https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm > > Shuo Cheng <[hidden email]> 于2020年12月21日周一 上午10:16写道: >> >>>> >> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to >> avoid conflict) to the table by default, just like the pseudo-column of >> classic database, e.g., `ROWID` in Oracle. >>>> Less elegant solution >> Actually, we have another simpler solution, i.e., enrich the syntax for >> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is >> also very convenient, However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF >> PROCTIME()' is ambiguous, because it cannot tell where the version time of >> temporal table comes from, left table or right table? The former is what we >> want. So I think this solution is not preferred. >> >> Looking forward to your feedback~ >> >> Best, >> Shuo >> |
Free forum by Nabble | Edit this page |