Hi everyone,
I would like to start a discussion about how to support time attributes in SQL DDL. In Flink 1.9, we already introduced a basic SQL DDL to create tables. However, it does not support defining time attributes, so users cannot apply window operations on tables created by DDL, which is a bad experience.

In FLIP-66, we propose a watermark syntax to define the rowtime attribute, and propose to use the computed column syntax to define the proctime attribute. But computed columns are another big topic that deserves a separate FLIP. If we reach consensus on the computed column approach, we will start the computed column FLIP soon.

FLIP-66: https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#

Thanks for any feedback!

Best,
Jark
Thanks Jark for bringing up this topic. This is definitely an important feature for SQL, especially for DDL users.

I will spend some time reviewing this design doc. Thanks a lot!

Best,
Danny Chan
Thanks Jark for this topic. This will be very useful.

Best,
ForwardXu
Hi Jark,
Thanks for bringing up this discussion and the detailed design doc. This is definitely a critical feature for streaming SQL jobs. I have left a few comments in the design doc.

Thanks,
Dian
Hi all,
Thanks all for the extensive feedback received in the doc so far. I see general agreement on using computed columns to support the proctime attribute and to extract timestamps. So we will prepare a computed column FLIP and share it on the dev ML soon.

Feel free to leave more comments!

Best,
Jark
After some review and discussion in the Google document, I think it's time to convert this design into a cwiki FLIP page and start the voting process.

Best,
Kurt
Hi everyone,
Thanks all for joining the discussion in the doc [1]. It seems that the discussion has converged and there is consensus on the current FLIP document. If there is no objection, I would like to convert it into a cwiki FLIP page and start the voting process.

For more details, please refer to the design doc (it has changed slightly since the initial proposal).

Thanks,
Jark

[1]: https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd
+1 to start the voting process.

Best,
Kurt
Hi everyone,
Thanks all for the valuable suggestions and feedback so far. Before starting the vote, I would like to summarize the proposed DDL syntax on the mailing list.

## Rowtime Attribute (Watermark Syntax)

    CREATE TABLE table_name (
      WATERMARK FOR <columnName> AS <watermark_strategy_expression>
    ) WITH (
      ...
    )

It marks an existing field <columnName> as the rowtime attribute, and the watermark is generated by the expression <watermark_strategy_expression>. <watermark_strategy_expression> can be an arbitrary expression that returns a nullable BIGINT or TIMESTAMP as the watermark value.

For common cases, users can use the following expressions to define a strategy:
1. Bounded Out of Orderness: the strategy can be "rowtimeField - INTERVAL 'string' timeUnit".
2. Preserve Watermark From Source: the strategy can be "SYSTEM_WATERMARK()".

## Proctime Attribute

    CREATE TABLE table_name (
      ...
      proc AS SYSTEM_PROCTIME()
    ) WITH (
      ...
    )

It uses the computed column syntax to add an additional column with the proctime attribute. Here SYSTEM_PROCTIME() is a built-in function.

For more details and the implementation, please refer to the design doc:
https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d822dba

Feel free to leave further feedback!

Thanks,
Jark
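The rowtime semantics summarized above can be sketched in Python. This is a minimal, hypothetical model, not the planner's actual operator. Rowtimes are epoch milliseconds (the BIGINT case). `bounded_out_of_orderness` stands in for a strategy like `ts - INTERVAL '5' SECOND`, and `assign_watermarks` evaluates it once per row. Two rules are assumptions rather than part of the summary. First, a NULL result emits nothing. Second, a value is emitted only if it exceeds the last emitted watermark, which is the ascending-watermark rule that Flink's CREATE TABLE documentation later described for this clause. Emitting after each row, rather than periodically, is a simplification.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical model: a row is a dict and the rowtime column holds epoch
# milliseconds (the BIGINT case of the proposal).
Row = dict
Strategy = Callable[[Row], Optional[int]]


def bounded_out_of_orderness(column: str, delay_ms: int) -> Strategy:
    """Hypothetical stand-in for `WATERMARK FOR column AS column - INTERVAL ...`."""
    def strategy(row: Row) -> Optional[int]:
        ts = row.get(column)
        # NULL - INTERVAL is NULL, so a NULL rowtime gives a NULL watermark value.
        return None if ts is None else ts - delay_ms
    return strategy


def assign_watermarks(rows: Iterable[Row], strategy: Strategy) -> Iterator[Tuple[str, object]]:
    """Forward every row and evaluate the strategy expression once per row.

    Assumptions (not from the summary): NULL results are skipped, and a value is
    emitted only if it is larger than the last emitted watermark. Emitting right
    after each row, instead of periodically, is a simplification.
    """
    current: Optional[int] = None
    for row in rows:
        yield ("row", row)
        candidate = strategy(row)
        if candidate is not None and (current is None or candidate > current):
            current = candidate
            yield ("watermark", current)


if __name__ == "__main__":
    events = [{"ts": 10_000}, {"ts": 8_000}, {"ts": None}, {"ts": 12_000}]
    stream = assign_watermarks(events, bounded_out_of_orderness("ts", 5_000))
    print([value for kind, value in stream if kind == "watermark"])  # [5000, 7000]
```

The late row (8_000) and the NULL row produce no watermark, so the emitted sequence stays ascending.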
Hi Jark,
Thanks for the summary! I like the proposal!

It makes it very clear that an event time attribute is an existing column on which watermark metadata is defined, whereas a processing time attribute is a computed field.

I have one comment regarding the section on "Complex Watermark Strategies". The proposal says that you can also use a scalar function. I don't think that a "textbook" scalar function would be sufficient for more advanced strategies. For example, a histogram-based approach would need to remember the values of the last x records. The interface of a scalar function would still work for that, but it would be a stateful function (which is not OK for a scalar function). I don't think it's a problem, but I wanted to mention it here.

Best,
Fabian
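Fabian's point can be illustrated with a small Python sketch of one possible histogram-style strategy. Everything here is hypothetical and is not taken from the design doc. `QuantileLatenessStrategy` remembers how far each of the last `window` rows lagged behind the largest rowtime seen so far. It returns that largest rowtime minus a high quantile of those lags. Because its answer depends on earlier rows, the same input can produce different outputs, which is exactly the state a pure scalar function cannot keep.

```python
from collections import deque
from typing import Deque, Optional


class QuantileLatenessStrategy:
    """Hypothetical histogram-style watermark strategy over epoch-millisecond rowtimes.

    It keeps state across calls, which is what a "textbook" scalar function cannot do.
    """

    def __init__(self, window: int, quantile: float) -> None:
        if window <= 0 or not 0.0 <= quantile <= 1.0:
            raise ValueError("window must be positive and quantile must be in [0, 1]")
        # Lag of each of the last `window` rows behind the largest rowtime seen so far.
        self.lags: Deque[int] = deque(maxlen=window)
        self.quantile = quantile
        self.max_ts: Optional[int] = None

    def __call__(self, ts: Optional[int]) -> Optional[int]:
        if ts is None:
            return None  # NULL in, NULL out; the state is left untouched
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        self.lags.append(self.max_ts - ts)
        ordered = sorted(self.lags)
        # Pick the requested quantile of the remembered lags (clamped to the last index).
        bound = ordered[min(len(ordered) - 1, int(self.quantile * len(ordered)))]
        return self.max_ts - bound


if __name__ == "__main__":
    strategy = QuantileLatenessStrategy(window=4, quantile=0.9)
    print(strategy(10_000))  # 10000
    print(strategy(7_000))   # 7000
    print(strategy(10_000))  # 7000: same input as the first call, different result
```

The output also moved backwards (10000, then 7000), so whatever consumes it would still have to keep only ascending watermarks.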
Hi,
Thanks Fabian for your reply. I agree with your point that the histogram-based case need the function to be stateful which is not supported currently and in this design. Maybe we can support stateful scalar function like TableAggregateFunction. We can further discuss how to support this in the future. I added this limitation in the "Complex Watermark Strategies" section. Btw, I also updated how to automatically apply the watermark assigner by the planner at the end of "Implementation" section [1]. This can avoid every TableSource extending DefinedProctimeAttribute to carry time attribute information. If there is no objection, I would like to update the cwiki FLIP page and start a new voting process in the next days. Best, Jark [1]: https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#heading=h.qx7j56dotywd On Fri, 20 Sep 2019 at 22:18, Fabian Hueske <[hidden email]> wrote: > Hi Jark, > > Thanks for the summary! > I like the proposal! > > It makes it very clear that an event time attribute is an existing column > on which watermark metadata is defined whereas a processing time attribute > is a computed field. > > I have one comment regarding the section on "Complex Watermark Strategies". > The proposal says that you can also use a scalar function. > I don't think that a "text book" scalar function would be sufficient for > more advanced strategies. > For example a histogram-based approach would need to remember the values of > the last x records. > The interface of a scalar function would still work for that, but it would > be a stateful function (which would not be OK for a scalar function). > I don't think it's a problem, but wanted to mention it here. > > Best, Fabian > > Am Do., 19. Sept. 2019 um 18:05 Uhr schrieb Jark Wu <[hidden email]>: > > > Hi everyone, > > > > Thanks all for the valuable suggestions and feedbacks so far. 
> > Before starting the vote, I would like to summarize the proposed DDL > syntax > > in the mailing list. > > > > ## Rowtime Attribute (Watermark Syntax) > > > > CREATE TABLE table_name ( > > WATERMARK FOR <columnName> AS <watermark_strategy_expression> > > ) WITH ( > > ... > > ) > > > > It marks an existing field <columnName> as the rowtime attribute, and the > > watermark is generated by the expression <watermark_strategy_expression>. > > <watermark_strategy_expression> can be arbitrary expression which > returns a > > nullable BIGINT or TIMESTAMP as the watermark value. > > > > For common cases, users can use the following expressions to define a > > strategy. > > 1. Bounded Out of Orderness, the strategy can be "rowtimeField - INTERVAL > > 'string' timeUnit". > > 2. Preserve Watermark From Source, the strategy can be > > "SYSTEM_WATERMARK()". > > > > ## Proctime Attribute > > > > CREATE TABLE table_name ( > > ... > > proc AS SYSTEM_PROCTIME() > > ) WITH ( > > ... > > ) > > > > It uses the computed column syntax to add an additional column with > > proctime attribute. Here SYSTEM_PROCTIME() is a built-in function. > > > > For more details and the implementations, please refer to the design doc: > > > > > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d822dba > > > > Feel free to leave your further feedbacks! > > > > Thanks, > > Jark > > > > On Thu, 19 Sep 2019 at 11:23, Kurt Young <[hidden email]> wrote: > > > > > +1 to start vote process. > > > > > > Best, > > > Kurt > > > > > > > > > On Thu, Sep 19, 2019 at 10:54 AM Jark Wu <[hidden email]> wrote: > > > > > > > Hi everyone, > > > > > > > > Thanks all for joining the discussion in the doc[1]. > > > > It seems that the discussion is converged and there is a consensus on > > the > > > > current FLIP document. > > > > If there is no objection, I would like to convert it into cwiki FLIP > > page > > > > and start voting process. 
> > > >
> > > > For more details, please refer to the design doc (it has changed slightly
> > > > since the initial proposal).
> > > >
> > > > Thanks,
> > > > Jark
> > > >
> > > > [1]:
> > > > https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd
> > > >
> > > > On Mon, 16 Sep 2019 at 16:12, Kurt Young <[hidden email]> wrote:
> > > >
> > > > > After some review and discussion in the Google document, I think it's
> > > > > time to convert this design to a cwiki FLIP page and start the voting
> > > > > process.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > > On Mon, Sep 9, 2019 at 7:46 PM Jark Wu <[hidden email]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Thanks all for the feedback received in the doc so far.
> > > > > > I saw general agreement on using computed columns to support the
> > > > > > proctime attribute and to extract timestamps.
> > > > > > So we will prepare a computed column FLIP and share it on the dev ML
> > > > > > soon.
> > > > > >
> > > > > > Feel free to leave more comments!
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > On Fri, 6 Sep 2019 at 13:50, Dian Fu <[hidden email]> wrote:
> > > > > >
> > > > > > > Hi Jark,
> > > > > > >
> > > > > > > Thanks for bringing up this discussion and the detailed design doc.
> > > > > > > This is definitely a critical feature for streaming SQL jobs. I have
> > > > > > > left a few comments in the design doc.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dian
|
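The bounded out-of-orderness strategy summarized above ("rowtimeField - INTERVAL 'string' timeUnit") can be illustrated with a small standalone Java sketch. It assumes the runtime evaluates the strategy expression once per record and only ever advances the watermark, keeping the largest value produced so far (as Flink's existing bounded-out-of-orderness timestamp extractors do), and that a NULL result simply yields no new watermark. Rowtimes are modelled as epoch milliseconds (BIGINT). The class and method names are hypothetical and are not part of the proposal or of any Flink API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: evaluates a per-record watermark strategy such as
 * "ts - INTERVAL '5' SECOND" and keeps the emitted watermark monotonic.
 * Not a Flink API; names are illustrative only.
 */
public class BoundedOutOfOrdernessSketch {

    /** Fixed out-of-orderness bound in milliseconds, e.g. INTERVAL '5' SECOND. */
    private final long delayMillis;

    /** Highest watermark produced so far; Long.MIN_VALUE means "none yet". */
    private long currentWatermark = Long.MIN_VALUE;

    public BoundedOutOfOrdernessSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Models the strategy expression: a NULL rowtime gives a NULL watermark. */
    public static Long strategyExpression(Long rowtime, long delayMillis) {
        return rowtime == null ? null : rowtime - delayMillis;
    }

    /** Processes one record's rowtime and returns the watermark after it. */
    public long onRecord(Long rowtime) {
        Long candidate = strategyExpression(rowtime, delayMillis);
        // A NULL result leaves the watermark unchanged, and a smaller value
        // (from a late record) never moves it backwards.
        if (candidate != null && candidate > currentWatermark) {
            currentWatermark = candidate;
        }
        return currentWatermark;
    }

    public static void main(String[] args) {
        BoundedOutOfOrdernessSketch wm = new BoundedOutOfOrdernessSketch(5_000L);
        Long[] rowtimes = {10_000L, 12_000L, 9_000L, null, 20_000L};
        List<Long> watermarks = new ArrayList<>();
        for (Long ts : rowtimes) {
            watermarks.add(wm.onRecord(ts));
        }
        System.out.println(watermarks); // [5000, 7000, 7000, 7000, 15000]
    }
}
```

The late record (9000) and the NULL rowtime leave the watermark at 7000, which is why the strategy expression itself can stay a simple stateless subtraction while monotonicity is handled outside it.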
For the histogram-based watermark strategy, one possible solution is to keep
using the scalar function interface (which is meant to be stateless) and keep the
stateful objects directly in the function. By doing that, we will lose some
information after the job gets restarted, but I think that might be acceptable
because a histogram-based approach is an approximate algorithm after all. But I
agree we will run into some trouble if we want accurate watermark computation
logic. In that case, I would suggest creating a dedicated upstream job to do the
watermark calculation and save the value into a field. Then in the current job, we
can just reference the calculated field and specify it as this job's watermark.

Best,
Kurt

On Mon, Sep 23, 2019 at 8:49 PM Jark Wu <[hidden email]> wrote:

> Hi,
>
> Thanks Fabian for your reply. I agree with your point that the
> histogram-based case needs the function to be stateful, which is not
> supported currently or in this design.
> Maybe we can support stateful scalar functions, similar to TableAggregateFunction.
> We can further discuss how to support this in the future.
> I added this limitation in the "Complex Watermark Strategies" section.
>
> Btw, I also updated how the planner automatically applies the watermark
> assigner, at the end of the "Implementation" section [1].
> This avoids every TableSource having to extend DefinedProctimeAttribute to
> carry time attribute information.
>
> If there is no objection, I would like to update the cwiki FLIP page and
> start a new voting process in the next few days.
>
> Best,
> Jark
>
> [1]:
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#heading=h.qx7j56dotywd
>
> On Fri, 20 Sep 2019 at 22:18, Fabian Hueske <[hidden email]> wrote:
>
> > Hi Jark,
> >
> > Thanks for the summary!
> > I like the proposal!
> >
> > It makes it very clear that an event time attribute is an existing column
> > on which watermark metadata is defined, whereas a processing time attribute
> > is a computed field.
> >
> > I have one comment regarding the section on "Complex Watermark Strategies".
> > The proposal says that you can also use a scalar function.
> > I don't think that a "text book" scalar function would be sufficient for
> > more advanced strategies.
> > For example, a histogram-based approach would need to remember the values
> > of the last x records.
> > The interface of a scalar function would still work for that, but it would
> > be a stateful function (which would not be OK for a scalar function).
> > I don't think it's a problem, but I wanted to mention it here.
> >
> > Best, Fabian
|
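Kurt's workaround for the histogram-based case, keeping the state directly in the function's fields, might look roughly like the following Java sketch. It is hypothetical: the class name, the ring-buffer size and the nearest-rank percentile are illustrative choices, not something specified in the thread. In Flink, the per-record logic would sit in the `eval` method of a `ScalarFunction` subclass; that dependency is left out here so the sketch runs standalone. Because the buffer lives in plain fields rather than checkpointed state, it starts empty again after a restart (the approximation loss Kurt accepts), and the same input can yield different outputs depending on earlier records, which is exactly why Fabian notes this would not be OK for a scalar function.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a histogram-style watermark strategy. Its state lives
 * in plain instance fields (not Flink managed state), so it is lost on restart.
 */
public class ApproxPercentileWatermark {

    private final long[] recentDelays;          // ring buffer of the last N observed delays (ms)
    private final double quantile;              // e.g. 0.95 tolerates 95% of observed lateness
    private int size = 0;                       // number of valid entries in the buffer
    private int next = 0;                       // next slot to overwrite
    private long maxRowtime = Long.MIN_VALUE;   // newest rowtime seen so far

    public ApproxPercentileWatermark(int windowSize, double quantile) {
        this.recentDelays = new long[windowSize];
        this.quantile = quantile;
    }

    /** Called once per record, like a scalar function's eval(rowtime). */
    public Long eval(Long rowtime) {
        if (rowtime == null) {
            return null; // nullable result: no watermark candidate for this record
        }
        maxRowtime = Math.max(maxRowtime, rowtime);
        // How far behind the newest event this record arrived.
        long delay = maxRowtime - rowtime;
        recentDelays[next] = delay;
        next = (next + 1) % recentDelays.length;
        size = Math.min(size + 1, recentDelays.length);

        // Nearest-rank quantile over the delays currently held in the buffer.
        long[] sorted = Arrays.copyOf(recentDelays, size);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(quantile * size) - 1;
        long bound = sorted[Math.max(rank, 0)];
        // Not necessarily monotonic; the runtime would still keep the maximum.
        return maxRowtime - bound;
    }

    public static void main(String[] args) {
        ApproxPercentileWatermark f = new ApproxPercentileWatermark(4, 0.5);
        System.out.println(f.eval(1000L)); // 1000
        System.out.println(f.eval(800L));  // 1000
        System.out.println(f.eval(700L));  // 800
        // A fresh instance (e.g. after a restart) has no history:
        System.out.println(new ApproxPercentileWatermark(4, 0.5).eval(700L)); // 700
    }
}
```

The same rowtime 700 produces 800 in one instance and 700 in a fresh one, which makes the statefulness concern concrete; Kurt's alternative for exact results is to precompute the watermark in an upstream job and reference that field instead.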