Big +1 for this FLIP.
Recently I've been working with some Kafka topics that carry their timestamps as record metadata rather than in the message body. I want to declare a table over these topics with DDL, but "rowtime_column_name" in <watermark_definition> seems to accept only existing physical columns:

> <watermark_definition>:
>   WATERMARK FOR rowtime_column_name AS watermark_strategy_expression

I raised the issue on the user@ list, but committers advised alternative approaches that call for detailed knowledge of Flink, such as a custom decoding format or converting between the DataStream API and TableEnvironment. That works against the main advantage of Flink SQL: simplicity and ease of use. IMHO this FLIP should be implemented so that users can derive tables freely from any Kafka topic without having to involve the DataStream API.

Best,

Dongwon

On 2020/03/01 14:30:31, Dawid Wysakowicz <[hidden email]> wrote:
> Hi,
>
> I would like to propose an improvement that would enable reading table
> columns from different parts of source records. Besides the main payload,
> the majority (if not all) of the sources expose additional information. It
> can be simply read-only metadata such as offset or ingestion time, or
> read/write parts of the record that contain data but additionally serve
> different purposes (partitioning, compaction, etc.), e.g. the key or
> timestamp in Kafka.
>
> We should make it possible to read and write data from all of those
> locations. In this proposal I discuss reading partitioning data; for
> completeness, the proposal also discusses partitioning when writing
> data out.
>
> I am looking forward to your comments.
>
> You can access the FLIP here:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode
>
> Best,
> Dawid
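To make the limitation Dongwon describes concrete, here is a minimal sketch. The table and column names are made up for illustration, and the SYSTEM_METADATA accessor in the second statement is only the shape proposed on the FLIP-107 page at the time of this thread; the final syntax may differ.

-- Status quo: the rowtime column must be a physical column in the message body.
CREATE TABLE user_actions (
  user_id STRING,
  action  STRING,
  ts      TIMESTAMP(3),  -- only works if the payload itself carries event time
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic'     = 'user_actions',
  'format'    = 'json'
);

-- Hypothetical FLIP-107 shape (illustrative, syntax not final): expose the
-- Kafka record timestamp as a computed column so it can back the watermark.
CREATE TABLE user_actions (
  user_id STRING,
  action  STRING,
  ts      AS CAST(SYSTEM_METADATA("timestamp") AS TIMESTAMP(3)),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic'     = 'user_actions',
  'format'    = 'json'
);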
The content of FLIP-107 is relatively short, but its scope and the implications it will have are actually very big. From what I can tell now, I think there is a good chance that we can deliver part of this FLIP in 1.12, e.g. accessing the metadata fields just like you mentioned.

Best,

Kurt
+1 for FLIP-107
Reading different parts of source records should be a key feature for Flink SQL, e.g. metadata in CDC data, and the key and timestamp in Kafka records. The scope of FLIP-107 is too big to finish in one version IMO; maybe we can start with part of the work in 1.12.

Best,

Leonard
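As an illustration of Leonard's point about keys: a sketch of how a Kafka message key might be mapped to table columns under this FLIP. The 'key.format' / 'key.fields' option names are assumptions based on the direction of the FLIP discussion, not a finalized API.

-- Sketch only: mapping the Kafka record key to a table column.
-- 'key.format' / 'key.fields' are assumed option names, not a confirmed API.
CREATE TABLE orders (
  order_id STRING,         -- intended to come from the record key
  amount   DECIMAL(10, 2)  -- read from the record value
) WITH (
  'connector'  = 'kafka',
  'topic'      = 'orders',
  'format'     = 'json',
  'key.format' = 'raw',
  'key.fields' = 'order_id'
);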