Hi all,
I would like to start the vote for FLIP-95 [1], which is discussed and reached a consensus in the discussion thread [2]. The vote will be open until April 2nd (72h), unless there is an objection or not enough votes. Thanks, Timo [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces [2] https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E |
+1 from my side.
Thanks Timo for driving this. Best, Jark On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote: > Hi all, > > I would like to start the vote for FLIP-95 [1], which is discussed and > reached a consensus in the discussion thread [2]. > > The vote will be open until April 2nd (72h), unless there is an > objection or not enough votes. > > Thanks, > Timo > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > [2] > > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > |
+1 (non-binding)
Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道: > +1 from my side. > > Thanks Timo for driving this. > > Best, > Jark > > On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote: > > > Hi all, > > > > I would like to start the vote for FLIP-95 [1], which is discussed and > > reached a consensus in the discussion thread [2]. > > > > The vote will be open until April 2nd (72h), unless there is an > > objection or not enough votes. > > > > Thanks, > > Timo > > > > [1] > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > > [2] > > > > > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > > > -- Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
+1
Best, Kurt On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote: > +1 (non-binding) > > Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道: > > > +1 from my side. > > > > Thanks Timo for driving this. > > > > Best, > > Jark > > > > On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote: > > > > > Hi all, > > > > > > I would like to start the vote for FLIP-95 [1], which is discussed and > > > reached a consensus in the discussion thread [2]. > > > > > > The vote will be open until April 2nd (72h), unless there is an > > > objection or not enough votes. > > > > > > Thanks, > > > Timo > > > > > > [1] > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > > > [2] > > > > > > > > > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > > > > > > > > -- > > Benchao Li > School of Electronics Engineering and Computer Science, Peking University > Tel:+86-15650713730 > Email: [hidden email]; [hidden email] > |
+1
Best, Jingsong Lee On Mon, Mar 30, 2020 at 4:41 PM Kurt Young <[hidden email]> wrote: > +1 > > Best, > Kurt > > > On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote: > > > +1 (non-binding) > > > > Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道: > > > > > +1 from my side. > > > > > > Thanks Timo for driving this. > > > > > > Best, > > > Jark > > > > > > On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote: > > > > > > > Hi all, > > > > > > > > I would like to start the vote for FLIP-95 [1], which is discussed > and > > > > reached a consensus in the discussion thread [2]. > > > > > > > > The vote will be open until April 2nd (72h), unless there is an > > > > objection or not enough votes. > > > > > > > > Thanks, > > > > Timo > > > > > > > > [1] > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > > > > [2] > > > > > > > > > > > > > > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > > > > > > > > > > > > > -- > > > > Benchao Li > > School of Electronics Engineering and Computer Science, Peking University > > Tel:+86-15650713730 > > Email: [hidden email]; [hidden email] > > > -- Best, Jingsong Lee |
+1(non-binding)
Best, Leonard Xu > 在 2020年3月30日,16:43,Jingsong Li <[hidden email]> 写道: > > +1 > > Best, > Jingsong Lee > > On Mon, Mar 30, 2020 at 4:41 PM Kurt Young <[hidden email]> wrote: > >> +1 >> >> Best, >> Kurt >> >> >> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote: >> >>> +1 (non-binding) >>> >>> Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道: >>> >>>> +1 from my side. >>>> >>>> Thanks Timo for driving this. >>>> >>>> Best, >>>> Jark >>>> >>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote: >>>> >>>>> Hi all, >>>>> >>>>> I would like to start the vote for FLIP-95 [1], which is discussed >> and >>>>> reached a consensus in the discussion thread [2]. >>>>> >>>>> The vote will be open until April 2nd (72h), unless there is an >>>>> objection or not enough votes. >>>>> >>>>> Thanks, >>>>> Timo >>>>> >>>>> [1] >>>>> >>>>> >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>> [2] >>>>> >>>>> >>>> >>> >> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>> >>>> >>> >>> >>> -- >>> >>> Benchao Li >>> School of Electronics Engineering and Computer Science, Peking University >>> Tel:+86-15650713730 >>> Email: [hidden email]; [hidden email] >>> >> > > > -- > Best, Jingsong Lee |
Generally +1 One slight concern I have is about the I don't think this changes the core design presented in the FLIP, therefore I'm fine with accepting the FLIP. I wanted to mention my concerns, so that maybe we can adjust the passed around structures slightly. Best, Dawid On 30/03/2020 14:42, Leonard Xu wrote:
+1(non-binding) Best, Leonard Xu在 2020年3月30日,16:43,Jingsong Li [hidden email] 写道: +1 Best, Jingsong Lee On Mon, Mar 30, 2020 at 4:41 PM Kurt Young [hidden email] wrote:+1 Best, Kurt On Mon, Mar 30, 2020 at 4:08 PM Benchao Li [hidden email] wrote:+1 (non-binding) Jark Wu [hidden email] 于2020年3月30日周一 下午3:57写道:+1 from my side. Thanks Timo for driving this. Best, Jark On Mon, 30 Mar 2020 at 15:36, Timo Walther [hidden email] wrote:Hi all, I would like to start the vote for FLIP-95 [1], which is discussedandreached a consensus in the discussion thread [2]. The vote will be open until April 2nd (72h), unless there is an objection or not enough votes. Thanks, Timo [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces[2]https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E-- Benchao Li School of Electronics Engineering and Computer Science, Peking University <a class="moz-txt-link-freetext" href="Tel:+86-15650713730">Tel:+86-15650713730 Email: [hidden email]; [hidden email]-- Best, Jingsong Lee signature.asc (849 bytes) Download Attachment |
Hi Dawid,
thanks for your feedback. I agree with your concerns. I also observed that TableSchema might be used at too many different places for different responsibilities. How about we introduce a helper class for `SupportsProjectionPushDown` and also `LookupTableSource#Context#getKeys()` to represent nested structure of names. Data types, constraints, or computed columns are not necessary at those locations. We can also add utility methods for connectors to this helper class there to quickly figuring out differences between the original table schema and the new one. SelectedFields { private LogicalType orignalRowType; // set by the planner private int[][] indices; getNames(int... at): String[] getNames(String... at): String[] getIndices(int... at): int[] getNames(String... at): String[] toTableSchema(): TableSchema } What do others think? Thanks, Timo On 02.04.20 12:28, Dawid Wysakowicz wrote: > Generally +1 > > One slight concern I have is about the |SupportsProjectionPushDown.|I > don't necessarily understand how can we express projections with > TableSchema. It's unclear for me what happens when a type of a field > changes, fields are in a different order, when types do not match. How > do we express projection of a nested field with TableSchema? > > I don't think this changes the core design presented in the FLIP, > therefore I'm fine with accepting the FLIP. I wanted to mention my > concerns, so that maybe we can adjust the passed around structures slightly. > > Best, > > Dawid > || > > On 30/03/2020 14:42, Leonard Xu wrote: >> +1(non-binding) >> >> Best, >> Leonard Xu >> >>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: >>> >>> +1 >>> >>> Best, >>> Jingsong Lee >>> >>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: >>> >>>> +1 >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> wrote: >>>> >>>>> +1 (non-binding) >>>>> >>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: >>>>> >>>>>> +1 from my side. >>>>>> >>>>>> Thanks Timo for driving this. >>>>>> >>>>>> Best, >>>>>> Jark >>>>>> >>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed >>>> and >>>>>>> reached a consensus in the discussion thread [2]. >>>>>>> >>>>>>> The vote will be open until April 2nd (72h), unless there is an >>>>>>> objection or not enough votes. >>>>>>> >>>>>>> Thanks, >>>>>>> Timo >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>>>> [2] >>>>>>> >>>>>>> >>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>> -- >>>>> >>>>> Benchao Li >>>>> School of Electronics Engineering and Computer Science, Peking University >>>>> Tel:+86-15650713730 >>>>> Email:[hidden email];[hidden email] >>>>> >>> -- >>> Best, Jingsong Lee |
Hi Dawid,
> How to express projections with TableSchema? The TableSource holds the original TableSchema (i.e. from DDL) and the pushed TableSchema represents the schema after projection. Thus the table source can compare them to figure out changed field orders or not matched types. For most sources who maps physical storage by field names (e.g. jdbc, hbase, json) they can just simply apply the pushed TableSchema. But sources who maps by field indexes (e.g. csv), they need to figure out the projected indexes by comparing the original and projected schema. For example, the original schema is [a: String, b: Int, c: Timestamp], and b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the source can figure out index=1 is pruned. > How do we express projection of a nested field with TableSchema? This is the same to the above one. For example, the original schema is [rk: String, f1 Row<q1 Int, q2 Double>]. If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 Double>]. > TableSchema might be used at too many different places for different responsibilities. Agree. We have recognized that a structure and builder for pure table schema is required in many places. But we mixed many concepts of catalog table schema in TableSchema. IIRC, in an offline discussion of FLIP-84, we want to introduce a new `CatalogTableSchema` to represent the schema part of a DDL, and remove all the watermark, computed column information from TableSchema? Then `TableSchema` can continue to serve as a pure table schema and it stays in a good package. Best, Jark On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: > Hi Dawid, > > thanks for your feedback. I agree with your concerns. I also observed > that TableSchema might be used at too many different places for > different responsibilities. > > How about we introduce a helper class for `SupportsProjectionPushDown` > and also `LookupTableSource#Context#getKeys()` to represent nested > structure of names. Data types, constraints, or computed columns are not > necessary at those locations. > > We can also add utility methods for connectors to this helper class > there to quickly figuring out differences between the original table > schema and the new one. > > SelectedFields { > > private LogicalType orignalRowType; // set by the planner > > private int[][] indices; > > getNames(int... at): String[] > > getNames(String... at): String[] > > getIndices(int... at): int[] > > getNames(String... at): String[] > > toTableSchema(): TableSchema > } > > What do others think? > > Thanks, > Timo > > > > On 02.04.20 12:28, Dawid Wysakowicz wrote: > > Generally +1 > > > > One slight concern I have is about the |SupportsProjectionPushDown.|I > > don't necessarily understand how can we express projections with > > TableSchema. It's unclear for me what happens when a type of a field > > changes, fields are in a different order, when types do not match. How > > do we express projection of a nested field with TableSchema? > > > > I don't think this changes the core design presented in the FLIP, > > therefore I'm fine with accepting the FLIP. I wanted to mention my > > concerns, so that maybe we can adjust the passed around structures > slightly. > > > > Best, > > > > Dawid > > || > > > > On 30/03/2020 14:42, Leonard Xu wrote: > >> +1(non-binding) > >> > >> Best, > >> Leonard Xu > >> > >>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: > >>> > >>> +1 > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: > >>> > >>>> +1 > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> > wrote: > >>>> > >>>>> +1 (non-binding) > >>>>> > >>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: > >>>>> > >>>>>> +1 from my side. > >>>>>> > >>>>>> Thanks Timo for driving this. > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> > wrote: > >>>>>> > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed > >>>> and > >>>>>>> reached a consensus in the discussion thread [2]. > >>>>>>> > >>>>>>> The vote will be open until April 2nd (72h), unless there is an > >>>>>>> objection or not enough votes. > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Timo > >>>>>>> > >>>>>>> [1] > >>>>>>> > >>>>>>> > >>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > >>>>>>> [2] > >>>>>>> > >>>>>>> > >>>> > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > >>>>> -- > >>>>> > >>>>> Benchao Li > >>>>> School of Electronics Engineering and Computer Science, Peking > University > >>>>> Tel:+86-15650713730 > >>>>> Email:[hidden email];[hidden email] > >>>>> > >>> -- > >>> Best, Jingsong Lee > > |
Hi Jark,
if catalogs use `CatalogTableSchema` in the future. The source would internally also work with `CatalogTableSchema`. I'm fine with cleaning up the `TableSchema` class but should a source deal with two different schema classes then? Another problem that I see is that connectors usually need to perform some index arithmetics. Dealing with TableSchema and additionally within a field with DataType might be a bit inconvenient. A dedicated class with utilities might be helpful such that not every source needs to implement the same intersection/diff logic again. Regards, Timo On 02.04.20 14:06, Jark Wu wrote: > Hi Dawid, > >> How to express projections with TableSchema? > The TableSource holds the original TableSchema (i.e. from DDL) and the > pushed TableSchema represents the schema after projection. > Thus the table source can compare them to figure out changed field orders > or not matched types. > For most sources who maps physical storage by field names (e.g. jdbc, > hbase, json) they can just simply apply the pushed TableSchema. > But sources who maps by field indexes (e.g. csv), they need to figure out > the projected indexes by comparing the original and projected schema. > For example, the original schema is [a: String, b: Int, c: Timestamp], and > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the > source can figure out index=1 is pruned. > >> How do we express projection of a nested field with TableSchema? > This is the same to the above one. For example, the original schema is [rk: > String, f1 Row<q1 Int, q2 Double>]. > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 > Double>]. > >> TableSchema might be used at too many different places for different > responsibilities. > Agree. We have recognized that a structure and builder for pure table > schema is required in many places. But we mixed many concepts of catalog > table schema in TableSchema. > IIRC, in an offline discussion of FLIP-84, we want to introduce a new > `CatalogTableSchema` to represent the schema part of a DDL, > and remove all the watermark, computed column information from TableSchema? > Then `TableSchema` can continue to serve as a pure table schema and it > stays in a good package. > > Best, > Jark > > > > > On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: > >> Hi Dawid, >> >> thanks for your feedback. I agree with your concerns. I also observed >> that TableSchema might be used at too many different places for >> different responsibilities. >> >> How about we introduce a helper class for `SupportsProjectionPushDown` >> and also `LookupTableSource#Context#getKeys()` to represent nested >> structure of names. Data types, constraints, or computed columns are not >> necessary at those locations. >> >> We can also add utility methods for connectors to this helper class >> there to quickly figuring out differences between the original table >> schema and the new one. >> >> SelectedFields { >> >> private LogicalType orignalRowType; // set by the planner >> >> private int[][] indices; >> >> getNames(int... at): String[] >> >> getNames(String... at): String[] >> >> getIndices(int... at): int[] >> >> getNames(String... at): String[] >> >> toTableSchema(): TableSchema >> } >> >> What do others think? >> >> Thanks, >> Timo >> >> >> >> On 02.04.20 12:28, Dawid Wysakowicz wrote: >>> Generally +1 >>> >>> One slight concern I have is about the |SupportsProjectionPushDown.|I >>> don't necessarily understand how can we express projections with >>> TableSchema. It's unclear for me what happens when a type of a field >>> changes, fields are in a different order, when types do not match. How >>> do we express projection of a nested field with TableSchema? >>> >>> I don't think this changes the core design presented in the FLIP, >>> therefore I'm fine with accepting the FLIP. I wanted to mention my >>> concerns, so that maybe we can adjust the passed around structures >> slightly. >>> >>> Best, >>> >>> Dawid >>> || >>> >>> On 30/03/2020 14:42, Leonard Xu wrote: >>>> +1(non-binding) >>>> >>>> Best, >>>> Leonard Xu >>>> >>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: >>>>> >>>>> +1 >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: >>>>> >>>>>> +1 >>>>>> >>>>>> Best, >>>>>> Kurt >>>>>> >>>>>> >>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> >> wrote: >>>>>> >>>>>>> +1 (non-binding) >>>>>>> >>>>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: >>>>>>> >>>>>>>> +1 from my side. >>>>>>>> >>>>>>>> Thanks Timo for driving this. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jark >>>>>>>> >>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> >> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed >>>>>> and >>>>>>>>> reached a consensus in the discussion thread [2]. >>>>>>>>> >>>>>>>>> The vote will be open until April 2nd (72h), unless there is an >>>>>>>>> objection or not enough votes. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>>>> >>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>>>>>> [2] >>>>>>>>> >>>>>>>>> >>>>>> >> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>>>> -- >>>>>>> >>>>>>> Benchao Li >>>>>>> School of Electronics Engineering and Computer Science, Peking >> University >>>>>>> Tel:+86-15650713730 >>>>>>> Email:[hidden email];[hidden email] >>>>>>> >>>>> -- >>>>> Best, Jingsong Lee >> >> > |
Hi Timo,
I don't think source should work with `CatalogTableSchema`. So far, a table source doesn't need to know the logic information of computed column and watermark. IMO, we should provide a method to convert from `CatalogTableSchema` into `TableSchema` without computed columns in source factory, and a source should just hold the `TableSchema`. I agree doing the intersection/diff logic is trivial, but maybe we can provide utilities to do that? So that we can keep the interface clean. Best, Jark On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote: > Hi Jark, > > if catalogs use `CatalogTableSchema` in the future. The source would > internally also work with `CatalogTableSchema`. I'm fine with cleaning > up the `TableSchema` class but should a source deal with two different > schema classes then? > > Another problem that I see is that connectors usually need to perform > some index arithmetics. Dealing with TableSchema and additionally within > a field with DataType might be a bit inconvenient. A dedicated class > with utilities might be helpful such that not every source needs to > implement the same intersection/diff logic again. > > Regards, > Timo > > > On 02.04.20 14:06, Jark Wu wrote: > > Hi Dawid, > > > >> How to express projections with TableSchema? > > The TableSource holds the original TableSchema (i.e. from DDL) and the > > pushed TableSchema represents the schema after projection. > > Thus the table source can compare them to figure out changed field orders > > or not matched types. > > For most sources who maps physical storage by field names (e.g. jdbc, > > hbase, json) they can just simply apply the pushed TableSchema. > > But sources who maps by field indexes (e.g. csv), they need to figure out > > the projected indexes by comparing the original and projected schema. > > For example, the original schema is [a: String, b: Int, c: Timestamp], > and > > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the > > source can figure out index=1 is pruned. > > > >> How do we express projection of a nested field with TableSchema? > > This is the same to the above one. For example, the original schema is > [rk: > > String, f1 Row<q1 Int, q2 Double>]. > > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 > > Double>]. > > > >> TableSchema might be used at too many different places for different > > responsibilities. > > Agree. We have recognized that a structure and builder for pure table > > schema is required in many places. But we mixed many concepts of catalog > > table schema in TableSchema. > > IIRC, in an offline discussion of FLIP-84, we want to introduce a new > > `CatalogTableSchema` to represent the schema part of a DDL, > > and remove all the watermark, computed column information from > TableSchema? > > Then `TableSchema` can continue to serve as a pure table schema and it > > stays in a good package. > > > > Best, > > Jark > > > > > > > > > > On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: > > > >> Hi Dawid, > >> > >> thanks for your feedback. I agree with your concerns. I also observed > >> that TableSchema might be used at too many different places for > >> different responsibilities. > >> > >> How about we introduce a helper class for `SupportsProjectionPushDown` > >> and also `LookupTableSource#Context#getKeys()` to represent nested > >> structure of names. Data types, constraints, or computed columns are not > >> necessary at those locations. > >> > >> We can also add utility methods for connectors to this helper class > >> there to quickly figuring out differences between the original table > >> schema and the new one. > >> > >> SelectedFields { > >> > >> private LogicalType orignalRowType; // set by the planner > >> > >> private int[][] indices; > >> > >> getNames(int... at): String[] > >> > >> getNames(String... at): String[] > >> > >> getIndices(int... at): int[] > >> > >> getNames(String... at): String[] > >> > >> toTableSchema(): TableSchema > >> } > >> > >> What do others think? > >> > >> Thanks, > >> Timo > >> > >> > >> > >> On 02.04.20 12:28, Dawid Wysakowicz wrote: > >>> Generally +1 > >>> > >>> One slight concern I have is about the |SupportsProjectionPushDown.|I > >>> don't necessarily understand how can we express projections with > >>> TableSchema. It's unclear for me what happens when a type of a field > >>> changes, fields are in a different order, when types do not match. How > >>> do we express projection of a nested field with TableSchema? > >>> > >>> I don't think this changes the core design presented in the FLIP, > >>> therefore I'm fine with accepting the FLIP. I wanted to mention my > >>> concerns, so that maybe we can adjust the passed around structures > >> slightly. > >>> > >>> Best, > >>> > >>> Dawid > >>> || > >>> > >>> On 30/03/2020 14:42, Leonard Xu wrote: > >>>> +1(non-binding) > >>>> > >>>> Best, > >>>> Leonard Xu > >>>> > >>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: > >>>>> > >>>>> +1 > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: > >>>>> > >>>>>> +1 > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> > >> wrote: > >>>>>> > >>>>>>> +1 (non-binding) > >>>>>>> > >>>>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: > >>>>>>> > >>>>>>>> +1 from my side. > >>>>>>>> > >>>>>>>> Thanks Timo for driving this. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> > >> wrote: > >>>>>>>> > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> I would like to start the vote for FLIP-95 [1], which is > discussed > >>>>>> and > >>>>>>>>> reached a consensus in the discussion thread [2]. > >>>>>>>>> > >>>>>>>>> The vote will be open until April 2nd (72h), unless there is an > >>>>>>>>> objection or not enough votes. > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> [1] > >>>>>>>>> > >>>>>>>>> > >>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces > >>>>>>>>> [2] > >>>>>>>>> > >>>>>>>>> > >>>>>> > >> > https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E > >>>>>>> -- > >>>>>>> > >>>>>>> Benchao Li > >>>>>>> School of Electronics Engineering and Computer Science, Peking > >> University > >>>>>>> Tel:+86-15650713730 > >>>>>>> Email:[hidden email];[hidden email] > >>>>>>> > >>>>> -- > >>>>> Best, Jingsong Lee > >> > >> > > > > |
Hi all,
Just to make it clear. I don't want to block the whole effort. I'm big +1 on the whole document and -0 on using the TableSchema for a projection pushdown. My personal opinion was that TableSchema is misleading for Projection pushdown and I still think that way. That's why I wanted to bring it up to see if I am the only one. If that's the case, let's just proceed with the TableSchema. Best, Dawid On 02/04/2020 17:19, Jark Wu wrote: > Hi Timo, > > I don't think source should work with `CatalogTableSchema`. So far, a table > source doesn't need to know the logic information of computed column and > watermark. > IMO, we should provide a method to convert from `CatalogTableSchema` into > `TableSchema` without computed columns in source factory, > and a source should just hold the `TableSchema`. > > I agree doing the intersection/diff logic is trivial, but maybe we can > provide utilities to do that? So that we can keep the interface clean. > > > Best, > Jark > > > On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote: > >> Hi Jark, >> >> if catalogs use `CatalogTableSchema` in the future. The source would >> internally also work with `CatalogTableSchema`. I'm fine with cleaning >> up the `TableSchema` class but should a source deal with two different >> schema classes then? >> >> Another problem that I see is that connectors usually need to perform >> some index arithmetics. Dealing with TableSchema and additionally within >> a field with DataType might be a bit inconvenient. A dedicated class >> with utilities might be helpful such that not every source needs to >> implement the same intersection/diff logic again. >> >> Regards, >> Timo >> >> >> On 02.04.20 14:06, Jark Wu wrote: >>> Hi Dawid, >>> >>>> How to express projections with TableSchema? >>> The TableSource holds the original TableSchema (i.e. from DDL) and the >>> pushed TableSchema represents the schema after projection. >>> Thus the table source can compare them to figure out changed field orders >>> or not matched types. >>> For most sources who maps physical storage by field names (e.g. jdbc, >>> hbase, json) they can just simply apply the pushed TableSchema. >>> But sources who maps by field indexes (e.g. csv), they need to figure out >>> the projected indexes by comparing the original and projected schema. >>> For example, the original schema is [a: String, b: Int, c: Timestamp], >> and >>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the >>> source can figure out index=1 is pruned. >>> >>>> How do we express projection of a nested field with TableSchema? >>> This is the same to the above one. For example, the original schema is >> [rk: >>> String, f1 Row<q1 Int, q2 Double>]. >>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 >>> Double>]. >>> >>>> TableSchema might be used at too many different places for different >>> responsibilities. >>> Agree. We have recognized that a structure and builder for pure table >>> schema is required in many places. But we mixed many concepts of catalog >>> table schema in TableSchema. >>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new >>> `CatalogTableSchema` to represent the schema part of a DDL, >>> and remove all the watermark, computed column information from >> TableSchema? >>> Then `TableSchema` can continue to serve as a pure table schema and it >>> stays in a good package. >>> >>> Best, >>> Jark >>> >>> >>> >>> >>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: >>> >>>> Hi Dawid, >>>> >>>> thanks for your feedback. I agree with your concerns. I also observed >>>> that TableSchema might be used at too many different places for >>>> different responsibilities. >>>> >>>> How about we introduce a helper class for `SupportsProjectionPushDown` >>>> and also `LookupTableSource#Context#getKeys()` to represent nested >>>> structure of names. Data types, constraints, or computed columns are not >>>> necessary at those locations. >>>> >>>> We can also add utility methods for connectors to this helper class >>>> there to quickly figuring out differences between the original table >>>> schema and the new one. >>>> >>>> SelectedFields { >>>> >>>> private LogicalType orignalRowType; // set by the planner >>>> >>>> private int[][] indices; >>>> >>>> getNames(int... at): String[] >>>> >>>> getNames(String... at): String[] >>>> >>>> getIndices(int... at): int[] >>>> >>>> getNames(String... at): String[] >>>> >>>> toTableSchema(): TableSchema >>>> } >>>> >>>> What do others think? >>>> >>>> Thanks, >>>> Timo >>>> >>>> >>>> >>>> On 02.04.20 12:28, Dawid Wysakowicz wrote: >>>>> Generally +1 >>>>> >>>>> One slight concern I have is about the |SupportsProjectionPushDown.|I >>>>> don't necessarily understand how can we express projections with >>>>> TableSchema. It's unclear for me what happens when a type of a field >>>>> changes, fields are in a different order, when types do not match. How >>>>> do we express projection of a nested field with TableSchema? >>>>> >>>>> I don't think this changes the core design presented in the FLIP, >>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my >>>>> concerns, so that maybe we can adjust the passed around structures >>>> slightly. >>>>> Best, >>>>> >>>>> Dawid >>>>> || >>>>> >>>>> On 30/03/2020 14:42, Leonard Xu wrote: >>>>>> +1(non-binding) >>>>>> >>>>>> Best, >>>>>> Leonard Xu >>>>>> >>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: >>>>>>> >>>>>>> +1 >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: >>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> Best, >>>>>>>> Kurt >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> >>>> wrote: >>>>>>>>> +1 (non-binding) >>>>>>>>> >>>>>>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: >>>>>>>>> >>>>>>>>>> +1 from my side. >>>>>>>>>> >>>>>>>>>> Thanks Timo for driving this. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jark >>>>>>>>>> >>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> >>>> wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is >> discussed >>>>>>>> and >>>>>>>>>>> reached a consensus in the discussion thread [2]. >>>>>>>>>>> >>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an >>>>>>>>>>> objection or not enough votes. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> [1] >>>>>>>>>>> >>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>>>>>>>> [2] >>>>>>>>>>> >>>>>>>>>>> >> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>>>>>> -- >>>>>>>>> >>>>>>>>> Benchao Li >>>>>>>>> School of Electronics Engineering and Computer Science, Peking >>>> University >>>>>>>>> Tel:+86-15650713730 >>>>>>>>> Email:[hidden email];[hidden email] >>>>>>>>> >>>>>>> -- >>>>>>> Best, Jingsong Lee >>>> >> signature.asc (849 bytes) Download Attachment |
Hi Dawid,
thanks for your response. Let's postpone the discussion around a single interface into a later issue. It seems that we reached a consensus on the remaining interfaces. Given the size of the FLIP, I think it is fine to conclude the voting now. Thanks, Timo On 03.04.20 10:42, Dawid Wysakowicz wrote: > Hi all, > > Just to make it clear. I don't want to block the whole effort. I'm big > +1 on the whole document and -0 on using the TableSchema for a > projection pushdown. > > My personal opinion was that TableSchema is misleading for Projection > pushdown and I still think that way. That's why I wanted to bring it up > to see if I am the only one. If that's the case, let's just proceed with > the TableSchema. > > Best, > > Dawid > > On 02/04/2020 17:19, Jark Wu wrote: >> Hi Timo, >> >> I don't think source should work with `CatalogTableSchema`. So far, a table >> source doesn't need to know the logic information of computed column and >> watermark. >> IMO, we should provide a method to convert from `CatalogTableSchema` into >> `TableSchema` without computed columns in source factory, >> and a source should just hold the `TableSchema`. >> >> I agree doing the intersection/diff logic is trivial, but maybe we can >> provide utilities to do that? So that we can keep the interface clean. >> >> >> Best, >> Jark >> >> >> On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote: >> >>> Hi Jark, >>> >>> if catalogs use `CatalogTableSchema` in the future. The source would >>> internally also work with `CatalogTableSchema`. I'm fine with cleaning >>> up the `TableSchema` class but should a source deal with two different >>> schema classes then? >>> >>> Another problem that I see is that connectors usually need to perform >>> some index arithmetics. Dealing with TableSchema and additionally within >>> a field with DataType might be a bit inconvenient. A dedicated class >>> with utilities might be helpful such that not every source needs to >>> implement the same intersection/diff logic again. >>> >>> Regards, >>> Timo >>> >>> >>> On 02.04.20 14:06, Jark Wu wrote: >>>> Hi Dawid, >>>> >>>>> How to express projections with TableSchema? >>>> The TableSource holds the original TableSchema (i.e. from DDL) and the >>>> pushed TableSchema represents the schema after projection. >>>> Thus the table source can compare them to figure out changed field orders >>>> or not matched types. >>>> For most sources who maps physical storage by field names (e.g. jdbc, >>>> hbase, json) they can just simply apply the pushed TableSchema. >>>> But sources who maps by field indexes (e.g. csv), they need to figure out >>>> the projected indexes by comparing the original and projected schema. >>>> For example, the original schema is [a: String, b: Int, c: Timestamp], >>> and >>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the >>>> source can figure out index=1 is pruned. >>>> >>>>> How do we express projection of a nested field with TableSchema? >>>> This is the same to the above one. For example, the original schema is >>> [rk: >>>> String, f1 Row<q1 Int, q2 Double>]. >>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 >>>> Double>]. >>>> >>>>> TableSchema might be used at too many different places for different >>>> responsibilities. >>>> Agree. We have recognized that a structure and builder for pure table >>>> schema is required in many places. But we mixed many concepts of catalog >>>> table schema in TableSchema. >>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new >>>> `CatalogTableSchema` to represent the schema part of a DDL, >>>> and remove all the watermark, computed column information from >>> TableSchema? >>>> Then `TableSchema` can continue to serve as a pure table schema and it >>>> stays in a good package. >>>> >>>> Best, >>>> Jark >>>> >>>> >>>> >>>> >>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: >>>> >>>>> Hi Dawid, >>>>> >>>>> thanks for your feedback. I agree with your concerns. I also observed >>>>> that TableSchema might be used at too many different places for >>>>> different responsibilities. >>>>> >>>>> How about we introduce a helper class for `SupportsProjectionPushDown` >>>>> and also `LookupTableSource#Context#getKeys()` to represent nested >>>>> structure of names. Data types, constraints, or computed columns are not >>>>> necessary at those locations. >>>>> >>>>> We can also add utility methods for connectors to this helper class >>>>> there to quickly figuring out differences between the original table >>>>> schema and the new one. >>>>> >>>>> SelectedFields { >>>>> >>>>> private LogicalType orignalRowType; // set by the planner >>>>> >>>>> private int[][] indices; >>>>> >>>>> getNames(int... at): String[] >>>>> >>>>> getNames(String... at): String[] >>>>> >>>>> getIndices(int... at): int[] >>>>> >>>>> getNames(String... at): String[] >>>>> >>>>> toTableSchema(): TableSchema >>>>> } >>>>> >>>>> What do others think? >>>>> >>>>> Thanks, >>>>> Timo >>>>> >>>>> >>>>> >>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote: >>>>>> Generally +1 >>>>>> >>>>>> One slight concern I have is about the |SupportsProjectionPushDown.|I >>>>>> don't necessarily understand how can we express projections with >>>>>> TableSchema. It's unclear for me what happens when a type of a field >>>>>> changes, fields are in a different order, when types do not match. How >>>>>> do we express projection of a nested field with TableSchema? >>>>>> >>>>>> I don't think this changes the core design presented in the FLIP, >>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my >>>>>> concerns, so that maybe we can adjust the passed around structures >>>>> slightly. >>>>>> Best, >>>>>> >>>>>> Dawid >>>>>> || >>>>>> >>>>>> On 30/03/2020 14:42, Leonard Xu wrote: >>>>>>> +1(non-binding) >>>>>>> >>>>>>> Best, >>>>>>> Leonard Xu >>>>>>> >>>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> 写道: >>>>>>>> >>>>>>>> +1 >>>>>>>> >>>>>>>> Best, >>>>>>>> Jingsong Lee >>>>>>>> >>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> wrote: >>>>>>>> >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Kurt >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> >>>>> wrote: >>>>>>>>>> +1 (non-binding) >>>>>>>>>> >>>>>>>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: >>>>>>>>>> >>>>>>>>>>> +1 from my side. >>>>>>>>>>> >>>>>>>>>>> Thanks Timo for driving this. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Jark >>>>>>>>>>> >>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> >>>>> wrote: >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is >>> discussed >>>>>>>>> and >>>>>>>>>>>> reached a consensus in the discussion thread [2]. >>>>>>>>>>>> >>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an >>>>>>>>>>>> objection or not enough votes. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>>>>>>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>>>>>>>>>> [2] >>>>>>>>>>>> >>>>>>>>>>>> >>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>>>>>>>> -- >>>>>>>>>> >>>>>>>>>> Benchao Li >>>>>>>>>> School of Electronics Engineering and Computer Science, Peking >>>>> University >>>>>>>>>> Tel:+86-15650713730 >>>>>>>>>> Email:[hidden email];[hidden email] >>>>>>>>>> >>>>>>>> -- >>>>>>>> Best, Jingsong Lee >>>>> >>> > |
Hi all,
The voting time for FLIP-95 has passed. I'm closing the vote now. There were 6 +1 votes, 4 of which are binding: - Jark (binding) - Benchao (non-binding) - Kurt (binding) - Jingsong (binding) - Leonard (non-binding) - Dawid (binding) There were no disapproving votes. Thus, FLIP-95 has been accepted. Thanks everyone for joining the discussion and giving feedback! Best, Timo On 03.04.20 10:52, Timo Walther wrote: > Hi Dawid, > > thanks for your response. Let's postpone the discussion around a single > interface into a later issue. It seems that we reached a consensus on > the remaining interfaces. Given the size of the FLIP, I think it is fine > to conclude the voting now. > > Thanks, > Timo > > On 03.04.20 10:42, Dawid Wysakowicz wrote: >> Hi all, >> >> Just to make it clear. I don't want to block the whole effort. I'm big >> +1 on the whole document and -0 on using the TableSchema for a >> projection pushdown. >> >> My personal opinion was that TableSchema is misleading for Projection >> pushdown and I still think that way. That's why I wanted to bring it up >> to see if I am the only one. If that's the case, let's just proceed with >> the TableSchema. >> >> Best, >> >> Dawid >> >> On 02/04/2020 17:19, Jark Wu wrote: >>> Hi Timo, >>> >>> I don't think source should work with `CatalogTableSchema`. So far, a >>> table >>> source doesn't need to know the logic information of computed column and >>> watermark. >>> IMO, we should provide a method to convert from `CatalogTableSchema` >>> into >>> `TableSchema` without computed columns in source factory, >>> and a source should just hold the `TableSchema`. >>> >>> I agree doing the intersection/diff logic is trivial, but maybe we can >>> provide utilities to do that? So that we can keep the interface clean. >>> >>> >>> Best, >>> Jark >>> >>> >>> On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote: >>> >>>> Hi Jark, >>>> >>>> if catalogs use `CatalogTableSchema` in the future. The source would >>>> internally also work with `CatalogTableSchema`. I'm fine with cleaning >>>> up the `TableSchema` class but should a source deal with two different >>>> schema classes then? >>>> >>>> Another problem that I see is that connectors usually need to perform >>>> some index arithmetics. Dealing with TableSchema and additionally >>>> within >>>> a field with DataType might be a bit inconvenient. A dedicated class >>>> with utilities might be helpful such that not every source needs to >>>> implement the same intersection/diff logic again. >>>> >>>> Regards, >>>> Timo >>>> >>>> >>>> On 02.04.20 14:06, Jark Wu wrote: >>>>> Hi Dawid, >>>>> >>>>>> How to express projections with TableSchema? >>>>> The TableSource holds the original TableSchema (i.e. from DDL) and the >>>>> pushed TableSchema represents the schema after projection. >>>>> Thus the table source can compare them to figure out changed field >>>>> orders >>>>> or not matched types. >>>>> For most sources who maps physical storage by field names (e.g. jdbc, >>>>> hbase, json) they can just simply apply the pushed TableSchema. >>>>> But sources who maps by field indexes (e.g. csv), they need to >>>>> figure out >>>>> the projected indexes by comparing the original and projected schema. >>>>> For example, the original schema is [a: String, b: Int, c: Timestamp], >>>> and >>>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. >>>>> So the >>>>> source can figure out index=1 is pruned. >>>>> >>>>>> How do we express projection of a nested field with TableSchema? >>>>> This is the same to the above one. For example, the original schema is >>>> [rk: >>>>> String, f1 Row<q1 Int, q2 Double>]. >>>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2 >>>>> Double>]. >>>>> >>>>>> TableSchema might be used at too many different places for different >>>>> responsibilities. >>>>> Agree. We have recognized that a structure and builder for pure table >>>>> schema is required in many places. But we mixed many concepts of >>>>> catalog >>>>> table schema in TableSchema. >>>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new >>>>> `CatalogTableSchema` to represent the schema part of a DDL, >>>>> and remove all the watermark, computed column information from >>>> TableSchema? >>>>> Then `TableSchema` can continue to serve as a pure table schema and it >>>>> stays in a good package. >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote: >>>>> >>>>>> Hi Dawid, >>>>>> >>>>>> thanks for your feedback. I agree with your concerns. I also observed >>>>>> that TableSchema might be used at too many different places for >>>>>> different responsibilities. >>>>>> >>>>>> How about we introduce a helper class for >>>>>> `SupportsProjectionPushDown` >>>>>> and also `LookupTableSource#Context#getKeys()` to represent nested >>>>>> structure of names. Data types, constraints, or computed columns >>>>>> are not >>>>>> necessary at those locations. >>>>>> >>>>>> We can also add utility methods for connectors to this helper class >>>>>> there to quickly figuring out differences between the original table >>>>>> schema and the new one. >>>>>> >>>>>> SelectedFields { >>>>>> >>>>>> private LogicalType orignalRowType; // set by the planner >>>>>> >>>>>> private int[][] indices; >>>>>> >>>>>> getNames(int... at): String[] >>>>>> >>>>>> getNames(String... at): String[] >>>>>> >>>>>> getIndices(int... at): int[] >>>>>> >>>>>> getNames(String... at): String[] >>>>>> >>>>>> toTableSchema(): TableSchema >>>>>> } >>>>>> >>>>>> What do others think? >>>>>> >>>>>> Thanks, >>>>>> Timo >>>>>> >>>>>> >>>>>> >>>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote: >>>>>>> Generally +1 >>>>>>> >>>>>>> One slight concern I have is about the >>>>>>> |SupportsProjectionPushDown.|I >>>>>>> don't necessarily understand how can we express projections with >>>>>>> TableSchema. It's unclear for me what happens when a type of a field >>>>>>> changes, fields are in a different order, when types do not >>>>>>> match. How >>>>>>> do we express projection of a nested field with TableSchema? >>>>>>> >>>>>>> I don't think this changes the core design presented in the FLIP, >>>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my >>>>>>> concerns, so that maybe we can adjust the passed around structures >>>>>> slightly. >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> || >>>>>>> >>>>>>> On 30/03/2020 14:42, Leonard Xu wrote: >>>>>>>> +1(non-binding) >>>>>>>> >>>>>>>> Best, >>>>>>>> Leonard Xu >>>>>>>> >>>>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]> >>>>>>>>> 写道: >>>>>>>>> >>>>>>>>> +1 >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jingsong Lee >>>>>>>>> >>>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> +1 >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Kurt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]> >>>>>> wrote: >>>>>>>>>>> +1 (non-binding) >>>>>>>>>>> >>>>>>>>>>> Jark Wu<[hidden email]> 于2020年3月30日周一 下午3:57写道: >>>>>>>>>>> >>>>>>>>>>>> +1 from my side. >>>>>>>>>>>> >>>>>>>>>>>> Thanks Timo for driving this. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Jark >>>>>>>>>>>> >>>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]> >>>>>> wrote: >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is >>>> discussed >>>>>>>>>> and >>>>>>>>>>>>> reached a consensus in the discussion thread [2]. >>>>>>>>>>>>> >>>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there >>>>>>>>>>>>> is an >>>>>>>>>>>>> objection or not enough votes. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Timo >>>>>>>>>>>>> >>>>>>>>>>>>> [1] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces >>>> >>>>>>>>>>>>> [2] >>>>>>>>>>>>> >>>>>>>>>>>>> >>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E >>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> Benchao Li >>>>>>>>>>> School of Electronics Engineering and Computer Science, Peking >>>>>> University >>>>>>>>>>> Tel:+86-15650713730 >>>>>>>>>>> Email:[hidden email];[hidden email] >>>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best, Jingsong Lee >>>>>> >>>> >> |
Free forum by Nabble | Edit this page |