[VOTE] FLIP-95: New TableSource and TableSink interfaces

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

[VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther-2
Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Timo

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2]
https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Jark Wu-2
+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote:

> Hi all,
>
> I would like to start the vote for FLIP-95 [1], which is discussed and
> reached a consensus in the discussion thread [2].
>
> The vote will be open until April 2nd (72h), unless there is an
> objection or not enough votes.
>
> Thanks,
> Timo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> [2]
>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Benchao Li
+1 (non-binding)

Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道:

> +1 from my side.
>
> Thanks Timo for driving this.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote:
>
> > Hi all,
> >
> > I would like to start the vote for FLIP-95 [1], which is discussed and
> > reached a consensus in the discussion thread [2].
> >
> > The vote will be open until April 2nd (72h), unless there is an
> > objection or not enough votes.
> >
> > Thanks,
> > Timo
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > [2]
> >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >
>


--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Kurt Young-2
+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote:

> +1 (non-binding)
>
> Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道:
>
> > +1 from my side.
> >
> > Thanks Timo for driving this.
> >
> > Best,
> > Jark
> >
> > On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for FLIP-95 [1], which is discussed and
> > > reached a consensus in the discussion thread [2].
> > >
> > > The vote will be open until April 2nd (72h), unless there is an
> > > objection or not enough votes.
> > >
> > > Thanks,
> > > Timo
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > [2]
> > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > >
> >
>
>
> --
>
> Benchao Li
> School of Electronics Engineering and Computer Science, Peking University
> Tel:+86-15650713730
> Email: [hidden email]; [hidden email]
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Jingsong Li
+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young <[hidden email]> wrote:

> +1
>
> Best,
> Kurt
>
>
> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote:
>
> > +1 (non-binding)
> >
> > Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道:
> >
> > > +1 from my side.
> > >
> > > Thanks Timo for driving this.
> > >
> > > Best,
> > > Jark
> > >
> > > On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to start the vote for FLIP-95 [1], which is discussed
> and
> > > > reached a consensus in the discussion thread [2].
> > > >
> > > > The vote will be open until April 2nd (72h), unless there is an
> > > > objection or not enough votes.
> > > >
> > > > Thanks,
> > > > Timo
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> > > > [2]
> > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> > > >
> > >
> >
> >
> > --
> >
> > Benchao Li
> > School of Electronics Engineering and Computer Science, Peking University
> > Tel:+86-15650713730
> > Email: [hidden email]; [hidden email]
> >
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Leonard Xu
+1(non-binding)

Best,
Leonard Xu

> 在 2020年3月30日,16:43,Jingsong Li <[hidden email]> 写道:
>
> +1
>
> Best,
> Jingsong Lee
>
> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young <[hidden email]> wrote:
>
>> +1
>>
>> Best,
>> Kurt
>>
>>
>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li <[hidden email]> wrote:
>>
>>> +1 (non-binding)
>>>
>>> Jark Wu <[hidden email]> 于2020年3月30日周一 下午3:57写道:
>>>
>>>> +1 from my side.
>>>>
>>>> Thanks Timo for driving this.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther <[hidden email]> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I would like to start the vote for FLIP-95 [1], which is discussed
>> and
>>>>> reached a consensus in the discussion thread [2].
>>>>>
>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>> objection or not enough votes.
>>>>>
>>>>> Thanks,
>>>>> Timo
>>>>>
>>>>> [1]
>>>>>
>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>> [2]
>>>>>
>>>>>
>>>>
>>>
>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Benchao Li
>>> School of Electronics Engineering and Computer Science, Peking University
>>> Tel:+86-15650713730
>>> Email: [hidden email]; [hidden email]
>>>
>>
>
>
> --
> Best, Jingsong Lee

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

dwysakowicz

Generally +1

One slight concern I have is about the SupportsProjectionPushDown.I don't necessarily understand how can we express projections with TableSchema. It's unclear for me what happens when a type of a field changes, fields are in a different order, when types do not match. How do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP, therefore I'm fine with accepting the FLIP. I wanted to mention my concerns, so that maybe we can adjust the passed around structures slightly.

Best,

Dawid

On 30/03/2020 14:42, Leonard Xu wrote:
+1(non-binding)

Best,
Leonard Xu

在 2020年3月30日,16:43,Jingsong Li [hidden email] 写道:

+1

Best,
Jingsong Lee

On Mon, Mar 30, 2020 at 4:41 PM Kurt Young [hidden email] wrote:

+1

Best,
Kurt


On Mon, Mar 30, 2020 at 4:08 PM Benchao Li [hidden email] wrote:

+1 (non-binding)

Jark Wu [hidden email] 于2020年3月30日周一 下午3:57写道:

+1 from my side.

Thanks Timo for driving this.

Best,
Jark

On Mon, 30 Mar 2020 at 15:36, Timo Walther [hidden email] wrote:

Hi all,

I would like to start the vote for FLIP-95 [1], which is discussed
and
reached a consensus in the discussion thread [2].

The vote will be open until April 2nd (72h), unless there is an
objection or not enough votes.

Thanks,
Timo

[1]



            

          
https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
[2]



            

          
https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

              

            

--

Benchao Li
School of Electronics Engineering and Computer Science, Peking University
<a class="moz-txt-link-freetext" href="Tel:+86-15650713730">Tel:+86-15650713730
Email: [hidden email]; [hidden email]


        

-- 
Best, Jingsong Lee

    

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther-2
Hi Dawid,

thanks for your feedback. I agree with your concerns. I also observed
that TableSchema might be used at too many different places for
different responsibilities.

How about we introduce a helper class for `SupportsProjectionPushDown`
and also `LookupTableSource#Context#getKeys()` to represent nested
structure of names. Data types, constraints, or computed columns are not
necessary at those locations.

We can also add utility methods for connectors to this helper class
there to quickly figuring out differences between the original table
schema and the new one.

SelectedFields {

        private LogicalType orignalRowType; // set by the planner

        private int[][] indices;

        getNames(int... at): String[]

        getNames(String... at): String[]

        getIndices(int... at): int[]

        getNames(String... at): String[]

        toTableSchema(): TableSchema
}

What do others think?

Thanks,
Timo



On 02.04.20 12:28, Dawid Wysakowicz wrote:

> Generally +1
>
> One slight concern I have is about the |SupportsProjectionPushDown.|I
> don't necessarily understand how can we express projections with
> TableSchema. It's unclear for me what happens when a type of a field
> changes, fields are in a different order, when types do not match. How
> do we express projection of a nested field with TableSchema?
>
> I don't think this changes the core design presented in the FLIP,
> therefore I'm fine with accepting the FLIP. I wanted to mention my
> concerns, so that maybe we can adjust the passed around structures slightly.
>
> Best,
>
> Dawid
> ||
>
> On 30/03/2020 14:42, Leonard Xu wrote:
>> +1(non-binding)
>>
>> Best,
>> Leonard Xu
>>
>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
>>>
>>> +1
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
>>>
>>>> +1
>>>>
>>>> Best,
>>>> Kurt
>>>>
>>>>
>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>  wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
>>>>>
>>>>>> +1 from my side.
>>>>>>
>>>>>> Thanks Timo for driving this.
>>>>>>
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>  wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed
>>>> and
>>>>>>> reached a consensus in the discussion thread [2].
>>>>>>>
>>>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>>>> objection or not enough votes.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Timo
>>>>>>>
>>>>>>> [1]
>>>>>>>
>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>> [2]
>>>>>>>
>>>>>>>
>>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>> --
>>>>>
>>>>> Benchao Li
>>>>> School of Electronics Engineering and Computer Science, Peking University
>>>>> Tel:+86-15650713730
>>>>> Email:[hidden email];[hidden email]
>>>>>
>>> --
>>> Best, Jingsong Lee

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Jark Wu-2
Hi Dawid,

> How to express projections with TableSchema?
The TableSource holds the original TableSchema (i.e. from DDL) and the
pushed TableSchema represents the schema after projection.
Thus the table source can compare them to figure out changed field orders
or not matched types.
For most sources who maps physical storage by field names (e.g. jdbc,
hbase, json) they can just simply apply the pushed TableSchema.
But sources who maps by field indexes (e.g. csv), they need to figure out
the projected indexes by comparing the original and projected schema.
For example, the original schema is [a: String, b: Int, c: Timestamp], and
b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
source can figure out index=1 is pruned.

> How do we express projection of a nested field with TableSchema?
This is the same to the above one. For example, the original schema is [rk:
String, f1 Row<q1 Int, q2 Double>].
If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
Double>].

> TableSchema might be used at too many different places for different
responsibilities.
Agree. We have recognized that a structure and builder for pure table
schema is required in many places. But we mixed many concepts of catalog
table schema in TableSchema.
IIRC, in an offline discussion of FLIP-84, we want to introduce a new
`CatalogTableSchema` to represent the schema part of a DDL,
and remove all the watermark, computed column information from TableSchema?
Then `TableSchema` can continue to serve as a pure table schema and it
stays in a good package.

Best,
Jark




On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:

> Hi Dawid,
>
> thanks for your feedback. I agree with your concerns. I also observed
> that TableSchema might be used at too many different places for
> different responsibilities.
>
> How about we introduce a helper class for `SupportsProjectionPushDown`
> and also `LookupTableSource#Context#getKeys()` to represent nested
> structure of names. Data types, constraints, or computed columns are not
> necessary at those locations.
>
> We can also add utility methods for connectors to this helper class
> there to quickly figuring out differences between the original table
> schema and the new one.
>
> SelectedFields {
>
>         private LogicalType orignalRowType; // set by the planner
>
>         private int[][] indices;
>
>         getNames(int... at): String[]
>
>         getNames(String... at): String[]
>
>         getIndices(int... at): int[]
>
>         getNames(String... at): String[]
>
>         toTableSchema(): TableSchema
> }
>
> What do others think?
>
> Thanks,
> Timo
>
>
>
> On 02.04.20 12:28, Dawid Wysakowicz wrote:
> > Generally +1
> >
> > One slight concern I have is about the |SupportsProjectionPushDown.|I
> > don't necessarily understand how can we express projections with
> > TableSchema. It's unclear for me what happens when a type of a field
> > changes, fields are in a different order, when types do not match. How
> > do we express projection of a nested field with TableSchema?
> >
> > I don't think this changes the core design presented in the FLIP,
> > therefore I'm fine with accepting the FLIP. I wanted to mention my
> > concerns, so that maybe we can adjust the passed around structures
> slightly.
> >
> > Best,
> >
> > Dawid
> > ||
> >
> > On 30/03/2020 14:42, Leonard Xu wrote:
> >> +1(non-binding)
> >>
> >> Best,
> >> Leonard Xu
> >>
> >>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
> >>>
> >>> +1
> >>>
> >>> Best,
> >>> Jingsong Lee
> >>>
> >>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
> >>>
> >>>> +1
> >>>>
> >>>> Best,
> >>>> Kurt
> >>>>
> >>>>
> >>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
> wrote:
> >>>>
> >>>>> +1 (non-binding)
> >>>>>
> >>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
> >>>>>
> >>>>>> +1 from my side.
> >>>>>>
> >>>>>> Thanks Timo for driving this.
> >>>>>>
> >>>>>> Best,
> >>>>>> Jark
> >>>>>>
> >>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
> wrote:
> >>>>>>
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed
> >>>> and
> >>>>>>> reached a consensus in the discussion thread [2].
> >>>>>>>
> >>>>>>> The vote will be open until April 2nd (72h), unless there is an
> >>>>>>> objection or not enough votes.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Timo
> >>>>>>>
> >>>>>>> [1]
> >>>>>>>
> >>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> >>>>>>> [2]
> >>>>>>>
> >>>>>>>
> >>>>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >>>>> --
> >>>>>
> >>>>> Benchao Li
> >>>>> School of Electronics Engineering and Computer Science, Peking
> University
> >>>>> Tel:+86-15650713730
> >>>>> Email:[hidden email];[hidden email]
> >>>>>
> >>> --
> >>> Best, Jingsong Lee
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther-2
Hi Jark,

if catalogs use `CatalogTableSchema` in the future. The source would
internally also work with `CatalogTableSchema`. I'm fine with cleaning
up the `TableSchema` class but should a source deal with two different
schema classes then?

Another problem that I see is that connectors usually need to perform
some index arithmetics. Dealing with TableSchema and additionally within
a field with DataType might be a bit inconvenient. A dedicated class
with utilities might be helpful such that not every source needs to
implement the same intersection/diff logic again.

Regards,
Timo


On 02.04.20 14:06, Jark Wu wrote:

> Hi Dawid,
>
>> How to express projections with TableSchema?
> The TableSource holds the original TableSchema (i.e. from DDL) and the
> pushed TableSchema represents the schema after projection.
> Thus the table source can compare them to figure out changed field orders
> or not matched types.
> For most sources who maps physical storage by field names (e.g. jdbc,
> hbase, json) they can just simply apply the pushed TableSchema.
> But sources who maps by field indexes (e.g. csv), they need to figure out
> the projected indexes by comparing the original and projected schema.
> For example, the original schema is [a: String, b: Int, c: Timestamp], and
> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
> source can figure out index=1 is pruned.
>
>> How do we express projection of a nested field with TableSchema?
> This is the same to the above one. For example, the original schema is [rk:
> String, f1 Row<q1 Int, q2 Double>].
> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
> Double>].
>
>> TableSchema might be used at too many different places for different
> responsibilities.
> Agree. We have recognized that a structure and builder for pure table
> schema is required in many places. But we mixed many concepts of catalog
> table schema in TableSchema.
> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
> `CatalogTableSchema` to represent the schema part of a DDL,
> and remove all the watermark, computed column information from TableSchema?
> Then `TableSchema` can continue to serve as a pure table schema and it
> stays in a good package.
>
> Best,
> Jark
>
>
>
>
> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:
>
>> Hi Dawid,
>>
>> thanks for your feedback. I agree with your concerns. I also observed
>> that TableSchema might be used at too many different places for
>> different responsibilities.
>>
>> How about we introduce a helper class for `SupportsProjectionPushDown`
>> and also `LookupTableSource#Context#getKeys()` to represent nested
>> structure of names. Data types, constraints, or computed columns are not
>> necessary at those locations.
>>
>> We can also add utility methods for connectors to this helper class
>> there to quickly figuring out differences between the original table
>> schema and the new one.
>>
>> SelectedFields {
>>
>>          private LogicalType orignalRowType; // set by the planner
>>
>>          private int[][] indices;
>>
>>          getNames(int... at): String[]
>>
>>          getNames(String... at): String[]
>>
>>          getIndices(int... at): int[]
>>
>>          getNames(String... at): String[]
>>
>>          toTableSchema(): TableSchema
>> }
>>
>> What do others think?
>>
>> Thanks,
>> Timo
>>
>>
>>
>> On 02.04.20 12:28, Dawid Wysakowicz wrote:
>>> Generally +1
>>>
>>> One slight concern I have is about the |SupportsProjectionPushDown.|I
>>> don't necessarily understand how can we express projections with
>>> TableSchema. It's unclear for me what happens when a type of a field
>>> changes, fields are in a different order, when types do not match. How
>>> do we express projection of a nested field with TableSchema?
>>>
>>> I don't think this changes the core design presented in the FLIP,
>>> therefore I'm fine with accepting the FLIP. I wanted to mention my
>>> concerns, so that maybe we can adjust the passed around structures
>> slightly.
>>>
>>> Best,
>>>
>>> Dawid
>>> ||
>>>
>>> On 30/03/2020 14:42, Leonard Xu wrote:
>>>> +1(non-binding)
>>>>
>>>> Best,
>>>> Leonard Xu
>>>>
>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
>>>>>
>>>>> +1
>>>>>
>>>>> Best,
>>>>> Jingsong Lee
>>>>>
>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>>>>
>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
>> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
>>>>>>>
>>>>>>>> +1 from my side.
>>>>>>>>
>>>>>>>> Thanks Timo for driving this.
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jark
>>>>>>>>
>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is discussed
>>>>>> and
>>>>>>>>> reached a consensus in the discussion thread [2].
>>>>>>>>>
>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>>>>>> objection or not enough votes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Timo
>>>>>>>>>
>>>>>>>>> [1]
>>>>>>>>>
>>>>>>>>>
>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>>>> [2]
>>>>>>>>>
>>>>>>>>>
>>>>>>
>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>>>> --
>>>>>>>
>>>>>>> Benchao Li
>>>>>>> School of Electronics Engineering and Computer Science, Peking
>> University
>>>>>>> Tel:+86-15650713730
>>>>>>> Email:[hidden email];[hidden email]
>>>>>>>
>>>>> --
>>>>> Best, Jingsong Lee
>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Jark Wu-2
Hi Timo,

I don't think source should work with `CatalogTableSchema`. So far, a table
source doesn't need to know the logic information of computed column and
watermark.
IMO, we should provide a method to convert from `CatalogTableSchema` into
`TableSchema` without computed columns in source factory,
and a source should just hold the `TableSchema`.

I agree doing the intersection/diff logic is trivial, but maybe we can
provide utilities to do that? So that we can keep the interface clean.


Best,
Jark


On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote:

> Hi Jark,
>
> if catalogs use `CatalogTableSchema` in the future. The source would
> internally also work with `CatalogTableSchema`. I'm fine with cleaning
> up the `TableSchema` class but should a source deal with two different
> schema classes then?
>
> Another problem that I see is that connectors usually need to perform
> some index arithmetics. Dealing with TableSchema and additionally within
> a field with DataType might be a bit inconvenient. A dedicated class
> with utilities might be helpful such that not every source needs to
> implement the same intersection/diff logic again.
>
> Regards,
> Timo
>
>
> On 02.04.20 14:06, Jark Wu wrote:
> > Hi Dawid,
> >
> >> How to express projections with TableSchema?
> > The TableSource holds the original TableSchema (i.e. from DDL) and the
> > pushed TableSchema represents the schema after projection.
> > Thus the table source can compare them to figure out changed field orders
> > or not matched types.
> > For most sources who maps physical storage by field names (e.g. jdbc,
> > hbase, json) they can just simply apply the pushed TableSchema.
> > But sources who maps by field indexes (e.g. csv), they need to figure out
> > the projected indexes by comparing the original and projected schema.
> > For example, the original schema is [a: String, b: Int, c: Timestamp],
> and
> > b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
> > source can figure out index=1 is pruned.
> >
> >> How do we express projection of a nested field with TableSchema?
> > This is the same to the above one. For example, the original schema is
> [rk:
> > String, f1 Row<q1 Int, q2 Double>].
> > If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
> > Double>].
> >
> >> TableSchema might be used at too many different places for different
> > responsibilities.
> > Agree. We have recognized that a structure and builder for pure table
> > schema is required in many places. But we mixed many concepts of catalog
> > table schema in TableSchema.
> > IIRC, in an offline discussion of FLIP-84, we want to introduce a new
> > `CatalogTableSchema` to represent the schema part of a DDL,
> > and remove all the watermark, computed column information from
> TableSchema?
> > Then `TableSchema` can continue to serve as a pure table schema and it
> > stays in a good package.
> >
> > Best,
> > Jark
> >
> >
> >
> >
> > On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:
> >
> >> Hi Dawid,
> >>
> >> thanks for your feedback. I agree with your concerns. I also observed
> >> that TableSchema might be used at too many different places for
> >> different responsibilities.
> >>
> >> How about we introduce a helper class for `SupportsProjectionPushDown`
> >> and also `LookupTableSource#Context#getKeys()` to represent nested
> >> structure of names. Data types, constraints, or computed columns are not
> >> necessary at those locations.
> >>
> >> We can also add utility methods for connectors to this helper class
> >> there to quickly figuring out differences between the original table
> >> schema and the new one.
> >>
> >> SelectedFields {
> >>
> >>          private LogicalType orignalRowType; // set by the planner
> >>
> >>          private int[][] indices;
> >>
> >>          getNames(int... at): String[]
> >>
> >>          getNames(String... at): String[]
> >>
> >>          getIndices(int... at): int[]
> >>
> >>          getNames(String... at): String[]
> >>
> >>          toTableSchema(): TableSchema
> >> }
> >>
> >> What do others think?
> >>
> >> Thanks,
> >> Timo
> >>
> >>
> >>
> >> On 02.04.20 12:28, Dawid Wysakowicz wrote:
> >>> Generally +1
> >>>
> >>> One slight concern I have is about the |SupportsProjectionPushDown.|I
> >>> don't necessarily understand how can we express projections with
> >>> TableSchema. It's unclear for me what happens when a type of a field
> >>> changes, fields are in a different order, when types do not match. How
> >>> do we express projection of a nested field with TableSchema?
> >>>
> >>> I don't think this changes the core design presented in the FLIP,
> >>> therefore I'm fine with accepting the FLIP. I wanted to mention my
> >>> concerns, so that maybe we can adjust the passed around structures
> >> slightly.
> >>>
> >>> Best,
> >>>
> >>> Dawid
> >>> ||
> >>>
> >>> On 30/03/2020 14:42, Leonard Xu wrote:
> >>>> +1(non-binding)
> >>>>
> >>>> Best,
> >>>> Leonard Xu
> >>>>
> >>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
> >>>>>
> >>>>> +1
> >>>>>
> >>>>> Best,
> >>>>> Jingsong Lee
> >>>>>
> >>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
> >>>>>
> >>>>>> +1
> >>>>>>
> >>>>>> Best,
> >>>>>> Kurt
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
> >> wrote:
> >>>>>>
> >>>>>>> +1 (non-binding)
> >>>>>>>
> >>>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
> >>>>>>>
> >>>>>>>> +1 from my side.
> >>>>>>>>
> >>>>>>>> Thanks Timo for driving this.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
> >> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all,
> >>>>>>>>>
> >>>>>>>>> I would like to start the vote for FLIP-95 [1], which is
> discussed
> >>>>>> and
> >>>>>>>>> reached a consensus in the discussion thread [2].
> >>>>>>>>>
> >>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
> >>>>>>>>> objection or not enough votes.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Timo
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
> >>>>>>>>> [2]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>
> >>
> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
> >>>>>>> --
> >>>>>>>
> >>>>>>> Benchao Li
> >>>>>>> School of Electronics Engineering and Computer Science, Peking
> >> University
> >>>>>>> Tel:+86-15650713730
> >>>>>>> Email:[hidden email];[hidden email]
> >>>>>>>
> >>>>> --
> >>>>> Best, Jingsong Lee
> >>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

dwysakowicz
Hi all,

Just to make it clear. I don't want to block the whole effort. I'm big
+1 on the whole document and -0 on using the TableSchema for a
projection pushdown.

My personal opinion was that TableSchema is misleading for Projection
pushdown and I still think that way. That's why I wanted to bring it up
to see if I am the only one. If that's the case, let's just proceed with
the TableSchema.

Best,

Dawid

On 02/04/2020 17:19, Jark Wu wrote:

> Hi Timo,
>
> I don't think source should work with `CatalogTableSchema`. So far, a table
> source doesn't need to know the logic information of computed column and
> watermark.
> IMO, we should provide a method to convert from `CatalogTableSchema` into
> `TableSchema` without computed columns in source factory,
> and a source should just hold the `TableSchema`.
>
> I agree doing the intersection/diff logic is trivial, but maybe we can
> provide utilities to do that? So that we can keep the interface clean.
>
>
> Best,
> Jark
>
>
> On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote:
>
>> Hi Jark,
>>
>> if catalogs use `CatalogTableSchema` in the future. The source would
>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>> up the `TableSchema` class but should a source deal with two different
>> schema classes then?
>>
>> Another problem that I see is that connectors usually need to perform
>> some index arithmetics. Dealing with TableSchema and additionally within
>> a field with DataType might be a bit inconvenient. A dedicated class
>> with utilities might be helpful such that not every source needs to
>> implement the same intersection/diff logic again.
>>
>> Regards,
>> Timo
>>
>>
>> On 02.04.20 14:06, Jark Wu wrote:
>>> Hi Dawid,
>>>
>>>> How to express projections with TableSchema?
>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>> pushed TableSchema represents the schema after projection.
>>> Thus the table source can compare them to figure out changed field orders
>>> or not matched types.
>>> For most sources who maps physical storage by field names (e.g. jdbc,
>>> hbase, json) they can just simply apply the pushed TableSchema.
>>> But sources who maps by field indexes (e.g. csv), they need to figure out
>>> the projected indexes by comparing the original and projected schema.
>>> For example, the original schema is [a: String, b: Int, c: Timestamp],
>> and
>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
>>> source can figure out index=1 is pruned.
>>>
>>>> How do we express projection of a nested field with TableSchema?
>>> This is the same to the above one. For example, the original schema is
>> [rk:
>>> String, f1 Row<q1 Int, q2 Double>].
>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
>>> Double>].
>>>
>>>> TableSchema might be used at too many different places for different
>>> responsibilities.
>>> Agree. We have recognized that a structure and builder for pure table
>>> schema is required in many places. But we mixed many concepts of catalog
>>> table schema in TableSchema.
>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>> and remove all the watermark, computed column information from
>> TableSchema?
>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>> stays in a good package.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>>
>>>
>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:
>>>
>>>> Hi Dawid,
>>>>
>>>> thanks for your feedback. I agree with your concerns. I also observed
>>>> that TableSchema might be used at too many different places for
>>>> different responsibilities.
>>>>
>>>> How about we introduce a helper class for `SupportsProjectionPushDown`
>>>> and also `LookupTableSource#Context#getKeys()` to represent nested
>>>> structure of names. Data types, constraints, or computed columns are not
>>>> necessary at those locations.
>>>>
>>>> We can also add utility methods for connectors to this helper class
>>>> there to quickly figuring out differences between the original table
>>>> schema and the new one.
>>>>
>>>> SelectedFields {
>>>>
>>>>          private LogicalType orignalRowType; // set by the planner
>>>>
>>>>          private int[][] indices;
>>>>
>>>>          getNames(int... at): String[]
>>>>
>>>>          getNames(String... at): String[]
>>>>
>>>>          getIndices(int... at): int[]
>>>>
>>>>          getNames(String... at): String[]
>>>>
>>>>          toTableSchema(): TableSchema
>>>> }
>>>>
>>>> What do others think?
>>>>
>>>> Thanks,
>>>> Timo
>>>>
>>>>
>>>>
>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote:
>>>>> Generally +1
>>>>>
>>>>> One slight concern I have is about the |SupportsProjectionPushDown.|I
>>>>> don't necessarily understand how can we express projections with
>>>>> TableSchema. It's unclear for me what happens when a type of a field
>>>>> changes, fields are in a different order, when types do not match. How
>>>>> do we express projection of a nested field with TableSchema?
>>>>>
>>>>> I don't think this changes the core design presented in the FLIP,
>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my
>>>>> concerns, so that maybe we can adjust the passed around structures
>>>> slightly.
>>>>> Best,
>>>>>
>>>>> Dawid
>>>>> ||
>>>>>
>>>>> On 30/03/2020 14:42, Leonard Xu wrote:
>>>>>> +1(non-binding)
>>>>>>
>>>>>> Best,
>>>>>> Leonard Xu
>>>>>>
>>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>>>>
>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Kurt
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
>>>> wrote:
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
>>>>>>>>>
>>>>>>>>>> +1 from my side.
>>>>>>>>>>
>>>>>>>>>> Thanks Timo for driving this.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Jark
>>>>>>>>>>
>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
>>>> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is
>> discussed
>>>>>>>> and
>>>>>>>>>>> reached a consensus in the discussion thread [2].
>>>>>>>>>>>
>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>>>>>>>> objection or not enough votes.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Timo
>>>>>>>>>>>
>>>>>>>>>>> [1]
>>>>>>>>>>>
>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>>>>>> [2]
>>>>>>>>>>>
>>>>>>>>>>>
>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Benchao Li
>>>>>>>>> School of Electronics Engineering and Computer Science, Peking
>>>> University
>>>>>>>>> Tel:+86-15650713730
>>>>>>>>> Email:[hidden email];[hidden email]
>>>>>>>>>
>>>>>>> --
>>>>>>> Best, Jingsong Lee
>>>>
>>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther-2
Hi Dawid,

thanks for your response. Let's postpone the discussion around a single
interface into a later issue. It seems that we reached a consensus on
the remaining interfaces. Given the size of the FLIP, I think it is fine
to conclude the voting now.

Thanks,
Timo

On 03.04.20 10:42, Dawid Wysakowicz wrote:

> Hi all,
>
> Just to make it clear. I don't want to block the whole effort. I'm big
> +1 on the whole document and -0 on using the TableSchema for a
> projection pushdown.
>
> My personal opinion was that TableSchema is misleading for Projection
> pushdown and I still think that way. That's why I wanted to bring it up
> to see if I am the only one. If that's the case, let's just proceed with
> the TableSchema.
>
> Best,
>
> Dawid
>
> On 02/04/2020 17:19, Jark Wu wrote:
>> Hi Timo,
>>
>> I don't think source should work with `CatalogTableSchema`. So far, a table
>> source doesn't need to know the logic information of computed column and
>> watermark.
>> IMO, we should provide a method to convert from `CatalogTableSchema` into
>> `TableSchema` without computed columns in source factory,
>> and a source should just hold the `TableSchema`.
>>
>> I agree doing the intersection/diff logic is trivial, but maybe we can
>> provide utilities to do that? So that we can keep the interface clean.
>>
>>
>> Best,
>> Jark
>>
>>
>> On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote:
>>
>>> Hi Jark,
>>>
>>> if catalogs use `CatalogTableSchema` in the future. The source would
>>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>>> up the `TableSchema` class but should a source deal with two different
>>> schema classes then?
>>>
>>> Another problem that I see is that connectors usually need to perform
>>> some index arithmetics. Dealing with TableSchema and additionally within
>>> a field with DataType might be a bit inconvenient. A dedicated class
>>> with utilities might be helpful such that not every source needs to
>>> implement the same intersection/diff logic again.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 02.04.20 14:06, Jark Wu wrote:
>>>> Hi Dawid,
>>>>
>>>>> How to express projections with TableSchema?
>>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>>> pushed TableSchema represents the schema after projection.
>>>> Thus the table source can compare them to figure out changed field orders
>>>> or not matched types.
>>>> For most sources who maps physical storage by field names (e.g. jdbc,
>>>> hbase, json) they can just simply apply the pushed TableSchema.
>>>> But sources who maps by field indexes (e.g. csv), they need to figure out
>>>> the projected indexes by comparing the original and projected schema.
>>>> For example, the original schema is [a: String, b: Int, c: Timestamp],
>>> and
>>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
>>>> source can figure out index=1 is pruned.
>>>>
>>>>> How do we express projection of a nested field with TableSchema?
>>>> This is the same to the above one. For example, the original schema is
>>> [rk:
>>>> String, f1 Row<q1 Int, q2 Double>].
>>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
>>>> Double>].
>>>>
>>>>> TableSchema might be used at too many different places for different
>>>> responsibilities.
>>>> Agree. We have recognized that a structure and builder for pure table
>>>> schema is required in many places. But we mixed many concepts of catalog
>>>> table schema in TableSchema.
>>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
>>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>>> and remove all the watermark, computed column information from
>>> TableSchema?
>>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>>> stays in a good package.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>>
>>>>
>>>>
>>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:
>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> thanks for your feedback. I agree with your concerns. I also observed
>>>>> that TableSchema might be used at too many different places for
>>>>> different responsibilities.
>>>>>
>>>>> How about we introduce a helper class for `SupportsProjectionPushDown`
>>>>> and also `LookupTableSource#Context#getKeys()` to represent nested
>>>>> structure of names. Data types, constraints, or computed columns are not
>>>>> necessary at those locations.
>>>>>
>>>>> We can also add utility methods for connectors to this helper class
>>>>> there to quickly figuring out differences between the original table
>>>>> schema and the new one.
>>>>>
>>>>> SelectedFields {
>>>>>
>>>>>           private LogicalType orignalRowType; // set by the planner
>>>>>
>>>>>           private int[][] indices;
>>>>>
>>>>>           getNames(int... at): String[]
>>>>>
>>>>>           getNames(String... at): String[]
>>>>>
>>>>>           getIndices(int... at): int[]
>>>>>
>>>>>           getNames(String... at): String[]
>>>>>
>>>>>           toTableSchema(): TableSchema
>>>>> }
>>>>>
>>>>> What do others think?
>>>>>
>>>>> Thanks,
>>>>> Timo
>>>>>
>>>>>
>>>>>
>>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote:
>>>>>> Generally +1
>>>>>>
>>>>>> One slight concern I have is about the |SupportsProjectionPushDown.|I
>>>>>> don't necessarily understand how can we express projections with
>>>>>> TableSchema. It's unclear for me what happens when a type of a field
>>>>>> changes, fields are in a different order, when types do not match. How
>>>>>> do we express projection of a nested field with TableSchema?
>>>>>>
>>>>>> I don't think this changes the core design presented in the FLIP,
>>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my
>>>>>> concerns, so that maybe we can adjust the passed around structures
>>>>> slightly.
>>>>>> Best,
>>>>>>
>>>>>> Dawid
>>>>>> ||
>>>>>>
>>>>>> On 30/03/2020 14:42, Leonard Xu wrote:
>>>>>>> +1(non-binding)
>>>>>>>
>>>>>>> Best,
>>>>>>> Leonard Xu
>>>>>>>
>>>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  写道:
>>>>>>>>
>>>>>>>> +1
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Jingsong Lee
>>>>>>>>
>>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Kurt
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
>>>>> wrote:
>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>
>>>>>>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
>>>>>>>>>>
>>>>>>>>>>> +1 from my side.
>>>>>>>>>>>
>>>>>>>>>>> Thanks Timo for driving this.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Jark
>>>>>>>>>>>
>>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
>>>>> wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is
>>> discussed
>>>>>>>>> and
>>>>>>>>>>>> reached a consensus in the discussion thread [2].
>>>>>>>>>>>>
>>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there is an
>>>>>>>>>>>> objection or not enough votes.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Timo
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>>>>>>>>>>>> [2]
>>>>>>>>>>>>
>>>>>>>>>>>>
>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E
>>>>>>>>>> --
>>>>>>>>>>
>>>>>>>>>> Benchao Li
>>>>>>>>>> School of Electronics Engineering and Computer Science, Peking
>>>>> University
>>>>>>>>>> Tel:+86-15650713730
>>>>>>>>>> Email:[hidden email];[hidden email]
>>>>>>>>>>
>>>>>>>> --
>>>>>>>> Best, Jingsong Lee
>>>>>
>>>
>

Reply | Threaded
Open this post in threaded view
|

[RESULT][VOTE] FLIP-95: New TableSource and TableSink interfaces

Timo Walther-2
Hi all,

The voting time for FLIP-95 has passed. I'm closing the vote now.

There were 6 +1 votes, 4 of which are binding:
- Jark (binding)
- Benchao (non-binding)
- Kurt (binding)
- Jingsong (binding)
- Leonard (non-binding)
- Dawid (binding)

There were no disapproving votes.

Thus, FLIP-95 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

Best,
Timo


On 03.04.20 10:52, Timo Walther wrote:

> Hi Dawid,
>
> thanks for your response. Let's postpone the discussion around a single
> interface into a later issue. It seems that we reached a consensus on
> the remaining interfaces. Given the size of the FLIP, I think it is fine
> to conclude the voting now.
>
> Thanks,
> Timo
>
> On 03.04.20 10:42, Dawid Wysakowicz wrote:
>> Hi all,
>>
>> Just to make it clear. I don't want to block the whole effort. I'm big
>> +1 on the whole document and -0 on using the TableSchema for a
>> projection pushdown.
>>
>> My personal opinion was that TableSchema is misleading for Projection
>> pushdown and I still think that way. That's why I wanted to bring it up
>> to see if I am the only one. If that's the case, let's just proceed with
>> the TableSchema.
>>
>> Best,
>>
>> Dawid
>>
>> On 02/04/2020 17:19, Jark Wu wrote:
>>> Hi Timo,
>>>
>>> I don't think source should work with `CatalogTableSchema`. So far, a
>>> table
>>> source doesn't need to know the logic information of computed column and
>>> watermark.
>>> IMO, we should provide a method to convert from `CatalogTableSchema`
>>> into
>>> `TableSchema` without computed columns in source factory,
>>> and a source should just hold the `TableSchema`.
>>>
>>> I agree doing the intersection/diff logic is trivial, but maybe we can
>>> provide utilities to do that? So that we can keep the interface clean.
>>>
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Thu, 2 Apr 2020 at 20:17, Timo Walther <[hidden email]> wrote:
>>>
>>>> Hi Jark,
>>>>
>>>> if catalogs use `CatalogTableSchema` in the future. The source would
>>>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>>>> up the `TableSchema` class but should a source deal with two different
>>>> schema classes then?
>>>>
>>>> Another problem that I see is that connectors usually need to perform
>>>> some index arithmetics. Dealing with TableSchema and additionally
>>>> within
>>>> a field with DataType might be a bit inconvenient. A dedicated class
>>>> with utilities might be helpful such that not every source needs to
>>>> implement the same intersection/diff logic again.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 02.04.20 14:06, Jark Wu wrote:
>>>>> Hi Dawid,
>>>>>
>>>>>> How to express projections with TableSchema?
>>>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>>>> pushed TableSchema represents the schema after projection.
>>>>> Thus the table source can compare them to figure out changed field
>>>>> orders
>>>>> or not matched types.
>>>>> For most sources who maps physical storage by field names (e.g. jdbc,
>>>>> hbase, json) they can just simply apply the pushed TableSchema.
>>>>> But sources who maps by field indexes (e.g. csv), they need to
>>>>> figure out
>>>>> the projected indexes by comparing the original and projected schema.
>>>>> For example, the original schema is [a: String, b: Int, c: Timestamp],
>>>> and
>>>>> b is pruned, then the pushed schema is [a: String, c: Timestamp].
>>>>> So the
>>>>> source can figure out index=1 is pruned.
>>>>>
>>>>>> How do we express projection of a nested field with TableSchema?
>>>>> This is the same to the above one. For example, the original schema is
>>>> [rk:
>>>>> String, f1 Row<q1 Int, q2 Double>].
>>>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row<q2
>>>>> Double>].
>>>>>
>>>>>> TableSchema might be used at too many different places for different
>>>>> responsibilities.
>>>>> Agree. We have recognized that a structure and builder for pure table
>>>>> schema is required in many places. But we mixed many concepts of
>>>>> catalog
>>>>> table schema in TableSchema.
>>>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
>>>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>>>> and remove all the watermark, computed column information from
>>>> TableSchema?
>>>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>>>> stays in a good package.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther <[hidden email]> wrote:
>>>>>
>>>>>> Hi Dawid,
>>>>>>
>>>>>> thanks for your feedback. I agree with your concerns. I also observed
>>>>>> that TableSchema might be used at too many different places for
>>>>>> different responsibilities.
>>>>>>
>>>>>> How about we introduce a helper class for
>>>>>> `SupportsProjectionPushDown`
>>>>>> and also `LookupTableSource#Context#getKeys()` to represent nested
>>>>>> structure of names. Data types, constraints, or computed columns
>>>>>> are not
>>>>>> necessary at those locations.
>>>>>>
>>>>>> We can also add utility methods for connectors to this helper class
>>>>>> there to quickly figuring out differences between the original table
>>>>>> schema and the new one.
>>>>>>
>>>>>> SelectedFields {
>>>>>>
>>>>>>           private LogicalType orignalRowType; // set by the planner
>>>>>>
>>>>>>           private int[][] indices;
>>>>>>
>>>>>>           getNames(int... at): String[]
>>>>>>
>>>>>>           getNames(String... at): String[]
>>>>>>
>>>>>>           getIndices(int... at): int[]
>>>>>>
>>>>>>           getNames(String... at): String[]
>>>>>>
>>>>>>           toTableSchema(): TableSchema
>>>>>> }
>>>>>>
>>>>>> What do others think?
>>>>>>
>>>>>> Thanks,
>>>>>> Timo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 02.04.20 12:28, Dawid Wysakowicz wrote:
>>>>>>> Generally +1
>>>>>>>
>>>>>>> One slight concern I have is about the
>>>>>>> |SupportsProjectionPushDown.|I
>>>>>>> don't necessarily understand how can we express projections with
>>>>>>> TableSchema. It's unclear for me what happens when a type of a field
>>>>>>> changes, fields are in a different order, when types do not
>>>>>>> match. How
>>>>>>> do we express projection of a nested field with TableSchema?
>>>>>>>
>>>>>>> I don't think this changes the core design presented in the FLIP,
>>>>>>> therefore I'm fine with accepting the FLIP. I wanted to mention my
>>>>>>> concerns, so that maybe we can adjust the passed around structures
>>>>>> slightly.
>>>>>>> Best,
>>>>>>>
>>>>>>> Dawid
>>>>>>> ||
>>>>>>>
>>>>>>> On 30/03/2020 14:42, Leonard Xu wrote:
>>>>>>>> +1(non-binding)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Leonard Xu
>>>>>>>>
>>>>>>>>> 在 2020年3月30日,16:43,Jingsong Li<[hidden email]>  
>>>>>>>>> 写道:
>>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Jingsong Lee
>>>>>>>>>
>>>>>>>>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young<[hidden email]>  
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kurt
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li<[hidden email]>
>>>>>> wrote:
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Jark Wu<[hidden email]>  于2020年3月30日周一 下午3:57写道:
>>>>>>>>>>>
>>>>>>>>>>>> +1 from my side.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks Timo for driving this.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jark
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, 30 Mar 2020 at 15:36, Timo Walther<[hidden email]>
>>>>>> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like to start the vote for FLIP-95 [1], which is
>>>> discussed
>>>>>>>>>> and
>>>>>>>>>>>>> reached a consensus in the discussion thread [2].
>>>>>>>>>>>>>
>>>>>>>>>>>>> The vote will be open until April 2nd (72h), unless there
>>>>>>>>>>>>> is an
>>>>>>>>>>>>> objection or not enough votes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Timo
>>>>>>>>>>>>>
>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces 
>>>>
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E 
>>>>
>>>>>>>>>>> --
>>>>>>>>>>>
>>>>>>>>>>> Benchao Li
>>>>>>>>>>> School of Electronics Engineering and Computer Science, Peking
>>>>>> University
>>>>>>>>>>> Tel:+86-15650713730
>>>>>>>>>>> Email:[hidden email];[hidden email]
>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best, Jingsong Lee
>>>>>>
>>>>
>>