Hi everyone,
you might have seen that we discussed a better schema API in past as part of FLIP-129 and FLIP-136. We also discussed this topic during different releases: https://issues.apache.org/jira/browse/FLINK-17793 Jark and I had an offline discussion how we can finally fix this shortcoming and maintain backwards compatibile for a couple of releases to give people time to update their code. I would like to propose the following FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs The FLIP updates the class hierarchy to achieve the following goals: - make it visible whether a schema is resolved or unresolved and when the resolution happens - offer a unified API for FLIP-129, FLIP-136, and catalogs - allow arbitrary data types and expressions in the schema for watermark spec or columns - have access to other catalogs for declaring a data type or expression via CatalogManager - a cleaned up TableSchema - remain backwards compatible in the persisted properties and API Looking forward to your feedback. Thanks, Timo |
Hi Timo,
From my perspective the proposed changes look good. I agree it is an important step towards FLIP-129 and FLIP-136. Personally I feel comfortable voting on the document. Best, Dawid On 05/02/2021 16:09, Timo Walther wrote: > Hi everyone, > > you might have seen that we discussed a better schema API in past as > part of FLIP-129 and FLIP-136. We also discussed this topic during > different releases: > > https://issues.apache.org/jira/browse/FLINK-17793 > > Jark and I had an offline discussion how we can finally fix this > shortcoming and maintain backwards compatibile for a couple of > releases to give people time to update their code. > > I would like to propose the following FLIP: > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs > > > The FLIP updates the class hierarchy to achieve the following goals: > > - make it visible whether a schema is resolved or unresolved and when > the resolution happens > - offer a unified API for FLIP-129, FLIP-136, and catalogs > - allow arbitrary data types and expressions in the schema for > watermark spec or columns > - have access to other catalogs for declaring a data type or > expression via CatalogManager > - a cleaned up TableSchema > - remain backwards compatible in the persisted properties and API > > Looking forward to your feedback. > > Thanks, > Timo signature.asc (849 bytes) Download Attachment |
Hi Timo,
The messy TableSchema confuses many developers. It's great to see we can finally come up with a clean interface hierarchy and still backward compatible. Thanks for preparing the nice FLIP. It looks good to me. I have some minor comments: 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` instead of `Column`? 2) You mentioned ResolvedSchema should store ResolvedExpression, should we extend `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`? 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to add un-deprecated methods of `TableSchema` into `ResolvedSchema` (e.g. `getColumnDataTypes()`). Then users can have a smooth migration. Best, Jark On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <[hidden email]> wrote: > Hi Timo, > > From my perspective the proposed changes look good. I agree it is an > important step towards FLIP-129 and FLIP-136. Personally I feel > comfortable voting on the document. > > Best, > > Dawid > > On 05/02/2021 16:09, Timo Walther wrote: > > Hi everyone, > > > > you might have seen that we discussed a better schema API in past as > > part of FLIP-129 and FLIP-136. We also discussed this topic during > > different releases: > > > > https://issues.apache.org/jira/browse/FLINK-17793 > > > > Jark and I had an offline discussion how we can finally fix this > > shortcoming and maintain backwards compatibile for a couple of > > releases to give people time to update their code. > > > > I would like to propose the following FLIP: > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs > > > > > > The FLIP updates the class hierarchy to achieve the following goals: > > > > - make it visible whether a schema is resolved or unresolved and when > > the resolution happens > > - offer a unified API for FLIP-129, FLIP-136, and catalogs > > - allow arbitrary data types and expressions in the schema for > > watermark spec or columns > > - have access to other catalogs for declaring a data type or > > expression via CatalogManager > > - a cleaned up TableSchema > > - remain backwards compatible in the persisted properties and API > > > > Looking forward to your feedback. > > > > Thanks, > > Timo > > |
Hi Jark,
thanks for your feedback. Let me answer some of your comments: 1) Since we decided to build an entire new stack, we can also introduce better names for columns, constraints, and watermark spec. My goal was to shorten the names during this refactoring. Therefore, `TableSchema` becomes `Schema` and `TableColumn` becomes `Column`. This also fits better to a `CatalogView` that has a schema but is actually not a table but a view. So `Column` is very generic. What do you think? 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store `ResolvedExpression`. 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`. However, I skipped `getColumnDataTypes()` because the behavior is not clear to me. Should it include computed columns or virtual metadata columns? I think we should force users to think about what they require. Otherwise we implicitly introduce bugs. Regards, Timo On 09.02.21 10:56, Jark Wu wrote: > Hi Timo, > > The messy TableSchema confuses many developers. > It's great to see we can finally come up with a clean interface hierarchy > and still backward compatible. > > Thanks for preparing the nice FLIP. It looks good to me. I have some minor > comments: > > 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` instead of > `Column`? > > 2) You mentioned ResolvedSchema should store ResolvedExpression, should we > extend > `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`? > > 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to > add un-deprecated > methods of `TableSchema` into `ResolvedSchema` > (e.g. `getColumnDataTypes()`). > Then users can have a smooth migration. > > Best, > Jark > > On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <[hidden email]> > wrote: > >> Hi Timo, >> >> From my perspective the proposed changes look good. I agree it is an >> important step towards FLIP-129 and FLIP-136. Personally I feel >> comfortable voting on the document. >> >> Best, >> >> Dawid >> >> On 05/02/2021 16:09, Timo Walther wrote: >>> Hi everyone, >>> >>> you might have seen that we discussed a better schema API in past as >>> part of FLIP-129 and FLIP-136. We also discussed this topic during >>> different releases: >>> >>> https://issues.apache.org/jira/browse/FLINK-17793 >>> >>> Jark and I had an offline discussion how we can finally fix this >>> shortcoming and maintain backwards compatibile for a couple of >>> releases to give people time to update their code. >>> >>> I would like to propose the following FLIP: >>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs >>> >>> >>> The FLIP updates the class hierarchy to achieve the following goals: >>> >>> - make it visible whether a schema is resolved or unresolved and when >>> the resolution happens >>> - offer a unified API for FLIP-129, FLIP-136, and catalogs >>> - allow arbitrary data types and expressions in the schema for >>> watermark spec or columns >>> - have access to other catalogs for declaring a data type or >>> expression via CatalogManager >>> - a cleaned up TableSchema >>> - remain backwards compatible in the persisted properties and API >>> >>> Looking forward to your feedback. >>> >>> Thanks, >>> Timo >> >> > |
Hi Timo,
Thanks for the FLIP. It looks good to me overall. I have two questions. 1. When should we use a resolved schema and when to use an unresolved one? 2. The FLIP mentions only resolved tables/views can be stored into a catalog. Does that mean the getTable method should also return a resolved object? On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> wrote: > Hi Jark, > > thanks for your feedback. Let me answer some of your comments: > > 1) Since we decided to build an entire new stack, we can also introduce > better names for columns, constraints, and watermark spec. My goal was > to shorten the names during this refactoring. Therefore, `TableSchema` > becomes `Schema` and `TableColumn` becomes `Column`. This also fits > better to a `CatalogView` that has a schema but is actually not a table > but a view. So `Column` is very generic. What do you think? > > 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store > `ResolvedExpression`. > > 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`. > However, I skipped `getColumnDataTypes()` because the behavior is not > clear to me. Should it include computed columns or virtual metadata > columns? I think we should force users to think about what they require. > Otherwise we implicitly introduce bugs. > > Regards, > Timo > > On 09.02.21 10:56, Jark Wu wrote: > > Hi Timo, > > > > The messy TableSchema confuses many developers. > > It's great to see we can finally come up with a clean interface hierarchy > > and still backward compatible. > > > > Thanks for preparing the nice FLIP. It looks good to me. I have some > minor > > comments: > > > > 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` instead > of > > `Column`? > > > > 2) You mentioned ResolvedSchema should store ResolvedExpression, should > we > > extend > > `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`? > > > > 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to > > add un-deprecated > > methods of `TableSchema` into `ResolvedSchema` > > (e.g. `getColumnDataTypes()`). > > Then users can have a smooth migration. > > > > Best, > > Jark > > > > On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <[hidden email]> > > wrote: > > > >> Hi Timo, > >> > >> From my perspective the proposed changes look good. I agree it is an > >> important step towards FLIP-129 and FLIP-136. Personally I feel > >> comfortable voting on the document. > >> > >> Best, > >> > >> Dawid > >> > >> On 05/02/2021 16:09, Timo Walther wrote: > >>> Hi everyone, > >>> > >>> you might have seen that we discussed a better schema API in past as > >>> part of FLIP-129 and FLIP-136. We also discussed this topic during > >>> different releases: > >>> > >>> https://issues.apache.org/jira/browse/FLINK-17793 > >>> > >>> Jark and I had an offline discussion how we can finally fix this > >>> shortcoming and maintain backwards compatibile for a couple of > >>> releases to give people time to update their code. > >>> > >>> I would like to propose the following FLIP: > >>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs > >>> > >>> > >>> The FLIP updates the class hierarchy to achieve the following goals: > >>> > >>> - make it visible whether a schema is resolved or unresolved and when > >>> the resolution happens > >>> - offer a unified API for FLIP-129, FLIP-136, and catalogs > >>> - allow arbitrary data types and expressions in the schema for > >>> watermark spec or columns > >>> - have access to other catalogs for declaring a data type or > >>> expression via CatalogManager > >>> - a cleaned up TableSchema > >>> - remain backwards compatible in the persisted properties and API > >>> > >>> Looking forward to your feedback. > >>> > >>> Thanks, > >>> Timo > >> > >> > > > > -- Best regards! Rui Li |
Hi Rui,
1. It depends whether you would like to declare (unresolved) or use (resolved) a schema. In catalogs and APIs, people would actually like to declare a schema. Because the schema might reference objects from other catalogs etc. However, whenever the schema comes out of the framework it is fully resolved and people can use to configure their UI, connector, etc. 2. No, `getTable` doesn't have to return a resolved schema. Actually, this was my initial design (see Rejected Alternatives 1) where we pass the SchemaResolver into the Catalog. However, a catalog must not deal with resolution. When storing a table we need a resolved schema to perist the fully expanded properties, however, when reading those properties in again the schema can be resolved in a later stage. Regards, Timo On 09.02.21 14:07, Rui Li wrote: > Hi Timo, > > Thanks for the FLIP. It looks good to me overall. I have two questions. > 1. When should we use a resolved schema and when to use an unresolved one? > 2. The FLIP mentions only resolved tables/views can be stored into a > catalog. Does that mean the getTable method should also return a resolved > object? > > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> wrote: > >> Hi Jark, >> >> thanks for your feedback. Let me answer some of your comments: >> >> 1) Since we decided to build an entire new stack, we can also introduce >> better names for columns, constraints, and watermark spec. My goal was >> to shorten the names during this refactoring. Therefore, `TableSchema` >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits >> better to a `CatalogView` that has a schema but is actually not a table >> but a view. So `Column` is very generic. What do you think? >> >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store >> `ResolvedExpression`. >> >> 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`. >> However, I skipped `getColumnDataTypes()` because the behavior is not >> clear to me. Should it include computed columns or virtual metadata >> columns? I think we should force users to think about what they require. >> Otherwise we implicitly introduce bugs. >> >> Regards, >> Timo >> >> On 09.02.21 10:56, Jark Wu wrote: >>> Hi Timo, >>> >>> The messy TableSchema confuses many developers. >>> It's great to see we can finally come up with a clean interface hierarchy >>> and still backward compatible. >>> >>> Thanks for preparing the nice FLIP. It looks good to me. I have some >> minor >>> comments: >>> >>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` instead >> of >>> `Column`? >>> >>> 2) You mentioned ResolvedSchema should store ResolvedExpression, should >> we >>> extend >>> `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`? >>> >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better to >>> add un-deprecated >>> methods of `TableSchema` into `ResolvedSchema` >>> (e.g. `getColumnDataTypes()`). >>> Then users can have a smooth migration. >>> >>> Best, >>> Jark >>> >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <[hidden email]> >>> wrote: >>> >>>> Hi Timo, >>>> >>>> From my perspective the proposed changes look good. I agree it is an >>>> important step towards FLIP-129 and FLIP-136. Personally I feel >>>> comfortable voting on the document. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> On 05/02/2021 16:09, Timo Walther wrote: >>>>> Hi everyone, >>>>> >>>>> you might have seen that we discussed a better schema API in past as >>>>> part of FLIP-129 and FLIP-136. We also discussed this topic during >>>>> different releases: >>>>> >>>>> https://issues.apache.org/jira/browse/FLINK-17793 >>>>> >>>>> Jark and I had an offline discussion how we can finally fix this >>>>> shortcoming and maintain backwards compatibile for a couple of >>>>> releases to give people time to update their code. >>>>> >>>>> I would like to propose the following FLIP: >>>>> >>>>> >>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs >>>>> >>>>> >>>>> The FLIP updates the class hierarchy to achieve the following goals: >>>>> >>>>> - make it visible whether a schema is resolved or unresolved and when >>>>> the resolution happens >>>>> - offer a unified API for FLIP-129, FLIP-136, and catalogs >>>>> - allow arbitrary data types and expressions in the schema for >>>>> watermark spec or columns >>>>> - have access to other catalogs for declaring a data type or >>>>> expression via CatalogManager >>>>> - a cleaned up TableSchema >>>>> - remain backwards compatible in the persisted properties and API >>>>> >>>>> Looking forward to your feedback. >>>>> >>>>> Thanks, >>>>> Timo >>>> >>>> >>> >> >> > |
I see. Makes sense to me. Thanks Timo for the detailed explanation!
On Tue, Feb 9, 2021 at 9:48 PM Timo Walther <[hidden email]> wrote: > Hi Rui, > > 1. It depends whether you would like to declare (unresolved) or use > (resolved) a schema. In catalogs and APIs, people would actually like to > declare a schema. Because the schema might reference objects from other > catalogs etc. However, whenever the schema comes out of the framework it > is fully resolved and people can use to configure their UI, connector, etc. > 2. No, `getTable` doesn't have to return a resolved schema. Actually, > this was my initial design (see Rejected Alternatives 1) where we pass > the SchemaResolver into the Catalog. However, a catalog must not deal > with resolution. When storing a table we need a resolved schema to > perist the fully expanded properties, however, when reading those > properties in again the schema can be resolved in a later stage. > > Regards, > Timo > > On 09.02.21 14:07, Rui Li wrote: > > Hi Timo, > > > > Thanks for the FLIP. It looks good to me overall. I have two questions. > > 1. When should we use a resolved schema and when to use an unresolved > one? > > 2. The FLIP mentions only resolved tables/views can be stored into a > > catalog. Does that mean the getTable method should also return a resolved > > object? > > > > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> wrote: > > > >> Hi Jark, > >> > >> thanks for your feedback. Let me answer some of your comments: > >> > >> 1) Since we decided to build an entire new stack, we can also introduce > >> better names for columns, constraints, and watermark spec. My goal was > >> to shorten the names during this refactoring. Therefore, `TableSchema` > >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits > >> better to a `CatalogView` that has a schema but is actually not a table > >> but a view. So `Column` is very generic. What do you think? > >> > >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will store > >> `ResolvedExpression`. > >> > >> 3) I adopted most of the methods from `TableSchema` in `ResolvedSchema`. > >> However, I skipped `getColumnDataTypes()` because the behavior is not > >> clear to me. Should it include computed columns or virtual metadata > >> columns? I think we should force users to think about what they require. > >> Otherwise we implicitly introduce bugs. > >> > >> Regards, > >> Timo > >> > >> On 09.02.21 10:56, Jark Wu wrote: > >>> Hi Timo, > >>> > >>> The messy TableSchema confuses many developers. > >>> It's great to see we can finally come up with a clean interface > hierarchy > >>> and still backward compatible. > >>> > >>> Thanks for preparing the nice FLIP. It looks good to me. I have some > >> minor > >>> comments: > >>> > >>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` instead > >> of > >>> `Column`? > >>> > >>> 2) You mentioned ResolvedSchema should store ResolvedExpression, should > >> we > >>> extend > >>> `ComputedColumn` and `WatermarkSpec` to allow `ResolvedExpression`? > >>> > >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better > to > >>> add un-deprecated > >>> methods of `TableSchema` into `ResolvedSchema` > >>> (e.g. `getColumnDataTypes()`). > >>> Then users can have a smooth migration. > >>> > >>> Best, > >>> Jark > >>> > >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz <[hidden email]> > >>> wrote: > >>> > >>>> Hi Timo, > >>>> > >>>> From my perspective the proposed changes look good. I agree it is an > >>>> important step towards FLIP-129 and FLIP-136. Personally I feel > >>>> comfortable voting on the document. > >>>> > >>>> Best, > >>>> > >>>> Dawid > >>>> > >>>> On 05/02/2021 16:09, Timo Walther wrote: > >>>>> Hi everyone, > >>>>> > >>>>> you might have seen that we discussed a better schema API in past as > >>>>> part of FLIP-129 and FLIP-136. We also discussed this topic during > >>>>> different releases: > >>>>> > >>>>> https://issues.apache.org/jira/browse/FLINK-17793 > >>>>> > >>>>> Jark and I had an offline discussion how we can finally fix this > >>>>> shortcoming and maintain backwards compatibile for a couple of > >>>>> releases to give people time to update their code. > >>>>> > >>>>> I would like to propose the following FLIP: > >>>>> > >>>>> > >>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs > >>>>> > >>>>> > >>>>> The FLIP updates the class hierarchy to achieve the following goals: > >>>>> > >>>>> - make it visible whether a schema is resolved or unresolved and when > >>>>> the resolution happens > >>>>> - offer a unified API for FLIP-129, FLIP-136, and catalogs > >>>>> - allow arbitrary data types and expressions in the schema for > >>>>> watermark spec or columns > >>>>> - have access to other catalogs for declaring a data type or > >>>>> expression via CatalogManager > >>>>> - a cleaned up TableSchema > >>>>> - remain backwards compatible in the persisted properties and API > >>>>> > >>>>> Looking forward to your feedback. > >>>>> > >>>>> Thanks, > >>>>> Timo > >>>> > >>>> > >>> > >> > >> > > > > -- Best regards! Rui Li |
Hi Timo,
1) I'm fine with `Column`, but are we going to introduce new interfaces for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce a new stack, it would be better to have a different name, otherwise, it's easy to use a wrong class for users. Best, Jark On Wed, 10 Feb 2021 at 09:49, Rui Li <[hidden email]> wrote: > I see. Makes sense to me. Thanks Timo for the detailed explanation! > > On Tue, Feb 9, 2021 at 9:48 PM Timo Walther <[hidden email]> wrote: > > > Hi Rui, > > > > 1. It depends whether you would like to declare (unresolved) or use > > (resolved) a schema. In catalogs and APIs, people would actually like to > > declare a schema. Because the schema might reference objects from other > > catalogs etc. However, whenever the schema comes out of the framework it > > is fully resolved and people can use to configure their UI, connector, > etc. > > 2. No, `getTable` doesn't have to return a resolved schema. Actually, > > this was my initial design (see Rejected Alternatives 1) where we pass > > the SchemaResolver into the Catalog. However, a catalog must not deal > > with resolution. When storing a table we need a resolved schema to > > perist the fully expanded properties, however, when reading those > > properties in again the schema can be resolved in a later stage. > > > > Regards, > > Timo > > > > On 09.02.21 14:07, Rui Li wrote: > > > Hi Timo, > > > > > > Thanks for the FLIP. It looks good to me overall. I have two questions. > > > 1. When should we use a resolved schema and when to use an unresolved > > one? > > > 2. The FLIP mentions only resolved tables/views can be stored into a > > > catalog. Does that mean the getTable method should also return a > resolved > > > object? > > > > > > On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> > wrote: > > > > > >> Hi Jark, > > >> > > >> thanks for your feedback. Let me answer some of your comments: > > >> > > >> 1) Since we decided to build an entire new stack, we can also > introduce > > >> better names for columns, constraints, and watermark spec. My goal was > > >> to shorten the names during this refactoring. Therefore, `TableSchema` > > >> becomes `Schema` and `TableColumn` becomes `Column`. This also fits > > >> better to a `CatalogView` that has a schema but is actually not a > table > > >> but a view. So `Column` is very generic. What do you think? > > >> > > >> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will > store > > >> `ResolvedExpression`. > > >> > > >> 3) I adopted most of the methods from `TableSchema` in > `ResolvedSchema`. > > >> However, I skipped `getColumnDataTypes()` because the behavior is not > > >> clear to me. Should it include computed columns or virtual metadata > > >> columns? I think we should force users to think about what they > require. > > >> Otherwise we implicitly introduce bugs. > > >> > > >> Regards, > > >> Timo > > >> > > >> On 09.02.21 10:56, Jark Wu wrote: > > >>> Hi Timo, > > >>> > > >>> The messy TableSchema confuses many developers. > > >>> It's great to see we can finally come up with a clean interface > > hierarchy > > >>> and still backward compatible. > > >>> > > >>> Thanks for preparing the nice FLIP. It looks good to me. I have some > > >> minor > > >>> comments: > > >>> > > >>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` > instead > > >> of > > >>> `Column`? > > >>> > > >>> 2) You mentioned ResolvedSchema should store ResolvedExpression, > should > > >> we > > >>> extend > > >>> `ComputedColumn` and `WatermarkSpec` to allow > `ResolvedExpression`? > > >>> > > >>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better > > to > > >>> add un-deprecated > > >>> methods of `TableSchema` into `ResolvedSchema` > > >>> (e.g. `getColumnDataTypes()`). > > >>> Then users can have a smooth migration. > > >>> > > >>> Best, > > >>> Jark > > >>> > > >>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz < > [hidden email]> > > >>> wrote: > > >>> > > >>>> Hi Timo, > > >>>> > > >>>> From my perspective the proposed changes look good. I agree it is > an > > >>>> important step towards FLIP-129 and FLIP-136. Personally I feel > > >>>> comfortable voting on the document. > > >>>> > > >>>> Best, > > >>>> > > >>>> Dawid > > >>>> > > >>>> On 05/02/2021 16:09, Timo Walther wrote: > > >>>>> Hi everyone, > > >>>>> > > >>>>> you might have seen that we discussed a better schema API in past > as > > >>>>> part of FLIP-129 and FLIP-136. We also discussed this topic during > > >>>>> different releases: > > >>>>> > > >>>>> https://issues.apache.org/jira/browse/FLINK-17793 > > >>>>> > > >>>>> Jark and I had an offline discussion how we can finally fix this > > >>>>> shortcoming and maintain backwards compatibile for a couple of > > >>>>> releases to give people time to update their code. > > >>>>> > > >>>>> I would like to propose the following FLIP: > > >>>>> > > >>>>> > > >>>> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs > > >>>>> > > >>>>> > > >>>>> The FLIP updates the class hierarchy to achieve the following > goals: > > >>>>> > > >>>>> - make it visible whether a schema is resolved or unresolved and > when > > >>>>> the resolution happens > > >>>>> - offer a unified API for FLIP-129, FLIP-136, and catalogs > > >>>>> - allow arbitrary data types and expressions in the schema for > > >>>>> watermark spec or columns > > >>>>> - have access to other catalogs for declaring a data type or > > >>>>> expression via CatalogManager > > >>>>> - a cleaned up TableSchema > > >>>>> - remain backwards compatible in the persisted properties and API > > >>>>> > > >>>>> Looking forward to your feedback. > > >>>>> > > >>>>> Thanks, > > >>>>> Timo > > >>>> > > >>>> > > >>> > > >> > > >> > > > > > > > > > -- > Best regards! > Rui Li > |
Hi Jark,
I don't think many users use WatermarkSpec. UniqueConstraint could cause some confusion but this mostly affects catalog or connector implementers. After deprecating the old APIs it should be obvious when an outdated interface is used. I'm fine with using a different name, do we have a better name? But otherwise I would just maintain two classes in different packages for a 1-2 releases. Regards, Timo On 10.02.21 02:54, Jark Wu wrote: > Hi Timo, > > 1) I'm fine with `Column`, but are we going to introduce new interfaces > for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce > a new stack, it would be better to have a different name, otherwise, > it's easy to use a wrong class for users. > > Best, > Jark > > On Wed, 10 Feb 2021 at 09:49, Rui Li <[hidden email]> wrote: > >> I see. Makes sense to me. Thanks Timo for the detailed explanation! >> >> On Tue, Feb 9, 2021 at 9:48 PM Timo Walther <[hidden email]> wrote: >> >>> Hi Rui, >>> >>> 1. It depends whether you would like to declare (unresolved) or use >>> (resolved) a schema. In catalogs and APIs, people would actually like to >>> declare a schema. Because the schema might reference objects from other >>> catalogs etc. However, whenever the schema comes out of the framework it >>> is fully resolved and people can use to configure their UI, connector, >> etc. >>> 2. No, `getTable` doesn't have to return a resolved schema. Actually, >>> this was my initial design (see Rejected Alternatives 1) where we pass >>> the SchemaResolver into the Catalog. However, a catalog must not deal >>> with resolution. When storing a table we need a resolved schema to >>> perist the fully expanded properties, however, when reading those >>> properties in again the schema can be resolved in a later stage. >>> >>> Regards, >>> Timo >>> >>> On 09.02.21 14:07, Rui Li wrote: >>>> Hi Timo, >>>> >>>> Thanks for the FLIP. It looks good to me overall. I have two questions. >>>> 1. When should we use a resolved schema and when to use an unresolved >>> one? >>>> 2. The FLIP mentions only resolved tables/views can be stored into a >>>> catalog. Does that mean the getTable method should also return a >> resolved >>>> object? >>>> >>>> On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> >> wrote: >>>> >>>>> Hi Jark, >>>>> >>>>> thanks for your feedback. Let me answer some of your comments: >>>>> >>>>> 1) Since we decided to build an entire new stack, we can also >> introduce >>>>> better names for columns, constraints, and watermark spec. My goal was >>>>> to shorten the names during this refactoring. Therefore, `TableSchema` >>>>> becomes `Schema` and `TableColumn` becomes `Column`. This also fits >>>>> better to a `CatalogView` that has a schema but is actually not a >> table >>>>> but a view. So `Column` is very generic. What do you think? >>>>> >>>>> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will >> store >>>>> `ResolvedExpression`. >>>>> >>>>> 3) I adopted most of the methods from `TableSchema` in >> `ResolvedSchema`. >>>>> However, I skipped `getColumnDataTypes()` because the behavior is not >>>>> clear to me. Should it include computed columns or virtual metadata >>>>> columns? I think we should force users to think about what they >> require. >>>>> Otherwise we implicitly introduce bugs. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> On 09.02.21 10:56, Jark Wu wrote: >>>>>> Hi Timo, >>>>>> >>>>>> The messy TableSchema confuses many developers. >>>>>> It's great to see we can finally come up with a clean interface >>> hierarchy >>>>>> and still backward compatible. >>>>>> >>>>>> Thanks for preparing the nice FLIP. It looks good to me. I have some >>>>> minor >>>>>> comments: >>>>>> >>>>>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` >> instead >>>>> of >>>>>> `Column`? >>>>>> >>>>>> 2) You mentioned ResolvedSchema should store ResolvedExpression, >> should >>>>> we >>>>>> extend >>>>>> `ComputedColumn` and `WatermarkSpec` to allow >> `ResolvedExpression`? >>>>>> >>>>>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be better >>> to >>>>>> add un-deprecated >>>>>> methods of `TableSchema` into `ResolvedSchema` >>>>>> (e.g. `getColumnDataTypes()`). >>>>>> Then users can have a smooth migration. >>>>>> >>>>>> Best, >>>>>> Jark >>>>>> >>>>>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz < >> [hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Hi Timo, >>>>>>> >>>>>>> From my perspective the proposed changes look good. I agree it is >> an >>>>>>> important step towards FLIP-129 and FLIP-136. Personally I feel >>>>>>> comfortable voting on the document. >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> On 05/02/2021 16:09, Timo Walther wrote: >>>>>>>> Hi everyone, >>>>>>>> >>>>>>>> you might have seen that we discussed a better schema API in past >> as >>>>>>>> part of FLIP-129 and FLIP-136. We also discussed this topic during >>>>>>>> different releases: >>>>>>>> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-17793 >>>>>>>> >>>>>>>> Jark and I had an offline discussion how we can finally fix this >>>>>>>> shortcoming and maintain backwards compatibile for a couple of >>>>>>>> releases to give people time to update their code. >>>>>>>> >>>>>>>> I would like to propose the following FLIP: >>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs >>>>>>>> >>>>>>>> >>>>>>>> The FLIP updates the class hierarchy to achieve the following >> goals: >>>>>>>> >>>>>>>> - make it visible whether a schema is resolved or unresolved and >> when >>>>>>>> the resolution happens >>>>>>>> - offer a unified API for FLIP-129, FLIP-136, and catalogs >>>>>>>> - allow arbitrary data types and expressions in the schema for >>>>>>>> watermark spec or columns >>>>>>>> - have access to other catalogs for declaring a data type or >>>>>>>> expression via CatalogManager >>>>>>>> - a cleaned up TableSchema >>>>>>>> - remain backwards compatible in the persisted properties and API >>>>>>>> >>>>>>>> Looking forward to your feedback. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Timo >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> >> -- >> Best regards! >> Rui Li >> > |
Thanks for the feedback everyone. If there are no objections, I would
like to continue with the voting now. We can still discuss class names or package locations during the implementation. But as far as I can see everyone agrees about the general proposition of this FLIP to improve the schema declaration. Regards, Timo On 10.02.21 12:12, Timo Walther wrote: > Hi Jark, > > I don't think many users use WatermarkSpec. UniqueConstraint could cause > some confusion but this mostly affects catalog or connector > implementers. After deprecating the old APIs it should be obvious when > an outdated interface is used. I'm fine with using a different name, do > we have a better name? But otherwise I would just maintain two classes > in different packages for a 1-2 releases. > > Regards, > Timo > > On 10.02.21 02:54, Jark Wu wrote: >> Hi Timo, >> >> 1) I'm fine with `Column`, but are we going to introduce new interfaces >> for `UniqueConstraint` and `WatermarkSpec`? If we want to introduce >> a new stack, it would be better to have a different name, otherwise, >> it's easy to use a wrong class for users. >> >> Best, >> Jark >> >> On Wed, 10 Feb 2021 at 09:49, Rui Li <[hidden email]> wrote: >> >>> I see. Makes sense to me. Thanks Timo for the detailed explanation! >>> >>> On Tue, Feb 9, 2021 at 9:48 PM Timo Walther <[hidden email]> wrote: >>> >>>> Hi Rui, >>>> >>>> 1. It depends whether you would like to declare (unresolved) or use >>>> (resolved) a schema. In catalogs and APIs, people would actually >>>> like to >>>> declare a schema. Because the schema might reference objects from other >>>> catalogs etc. However, whenever the schema comes out of the >>>> framework it >>>> is fully resolved and people can use to configure their UI, connector, >>> etc. >>>> 2. No, `getTable` doesn't have to return a resolved schema. Actually, >>>> this was my initial design (see Rejected Alternatives 1) where we pass >>>> the SchemaResolver into the Catalog. However, a catalog must not deal >>>> with resolution. When storing a table we need a resolved schema to >>>> perist the fully expanded properties, however, when reading those >>>> properties in again the schema can be resolved in a later stage. >>>> >>>> Regards, >>>> Timo >>>> >>>> On 09.02.21 14:07, Rui Li wrote: >>>>> Hi Timo, >>>>> >>>>> Thanks for the FLIP. It looks good to me overall. I have two >>>>> questions. >>>>> 1. When should we use a resolved schema and when to use an unresolved >>>> one? >>>>> 2. The FLIP mentions only resolved tables/views can be stored into a >>>>> catalog. Does that mean the getTable method should also return a >>> resolved >>>>> object? >>>>> >>>>> On Tue, Feb 9, 2021 at 6:29 PM Timo Walther <[hidden email]> >>> wrote: >>>>> >>>>>> Hi Jark, >>>>>> >>>>>> thanks for your feedback. Let me answer some of your comments: >>>>>> >>>>>> 1) Since we decided to build an entire new stack, we can also >>> introduce >>>>>> better names for columns, constraints, and watermark spec. My goal >>>>>> was >>>>>> to shorten the names during this refactoring. Therefore, >>>>>> `TableSchema` >>>>>> becomes `Schema` and `TableColumn` becomes `Column`. This also fits >>>>>> better to a `CatalogView` that has a schema but is actually not a >>> table >>>>>> but a view. So `Column` is very generic. What do you think? >>>>>> >>>>>> 2) `ComputedColumn` and `WatermarkSpec` of the new generation will >>> store >>>>>> `ResolvedExpression`. >>>>>> >>>>>> 3) I adopted most of the methods from `TableSchema` in >>> `ResolvedSchema`. >>>>>> However, I skipped `getColumnDataTypes()` because the behavior is not >>>>>> clear to me. Should it include computed columns or virtual metadata >>>>>> columns? I think we should force users to think about what they >>> require. >>>>>> Otherwise we implicitly introduce bugs. >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> On 09.02.21 10:56, Jark Wu wrote: >>>>>>> Hi Timo, >>>>>>> >>>>>>> The messy TableSchema confuses many developers. >>>>>>> It's great to see we can finally come up with a clean interface >>>> hierarchy >>>>>>> and still backward compatible. >>>>>>> >>>>>>> Thanks for preparing the nice FLIP. It looks good to me. I have some >>>>>> minor >>>>>>> comments: >>>>>>> >>>>>>> 1) Should `ResolvedSchema#getColumn(int)` returns `TableColumn` >>> instead >>>>>> of >>>>>>> `Column`? >>>>>>> >>>>>>> 2) You mentioned ResolvedSchema should store ResolvedExpression, >>> should >>>>>> we >>>>>>> extend >>>>>>> `ComputedColumn` and `WatermarkSpec` to allow >>> `ResolvedExpression`? >>>>>>> >>>>>>> 3) `ResolvedSchema` aims to replace `TableSchema`, it would be >>>>>>> better >>>> to >>>>>>> add un-deprecated >>>>>>> methods of `TableSchema` into `ResolvedSchema` >>>>>>> (e.g. `getColumnDataTypes()`). >>>>>>> Then users can have a smooth migration. >>>>>>> >>>>>>> Best, >>>>>>> Jark >>>>>>> >>>>>>> On Mon, 8 Feb 2021 at 20:21, Dawid Wysakowicz < >>> [hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Timo, >>>>>>>> >>>>>>>> From my perspective the proposed changes look good. I agree >>>>>>>> it is >>> an >>>>>>>> important step towards FLIP-129 and FLIP-136. Personally I feel >>>>>>>> comfortable voting on the document. >>>>>>>> >>>>>>>> Best, >>>>>>>> >>>>>>>> Dawid >>>>>>>> >>>>>>>> On 05/02/2021 16:09, Timo Walther wrote: >>>>>>>>> Hi everyone, >>>>>>>>> >>>>>>>>> you might have seen that we discussed a better schema API in past >>> as >>>>>>>>> part of FLIP-129 and FLIP-136. We also discussed this topic during >>>>>>>>> different releases: >>>>>>>>> >>>>>>>>> https://issues.apache.org/jira/browse/FLINK-17793 >>>>>>>>> >>>>>>>>> Jark and I had an offline discussion how we can finally fix this >>>>>>>>> shortcoming and maintain backwards compatibile for a couple of >>>>>>>>> releases to give people time to update their code. >>>>>>>>> >>>>>>>>> I would like to propose the following FLIP: >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-164%3A+Improve+Schema+Handling+in+Catalogs >>> >>>>>>>>> >>>>>>>>> >>>>>>>>> The FLIP updates the class hierarchy to achieve the following >>> goals: >>>>>>>>> >>>>>>>>> - make it visible whether a schema is resolved or unresolved and >>> when >>>>>>>>> the resolution happens >>>>>>>>> - offer a unified API for FLIP-129, FLIP-136, and catalogs >>>>>>>>> - allow arbitrary data types and expressions in the schema for >>>>>>>>> watermark spec or columns >>>>>>>>> - have access to other catalogs for declaring a data type or >>>>>>>>> expression via CatalogManager >>>>>>>>> - a cleaned up TableSchema >>>>>>>>> - remain backwards compatible in the persisted properties and API >>>>>>>>> >>>>>>>>> Looking forward to your feedback. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Timo >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >>> -- >>> Best regards! >>> Rui Li >>> >> > |
Free forum by Nabble | Edit this page |