Hi all, As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way. Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects. Lastly, different APIs have different ways of providing
object paths:
We should choose one approach and unify it across all APIs. I suggest a FLIP to address the above issues. Looking forward to your opinions. signature.asc (849 bytes) Download Attachment |
Thanks dawid, +1 for this approach.
One concern is the removal of registerTableSink & registerTableSource in TableEnvironment. It has two alternatives: 1.the properties approach (DDL, descriptor). 2.from/toDataStream. #1 can only be properties, not java states, and some Connectors are difficult to convert all states to properties. #2 can contain java state. But can't use TableSource-related features, like project & filter push down, partition support, etc.. Any idea about this? Best, Jingsong Lee ------------------------------------------------------------------ From:Dawid Wysakowicz <[hidden email]> Send Time:2019年9月4日(星期三) 22:20 To:dev <[hidden email]> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi all, As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way. Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects. Lastly, different APIs have different ways of providing object paths: String path…, String path, String pathContinued… String name We should choose one approach and unify it across all APIs. I suggest a FLIP to address the above issues. Looking forward to your opinions. FLIP link: https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module |
Thanks to Dawid for starting the discussion and writeup. It looks pretty
good to me except that I'm a little concerned about the object reference and string parsing in the code, which seems to an anti-pattern to OOP. Have we considered using ObjectIdenitifier with optional catalog and db parts, esp. if we are worried about arguments of variable length or method overloading? It's quite likely that the result of string parsing is an ObjectIdentifier instance any way. Having string parsing logic in the code is a little dangerous as it duplicates part of the DDL/DML parsing, and they can easily get out of sync. Thanks, Xuefu On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]> wrote: > Thanks dawid, +1 for this approach. > > One concern is the removal of registerTableSink & registerTableSource > in TableEnvironment. It has two alternatives: > 1.the properties approach (DDL, descriptor). > 2.from/toDataStream. > > #1 can only be properties, not java states, and some Connectors > are difficult to convert all states to properties. > #2 can contain java state. But can't use TableSource-related features, > like project & filter push down, partition support, etc.. > > Any idea about this? > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Dawid Wysakowicz <[hidden email]> > Send Time:2019年9月4日(星期三) 22:20 > To:dev <[hidden email]> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > Hi all, > As part of FLIP-30 a Catalog API was introduced that enables storing table > meta objects permanently. At the same time the majority of current APIs > create temporary objects that cannot be serialized. We should clarify the > creation of meta objects (tables, views, functions) in a unified way. > Another current problem in the API is that all the temporary objects are > stored in a special built-in catalog, which is not very intuitive for many > users, as they must be aware of that catalog to reference temporary objects. > Lastly, different APIs have different ways of providing object paths: > > String path…, > String path, String pathContinued… > String name > We should choose one approach and unify it across all APIs. > I suggest a FLIP to address the above issues. > Looking forward to your opinions. > FLIP link: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > -- Xuefu Zhang "In Honey We Trust!" |
Hi all, @Jingsong Could you elaborate a bit more what do you mean by "some Connectors are difficult to convert all states to properties" All the Flink provided connectors will definitely be expressible
with properties (In the end you should be able to use them from
DDL). I think if a TableSource is complex enough that it handles
filter push down, partition support etc. should rather be made
available both from DDL & java/scala code. I'm happy to
reconsider adding registerTemporaryTable(String path, TableSource
source) if you have some concrete examples in mind.
@Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability. tEnv.from("cat.db.table") is shorter and easier to write than tEnv.from(Identifier.for("cat", "db", "name") And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner: tEnv.from(Identifier.for("db.name") I'm happy to revisit it if the general consensus is that it's better to use the OO aproach. Best, Dawid On 06/09/2019 10:00, Xuefu Z wrote:
Thanks to Dawid for starting the discussion and writeup. It looks pretty good to me except that I'm a little concerned about the object reference and string parsing in the code, which seems to an anti-pattern to OOP. Have we considered using ObjectIdenitifier with optional catalog and db parts, esp. if we are worried about arguments of variable length or method overloading? It's quite likely that the result of string parsing is an ObjectIdentifier instance any way. Having string parsing logic in the code is a little dangerous as it duplicates part of the DDL/DML parsing, and they can easily get out of sync. Thanks, Xuefu On Fri, Sep 6, 2019 at 1:57 PM JingsongLee [hidden email] wrote:Thanks dawid, +1 for this approach. One concern is the removal of registerTableSink & registerTableSource in TableEnvironment. It has two alternatives: 1.the properties approach (DDL, descriptor). 2.from/toDataStream. #1 can only be properties, not java states, and some Connectors are difficult to convert all states to properties. #2 can contain java state. But can't use TableSource-related features, like project & filter push down, partition support, etc.. Any idea about this? Best, Jingsong Lee ------------------------------------------------------------------ From:Dawid Wysakowicz [hidden email] Send Time:2019年9月4日(星期三) 22:20 To:dev [hidden email] Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi all, As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way. Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects. Lastly, different APIs have different ways of providing object paths: String path…, String path, String pathContinued… String name We should choose one approach and unify it across all APIs. I suggest a FLIP to address the above issues. Looking forward to your opinions. FLIP link: https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module signature.asc (849 bytes) Download Attachment |
Hi dawid:
It is difficult to describe specific examples. Sometimes users will generate some java converters through some Java code, or generate some Java classes through third-party libraries. Of course, these can be best done through properties. But this requires additional work from users.My suggestion is to keep this Java instance class way that is user-friendly. Best, Jingsong Lee ------------------------------------------------------------------ From:Dawid Wysakowicz <[hidden email]> Send Time:2019年9月6日(星期五) 16:21 To:dev <[hidden email]> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi all, @Jingsong Could you elaborate a bit more what do you mean by "some Connectors are difficult to convert all states to properties" All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind. @Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability. tEnv.from("cat.db.table") is shorter and easier to write than tEnv.from(Identifier.for("cat", "db", "name") And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner: tEnv.from(Identifier.for("db.name") I'm happy to revisit it if the general consensus is that it's better to use the OO aproach. Best, Dawid On 06/09/2019 10:00, Xuefu Z wrote: Thanks to Dawid for starting the discussion and writeup. It looks pretty good to me except that I'm a little concerned about the object reference and string parsing in the code, which seems to an anti-pattern to OOP. Have we considered using ObjectIdenitifier with optional catalog and db parts, esp. if we are worried about arguments of variable length or method overloading? It's quite likely that the result of string parsing is an ObjectIdentifier instance any way. Having string parsing logic in the code is a little dangerous as it duplicates part of the DDL/DML parsing, and they can easily get out of sync. Thanks, Xuefu On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]> wrote: Thanks dawid, +1 for this approach. One concern is the removal of registerTableSink & registerTableSource in TableEnvironment. It has two alternatives: 1.the properties approach (DDL, descriptor). 2.from/toDataStream. #1 can only be properties, not java states, and some Connectors are difficult to convert all states to properties. #2 can contain java state. But can't use TableSource-related features, like project & filter push down, partition support, etc.. Any idea about this? Best, Jingsong Lee ------------------------------------------------------------------ From:Dawid Wysakowicz <[hidden email]> Send Time:2019年9月4日(星期三) 22:20 To:dev <[hidden email]> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi all, As part of FLIP-30 a Catalog API was introduced that enables storing table meta objects permanently. At the same time the majority of current APIs create temporary objects that cannot be serialized. We should clarify the creation of meta objects (tables, views, functions) in a unified way. Another current problem in the API is that all the temporary objects are stored in a special built-in catalog, which is not very intuitive for many users, as they must be aware of that catalog to reference temporary objects. Lastly, different APIs have different ways of providing object paths: String path…, String path, String pathContinued… String name We should choose one approach and unify it across all APIs. I suggest a FLIP to address the above issues. Looking forward to your opinions. FLIP link: https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module |
In reply to this post by dwysakowicz
Thanks david for pushing this forward.
I have one concern about temporary objects and non-persistent catalog(e.g., GenericInMemoryCatalog). In SQL, temporary objects exist at the session level. They are only visible to the session in which they were created and are automatically dropped when that session logs off. So in that sense, all objects in non-persistent catalogs should be temporary. My concern is: How does the non-persistent catalog work with temporary objects? *Best Regards,* *Zhenghua Gao* On Wed, Sep 4, 2019 at 10:20 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi all, > > As part of FLIP-30 > <https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs> > a Catalog API was introduced that enables storing table meta objects > permanently. At the same time the majority of current APIs create temporary > objects that cannot be serialized. We should clarify the creation of meta > objects (tables, views, functions) in a unified way. > > Another current problem in the API is that all the temporary objects are > stored in a special built-in catalog, which is not very intuitive for many > users, as they must be aware of that catalog to reference temporary objects. > > Lastly, different APIs have different ways of providing object paths: > > - String path…, > - String path, String pathContinued… > - String name > > We should choose one approach and unify it across all APIs. > > I suggest a FLIP to address the above issues. > > Looking forward to your opinions. > > FLIP link: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > |
In reply to this post by JingsongLee-2
Hi Dawid,
thanks for the design document. It fixes big concept gaps due to historical reasons with proper support for serializability and catalog support in mind. I would not mind a registerTemporarySource/Sink, but the problem that I see is that many people think that this is the recommended way of registering a table source/sink which is not true. We should guide users to either use connect() or DDL API which can be validated and stored in catalog. Also from a concept perspective, registering a source/sink does not fit into the SQL world. SQL does not know about source/sinks but only about tables. If the responsibility of a TableSource/TableSink is just a pure physical data consumer/producer that is not connected to the actual logical table schema, we would need a possibility of defining time attributes and interpreting/converting a changelog. This should be done by the framework with information from the DDL/connect() and not be defined in every table source. Regards, Timo On 09.09.19 14:16, JingsongLee wrote: > Hi dawid: > > It is difficult to describe specific examples. > Sometimes users will generate some java converters through some > Java code, or generate some Java classes through third-party > libraries. Of course, these can be best done through properties. > But this requires additional work from users.My suggestion is to > keep this Java instance class way that is user-friendly. > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Dawid Wysakowicz <[hidden email]> > Send Time:2019年9月6日(星期五) 16:21 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > Hi all, > @Jingsong Could you elaborate a bit more what do you mean by > "some Connectors are difficult to convert all states to properties" > All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind. > > > @Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability. > tEnv.from("cat.db.table") > is shorter and easier to write than > tEnv.from(Identifier.for("cat", "db", "name") > And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner: > tEnv.from(Identifier.for("db.name") > I'm happy to revisit it if the general consensus is that it's better to use the OO aproach. > Best, > Dawid > > On 06/09/2019 10:00, Xuefu Z wrote: > > Thanks to Dawid for starting the discussion and writeup. It looks pretty > good to me except that I'm a little concerned about the object reference > and string parsing in the code, which seems to an anti-pattern to OOP. Have > we considered using ObjectIdenitifier with optional catalog and db parts, > esp. if we are worried about arguments of variable length or method > overloading? It's quite likely that the result of string parsing is an > ObjectIdentifier instance any way. > > Having string parsing logic in the code is a little dangerous as it > duplicates part of the DDL/DML parsing, and they can easily get out of sync. > > Thanks, > Xuefu > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]> > wrote: > > > Thanks dawid, +1 for this approach. > > One concern is the removal of registerTableSink & registerTableSource > in TableEnvironment. It has two alternatives: > 1.the properties approach (DDL, descriptor). > 2.from/toDataStream. > > #1 can only be properties, not java states, and some Connectors > are difficult to convert all states to properties. > #2 can contain java state. But can't use TableSource-related features, > like project & filter push down, partition support, etc.. > > Any idea about this? > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Dawid Wysakowicz <[hidden email]> > Send Time:2019年9月4日(星期三) 22:20 > To:dev <[hidden email]> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > Hi all, > As part of FLIP-30 a Catalog API was introduced that enables storing table > meta objects permanently. At the same time the majority of current APIs > create temporary objects that cannot be serialized. We should clarify the > creation of meta objects (tables, views, functions) in a unified way. > Another current problem in the API is that all the temporary objects are > stored in a special built-in catalog, which is not very intuitive for many > users, as they must be aware of that catalog to reference temporary objects. > Lastly, different APIs have different ways of providing object paths: > > String path…, > String path, String pathContinued… > String name > We should choose one approach and unify it across all APIs. > I suggest a FLIP to address the above issues. > Looking forward to your opinions. > FLIP link: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > |
After some development and thinking, I have a general understanding.
+1 to registering a source/sink does not fit into the SQL world. I am OK to have a deprecated registerTemporarySource/Sink to compatible with old ways. Best, Jingsong Lee ------------------------------------------------------------------ From:Timo Walther <[hidden email]> Send Time:2019年9月17日(星期二) 08:00 To:dev <[hidden email]> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi Dawid, thanks for the design document. It fixes big concept gaps due to historical reasons with proper support for serializability and catalog support in mind. I would not mind a registerTemporarySource/Sink, but the problem that I see is that many people think that this is the recommended way of registering a table source/sink which is not true. We should guide users to either use connect() or DDL API which can be validated and stored in catalog. Also from a concept perspective, registering a source/sink does not fit into the SQL world. SQL does not know about source/sinks but only about tables. If the responsibility of a TableSource/TableSink is just a pure physical data consumer/producer that is not connected to the actual logical table schema, we would need a possibility of defining time attributes and interpreting/converting a changelog. This should be done by the framework with information from the DDL/connect() and not be defined in every table source. Regards, Timo On 09.09.19 14:16, JingsongLee wrote: > Hi dawid: > > It is difficult to describe specific examples. > Sometimes users will generate some java converters through some > Java code, or generate some Java classes through third-party > libraries. Of course, these can be best done through properties. > But this requires additional work from users.My suggestion is to > keep this Java instance class way that is user-friendly. > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Dawid Wysakowicz <[hidden email]> > Send Time:2019年9月6日(星期五) 16:21 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > Hi all, > @Jingsong Could you elaborate a bit more what do you mean by > "some Connectors are difficult to convert all states to properties" > All the Flink provided connectors will definitely be expressible with properties (In the end you should be able to use them from DDL). I think if a TableSource is complex enough that it handles filter push down, partition support etc. should rather be made available both from DDL & java/scala code. I'm happy to reconsider adding registerTemporaryTable(String path, TableSource source) if you have some concrete examples in mind. > > > @Xuefu: We also considered the ObjectIdentifier (or actually introducing a new identifier representation to differentiate between resolved and unresolved identifiers) with the same concerns. We decided to suggest the string & parsing logic because of usability. > tEnv.from("cat.db.table") > is shorter and easier to write than > tEnv.from(Identifier.for("cat", "db", "name") > And also implicitly solves the problem what happens if a user (e.g. used to other systems) uses that API in a following manner: > tEnv.from(Identifier.for("db.name") > I'm happy to revisit it if the general consensus is that it's better to use the OO aproach. > Best, > Dawid > > On 06/09/2019 10:00, Xuefu Z wrote: > > Thanks to Dawid for starting the discussion and writeup. It looks pretty > good to me except that I'm a little concerned about the object reference > and string parsing in the code, which seems to an anti-pattern to OOP. Have > we considered using ObjectIdenitifier with optional catalog and db parts, > esp. if we are worried about arguments of variable length or method > overloading? It's quite likely that the result of string parsing is an > ObjectIdentifier instance any way. > > Having string parsing logic in the code is a little dangerous as it > duplicates part of the DDL/DML parsing, and they can easily get out of sync. > > Thanks, > Xuefu > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email]> > wrote: > > > Thanks dawid, +1 for this approach. > > One concern is the removal of registerTableSink & registerTableSource > in TableEnvironment. It has two alternatives: > 1.the properties approach (DDL, descriptor). > 2.from/toDataStream. > > #1 can only be properties, not java states, and some Connectors > are difficult to convert all states to properties. > #2 can contain java state. But can't use TableSource-related features, > like project & filter push down, partition support, etc.. > > Any idea about this? > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Dawid Wysakowicz <[hidden email]> > Send Time:2019年9月4日(星期三) 22:20 > To:dev <[hidden email]> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > Hi all, > As part of FLIP-30 a Catalog API was introduced that enables storing table > meta objects permanently. At the same time the majority of current APIs > create temporary objects that cannot be serialized. We should clarify the > creation of meta objects (tables, views, functions) in a unified way. > Another current problem in the API is that all the temporary objects are > stored in a special built-in catalog, which is not very intuitive for many > users, as they must be aware of that catalog to reference temporary objects. > Lastly, different APIs have different ways of providing object paths: > > String path…, > String path, String pathContinued… > String name > We should choose one approach and unify it across all APIs. > I suggest a FLIP to address the above issues. > Looking forward to your opinions. > FLIP link: > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > |
Hi all,
Sorry to join this party late. Big +1 to this flip, especially for the dropping "registerTableSink & registerTableSource" part. These are indeed legacy and we should try to unify them through CatalogTable after we introduce the concept of Catalog. From my understanding, what we can registered should all be metadata, TableSource/TableSink should only be the one who is responsible to do the real work, i.e. reading and writing data according to the schema and other information like computed column, partition, .e.g. Best, Kurt On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]> wrote: > After some development and thinking, I have a general understanding. > +1 to registering a source/sink does not fit into the SQL world. > I am OK to have a deprecated registerTemporarySource/Sink to compatible > with old ways. > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Timo Walther <[hidden email]> > Send Time:2019年9月17日(星期二) 08:00 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > Hi Dawid, > > thanks for the design document. It fixes big concept gaps due to > historical reasons with proper support for serializability and catalog > support in mind. > > I would not mind a registerTemporarySource/Sink, but the problem that I > see is that many people think that this is the recommended way of > registering a table source/sink which is not true. We should guide users > to either use connect() or DDL API which can be validated and stored in > catalog. > > Also from a concept perspective, registering a source/sink does not fit > into the SQL world. SQL does not know about source/sinks but only about > tables. If the responsibility of a TableSource/TableSink is just a pure > physical data consumer/producer that is not connected to the actual > logical table schema, we would need a possibility of defining time > attributes and interpreting/converting a changelog. This should be done > by the framework with information from the DDL/connect() and not be > defined in every table source. > > Regards, > Timo > > > On 09.09.19 14:16, JingsongLee wrote: > > Hi dawid: > > > > It is difficult to describe specific examples. > > Sometimes users will generate some java converters through some > > Java code, or generate some Java classes through third-party > > libraries. Of course, these can be best done through properties. > > But this requires additional work from users.My suggestion is to > > keep this Java instance class way that is user-friendly. > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> > > Send Time:2019年9月6日(星期五) 16:21 > > To:dev <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > > > Hi all, > > @Jingsong Could you elaborate a bit more what do you mean by > > "some Connectors are difficult to convert all states to properties" > > All the Flink provided connectors will definitely be expressible with > properties (In the end you should be able to use them from DDL). I think if > a TableSource is complex enough that it handles filter push down, partition > support etc. should rather be made available both from DDL & java/scala > code. I'm happy to reconsider adding registerTemporaryTable(String path, > TableSource source) if you have some concrete examples in mind. > > > > > > @Xuefu: We also considered the ObjectIdentifier (or actually introducing > a new identifier representation to differentiate between resolved and > unresolved identifiers) with the same concerns. We decided to suggest the > string & parsing logic because of usability. > > tEnv.from("cat.db.table") > > is shorter and easier to write than > > tEnv.from(Identifier.for("cat", "db", "name") > > And also implicitly solves the problem what happens if a user (e.g. used > to other systems) uses that API in a following manner: > > tEnv.from(Identifier.for("db.name") > > I'm happy to revisit it if the general consensus is that it's better to > use the OO aproach. > > Best, > > Dawid > > > > On 06/09/2019 10:00, Xuefu Z wrote: > > > > Thanks to Dawid for starting the discussion and writeup. It looks pretty > > good to me except that I'm a little concerned about the object reference > > and string parsing in the code, which seems to an anti-pattern to OOP. > Have > > we considered using ObjectIdenitifier with optional catalog and db parts, > > esp. if we are worried about arguments of variable length or method > > overloading? It's quite likely that the result of string parsing is an > > ObjectIdentifier instance any way. > > > > Having string parsing logic in the code is a little dangerous as it > > duplicates part of the DDL/DML parsing, and they can easily get out of > sync. > > > > Thanks, > > Xuefu > > > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] > .invalid> > > wrote: > > > > > > Thanks dawid, +1 for this approach. > > > > One concern is the removal of registerTableSink & registerTableSource > > in TableEnvironment. It has two alternatives: > > 1.the properties approach (DDL, descriptor). > > 2.from/toDataStream. > > > > #1 can only be properties, not java states, and some Connectors > > are difficult to convert all states to properties. > > #2 can contain java state. But can't use TableSource-related features, > > like project & filter push down, partition support, etc.. > > > > Any idea about this? > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> > > Send Time:2019年9月4日(星期三) 22:20 > > To:dev <[hidden email]> > > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > > > Hi all, > > As part of FLIP-30 a Catalog API was introduced that enables storing > table > > meta objects permanently. At the same time the majority of current APIs > > create temporary objects that cannot be serialized. We should clarify the > > creation of meta objects (tables, views, functions) in a unified way. > > Another current problem in the API is that all the temporary objects are > > stored in a special built-in catalog, which is not very intuitive for > many > > users, as they must be aware of that catalog to reference temporary > objects. > > Lastly, different APIs have different ways of providing object paths: > > > > String path…, > > String path, String pathContinued… > > String name > > We should choose one approach and unify it across all APIs. > > I suggest a FLIP to address the above issues. > > Looking forward to your opinions. > > FLIP link: > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > |
Hi dawid:
Can temporary tables achieve the same capabilities as catalog table? like statistics: CatalogTableStatistics, CatalogColumnStatistics, PartitionStatistics like partition support: we have added some catalog equivalent interfaces on TableSource/TableSink: getPartitions, getPartitionFieldNames Maybe it's not a good idea to add these interfaces to TableSource/TableSink. What do you think? Best, Jingsong Lee ------------------------------------------------------------------ From:Kurt Young <[hidden email]> Send Time:2019年9月18日(星期三) 17:54 To:dev <[hidden email]> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module Hi all, Sorry to join this party late. Big +1 to this flip, especially for the dropping "registerTableSink & registerTableSource" part. These are indeed legacy and we should try to unify them through CatalogTable after we introduce the concept of Catalog. From my understanding, what we can registered should all be metadata, TableSource/TableSink should only be the one who is responsible to do the real work, i.e. reading and writing data according to the schema and other information like computed column, partition, .e.g. Best, Kurt On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email]> wrote: > After some development and thinking, I have a general understanding. > +1 to registering a source/sink does not fit into the SQL world. > I am OK to have a deprecated registerTemporarySource/Sink to compatible > with old ways. > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Timo Walther <[hidden email]> > Send Time:2019年9月17日(星期二) 08:00 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > Hi Dawid, > > thanks for the design document. It fixes big concept gaps due to > historical reasons with proper support for serializability and catalog > support in mind. > > I would not mind a registerTemporarySource/Sink, but the problem that I > see is that many people think that this is the recommended way of > registering a table source/sink which is not true. We should guide users > to either use connect() or DDL API which can be validated and stored in > catalog. > > Also from a concept perspective, registering a source/sink does not fit > into the SQL world. SQL does not know about source/sinks but only about > tables. If the responsibility of a TableSource/TableSink is just a pure > physical data consumer/producer that is not connected to the actual > logical table schema, we would need a possibility of defining time > attributes and interpreting/converting a changelog. This should be done > by the framework with information from the DDL/connect() and not be > defined in every table source. > > Regards, > Timo > > > On 09.09.19 14:16, JingsongLee wrote: > > Hi dawid: > > > > It is difficult to describe specific examples. > > Sometimes users will generate some java converters through some > > Java code, or generate some Java classes through third-party > > libraries. Of course, these can be best done through properties. > > But this requires additional work from users.My suggestion is to > > keep this Java instance class way that is user-friendly. > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> > > Send Time:2019年9月6日(星期五) 16:21 > > To:dev <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > > > Hi all, > > @Jingsong Could you elaborate a bit more what do you mean by > > "some Connectors are difficult to convert all states to properties" > > All the Flink provided connectors will definitely be expressible with > properties (In the end you should be able to use them from DDL). I think if > a TableSource is complex enough that it handles filter push down, partition > support etc. should rather be made available both from DDL & java/scala > code. I'm happy to reconsider adding registerTemporaryTable(String path, > TableSource source) if you have some concrete examples in mind. > > > > > > @Xuefu: We also considered the ObjectIdentifier (or actually introducing > a new identifier representation to differentiate between resolved and > unresolved identifiers) with the same concerns. We decided to suggest the > string & parsing logic because of usability. > > tEnv.from("cat.db.table") > > is shorter and easier to write than > > tEnv.from(Identifier.for("cat", "db", "name") > > And also implicitly solves the problem what happens if a user (e.g. used > to other systems) uses that API in a following manner: > > tEnv.from(Identifier.for("db.name") > > I'm happy to revisit it if the general consensus is that it's better to > use the OO aproach. > > Best, > > Dawid > > > > On 06/09/2019 10:00, Xuefu Z wrote: > > > > Thanks to Dawid for starting the discussion and writeup. It looks pretty > > good to me except that I'm a little concerned about the object reference > > and string parsing in the code, which seems to an anti-pattern to OOP. > Have > > we considered using ObjectIdenitifier with optional catalog and db parts, > > esp. if we are worried about arguments of variable length or method > > overloading? It's quite likely that the result of string parsing is an > > ObjectIdentifier instance any way. > > > > Having string parsing logic in the code is a little dangerous as it > > duplicates part of the DDL/DML parsing, and they can easily get out of > sync. > > > > Thanks, > > Xuefu > > > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] > .invalid> > > wrote: > > > > > > Thanks dawid, +1 for this approach. > > > > One concern is the removal of registerTableSink & registerTableSource > > in TableEnvironment. It has two alternatives: > > 1.the properties approach (DDL, descriptor). > > 2.from/toDataStream. > > > > #1 can only be properties, not java states, and some Connectors > > are difficult to convert all states to properties. > > #2 can contain java state. But can't use TableSource-related features, > > like project & filter push down, partition support, etc.. > > > > Any idea about this? > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> > > Send Time:2019年9月4日(星期三) 22:20 > > To:dev <[hidden email]> > > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module > > > > Hi all, > > As part of FLIP-30 a Catalog API was introduced that enables storing > table > > meta objects permanently. At the same time the majority of current APIs > > create temporary objects that cannot be serialized. We should clarify the > > creation of meta objects (tables, views, functions) in a unified way. > > Another current problem in the API is that all the temporary objects are > > stored in a special built-in catalog, which is not very intuitive for > many > > users, as they must be aware of that catalog to reference temporary > objects. > > Lastly, different APIs have different ways of providing object paths: > > > > String path…, > > String path, String pathContinued… > > String name > > We should choose one approach and unify it across all APIs. > > I suggest a FLIP to address the above issues. > > Looking forward to your opinions. > > FLIP link: > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > |
Hi JingsongLee,
From my understanding they can. Underneath they will be CatalogTables. The difference is the lifetime of the tables. Plus some of the user facing interfaces cannot be persisted e.g. datastream. Therefore we must have a separate methods for that. In the end the temporary tables are held in memory as CatalogTables. Best, Dawid On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]> wrote: > Hi dawid: > Can temporary tables achieve the same capabilities as catalog table? > like statistics: CatalogTableStatistics, CatalogColumnStatistics, > PartitionStatistics > like partition support: we have added some catalog equivalent interfaces > on TableSource/TableSink: getPartitions, getPartitionFieldNames > Maybe it's not a good idea to add these interfaces to > TableSource/TableSink. What do you think? > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Kurt Young <[hidden email]> > Send Time:2019年9月18日(星期三) 17:54 > To:dev <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > Hi all, > > Sorry to join this party late. Big +1 to this flip, especially for the > dropping > "registerTableSink & registerTableSource" part. These are indeed legacy > and we should try to unify them through CatalogTable after we introduce > the concept of Catalog. > > From my understanding, what we can registered should all be metadata, > TableSource/TableSink should only be the one who is responsible to do > the real work, i.e. reading and writing data according to the schema and > other information like computed column, partition, .e.g. > > Best, > Kurt > > > On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] > .invalid> > wrote: > > > After some development and thinking, I have a general understanding. > > +1 to registering a source/sink does not fit into the SQL world. > > I am OK to have a deprecated registerTemporarySource/Sink to compatible > > with old ways. > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Timo Walther <[hidden email]> > > Send Time:2019年9月17日(星期二) 08:00 > > To:dev <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > > module > > > > Hi Dawid, > > > > thanks for the design document. It fixes big concept gaps due to > > historical reasons with proper support for serializability and catalog > > support in mind. > > > > I would not mind a registerTemporarySource/Sink, but the problem that I > > see is that many people think that this is the recommended way of > > registering a table source/sink which is not true. We should guide users > > to either use connect() or DDL API which can be validated and stored in > > catalog. > > > > Also from a concept perspective, registering a source/sink does not fit > > into the SQL world. SQL does not know about source/sinks but only about > > tables. If the responsibility of a TableSource/TableSink is just a pure > > physical data consumer/producer that is not connected to the actual > > logical table schema, we would need a possibility of defining time > > attributes and interpreting/converting a changelog. This should be done > > by the framework with information from the DDL/connect() and not be > > defined in every table source. > > > > Regards, > > Timo > > > > > > On 09.09.19 14:16, JingsongLee wrote: > > > Hi dawid: > > > > > > It is difficult to describe specific examples. > > > Sometimes users will generate some java converters through some > > > Java code, or generate some Java classes through third-party > > > libraries. Of course, these can be best done through properties. > > > But this requires additional work from users.My suggestion is to > > > keep this Java instance class way that is user-friendly. > > > > > > Best, > > > Jingsong Lee > > > > > > > > > ------------------------------------------------------------------ > > > From:Dawid Wysakowicz <[hidden email]> > > > Send Time:2019年9月6日(星期五) 16:21 > > > To:dev <[hidden email]> > > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > > module > > > > > > Hi all, > > > @Jingsong Could you elaborate a bit more what do you mean by > > > "some Connectors are difficult to convert all states to properties" > > > All the Flink provided connectors will definitely be expressible with > > properties (In the end you should be able to use them from DDL). I think > if > > a TableSource is complex enough that it handles filter push down, > partition > > support etc. should rather be made available both from DDL & java/scala > > code. I'm happy to reconsider adding registerTemporaryTable(String path, > > TableSource source) if you have some concrete examples in mind. > > > > > > > > > @Xuefu: We also considered the ObjectIdentifier (or actually > introducing > > a new identifier representation to differentiate between resolved and > > unresolved identifiers) with the same concerns. We decided to suggest the > > string & parsing logic because of usability. > > > tEnv.from("cat.db.table") > > > is shorter and easier to write than > > > tEnv.from(Identifier.for("cat", "db", "name") > > > And also implicitly solves the problem what happens if a user (e.g. > used > > to other systems) uses that API in a following manner: > > > tEnv.from(Identifier.for("db.name") > > > I'm happy to revisit it if the general consensus is that it's better to > > use the OO aproach. > > > Best, > > > Dawid > > > > > > On 06/09/2019 10:00, Xuefu Z wrote: > > > > > > Thanks to Dawid for starting the discussion and writeup. It looks > pretty > > > good to me except that I'm a little concerned about the object > reference > > > and string parsing in the code, which seems to an anti-pattern to OOP. > > Have > > > we considered using ObjectIdenitifier with optional catalog and db > parts, > > > esp. if we are worried about arguments of variable length or method > > > overloading? It's quite likely that the result of string parsing is an > > > ObjectIdentifier instance any way. > > > > > > Having string parsing logic in the code is a little dangerous as it > > > duplicates part of the DDL/DML parsing, and they can easily get out of > > sync. > > > > > > Thanks, > > > Xuefu > > > > > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] > > .invalid> > > > wrote: > > > > > > > > > Thanks dawid, +1 for this approach. > > > > > > One concern is the removal of registerTableSink & registerTableSource > > > in TableEnvironment. It has two alternatives: > > > 1.the properties approach (DDL, descriptor). > > > 2.from/toDataStream. > > > > > > #1 can only be properties, not java states, and some Connectors > > > are difficult to convert all states to properties. > > > #2 can contain java state. But can't use TableSource-related features, > > > like project & filter push down, partition support, etc.. > > > > > > Any idea about this? > > > > > > Best, > > > Jingsong Lee > > > > > > > > > ------------------------------------------------------------------ > > > From:Dawid Wysakowicz <[hidden email]> > > > Send Time:2019年9月4日(星期三) 22:20 > > > To:dev <[hidden email]> > > > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > module > > > > > > Hi all, > > > As part of FLIP-30 a Catalog API was introduced that enables storing > > table > > > meta objects permanently. At the same time the majority of current APIs > > > create temporary objects that cannot be serialized. We should clarify > the > > > creation of meta objects (tables, views, functions) in a unified way. > > > Another current problem in the API is that all the temporary objects > are > > > stored in a special built-in catalog, which is not very intuitive for > > many > > > users, as they must be aware of that catalog to reference temporary > > objects. > > > Lastly, different APIs have different ways of providing object paths: > > > > > > String path…, > > > String path, String pathContinued… > > > String name > > > We should choose one approach and unify it across all APIs. > > > I suggest a FLIP to address the above issues. > > > Looking forward to your opinions. > > > FLIP link: > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > > > > |
Thanks Dawid for the design doc.
In general, I’m +1 to the FLIP. +1 to the single-string and parse way to express object path. +1 to deprecate registerTableSink & registerTableSource. But I would suggest to provide an easy way to register a custom source/sink before we drop them (this is another story). Currently, it’s not easy to implement a custom connector descriptor. Best, Jark > 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: > > Hi JingsongLee, > From my understanding they can. Underneath they will be CatalogTables. The > difference is the lifetime of the tables. Plus some of the user facing > interfaces cannot be persisted e.g. datastream. Therefore we must have a > separate methods for that. In the end the temporary tables are held in > memory as CatalogTables. > Best, > Dawid > > On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email]> > wrote: > >> Hi dawid: >> Can temporary tables achieve the same capabilities as catalog table? >> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >> PartitionStatistics >> like partition support: we have added some catalog equivalent interfaces >> on TableSource/TableSink: getPartitions, getPartitionFieldNames >> Maybe it's not a good idea to add these interfaces to >> TableSource/TableSink. What do you think? >> >> Best, >> Jingsong Lee >> >> >> ------------------------------------------------------------------ >> From:Kurt Young <[hidden email]> >> Send Time:2019年9月18日(星期三) 17:54 >> To:dev <[hidden email]> >> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >> module >> >> Hi all, >> >> Sorry to join this party late. Big +1 to this flip, especially for the >> dropping >> "registerTableSink & registerTableSource" part. These are indeed legacy >> and we should try to unify them through CatalogTable after we introduce >> the concept of Catalog. >> >> From my understanding, what we can registered should all be metadata, >> TableSource/TableSink should only be the one who is responsible to do >> the real work, i.e. reading and writing data according to the schema and >> other information like computed column, partition, .e.g. >> >> Best, >> Kurt >> >> >> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] >> .invalid> >> wrote: >> >>> After some development and thinking, I have a general understanding. >>> +1 to registering a source/sink does not fit into the SQL world. >>> I am OK to have a deprecated registerTemporarySource/Sink to compatible >>> with old ways. >>> >>> Best, >>> Jingsong Lee >>> >>> >>> ------------------------------------------------------------------ >>> From:Timo Walther <[hidden email]> >>> Send Time:2019年9月17日(星期二) 08:00 >>> To:dev <[hidden email]> >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>> module >>> >>> Hi Dawid, >>> >>> thanks for the design document. It fixes big concept gaps due to >>> historical reasons with proper support for serializability and catalog >>> support in mind. >>> >>> I would not mind a registerTemporarySource/Sink, but the problem that I >>> see is that many people think that this is the recommended way of >>> registering a table source/sink which is not true. We should guide users >>> to either use connect() or DDL API which can be validated and stored in >>> catalog. >>> >>> Also from a concept perspective, registering a source/sink does not fit >>> into the SQL world. SQL does not know about source/sinks but only about >>> tables. If the responsibility of a TableSource/TableSink is just a pure >>> physical data consumer/producer that is not connected to the actual >>> logical table schema, we would need a possibility of defining time >>> attributes and interpreting/converting a changelog. This should be done >>> by the framework with information from the DDL/connect() and not be >>> defined in every table source. >>> >>> Regards, >>> Timo >>> >>> >>> On 09.09.19 14:16, JingsongLee wrote: >>>> Hi dawid: >>>> >>>> It is difficult to describe specific examples. >>>> Sometimes users will generate some java converters through some >>>> Java code, or generate some Java classes through third-party >>>> libraries. Of course, these can be best done through properties. >>>> But this requires additional work from users.My suggestion is to >>>> keep this Java instance class way that is user-friendly. >>>> >>>> Best, >>>> Jingsong Lee >>>> >>>> >>>> ------------------------------------------------------------------ >>>> From:Dawid Wysakowicz <[hidden email]> >>>> Send Time:2019年9月6日(星期五) 16:21 >>>> To:dev <[hidden email]> >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>> module >>>> >>>> Hi all, >>>> @Jingsong Could you elaborate a bit more what do you mean by >>>> "some Connectors are difficult to convert all states to properties" >>>> All the Flink provided connectors will definitely be expressible with >>> properties (In the end you should be able to use them from DDL). I think >> if >>> a TableSource is complex enough that it handles filter push down, >> partition >>> support etc. should rather be made available both from DDL & java/scala >>> code. I'm happy to reconsider adding registerTemporaryTable(String path, >>> TableSource source) if you have some concrete examples in mind. >>>> >>>> >>>> @Xuefu: We also considered the ObjectIdentifier (or actually >> introducing >>> a new identifier representation to differentiate between resolved and >>> unresolved identifiers) with the same concerns. We decided to suggest the >>> string & parsing logic because of usability. >>>> tEnv.from("cat.db.table") >>>> is shorter and easier to write than >>>> tEnv.from(Identifier.for("cat", "db", "name") >>>> And also implicitly solves the problem what happens if a user (e.g. >> used >>> to other systems) uses that API in a following manner: >>>> tEnv.from(Identifier.for("db.name") >>>> I'm happy to revisit it if the general consensus is that it's better to >>> use the OO aproach. >>>> Best, >>>> Dawid >>>> >>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>> >>>> Thanks to Dawid for starting the discussion and writeup. It looks >> pretty >>>> good to me except that I'm a little concerned about the object >> reference >>>> and string parsing in the code, which seems to an anti-pattern to OOP. >>> Have >>>> we considered using ObjectIdenitifier with optional catalog and db >> parts, >>>> esp. if we are worried about arguments of variable length or method >>>> overloading? It's quite likely that the result of string parsing is an >>>> ObjectIdentifier instance any way. >>>> >>>> Having string parsing logic in the code is a little dangerous as it >>>> duplicates part of the DDL/DML parsing, and they can easily get out of >>> sync. >>>> >>>> Thanks, >>>> Xuefu >>>> >>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] >>> .invalid> >>>> wrote: >>>> >>>> >>>> Thanks dawid, +1 for this approach. >>>> >>>> One concern is the removal of registerTableSink & registerTableSource >>>> in TableEnvironment. It has two alternatives: >>>> 1.the properties approach (DDL, descriptor). >>>> 2.from/toDataStream. >>>> >>>> #1 can only be properties, not java states, and some Connectors >>>> are difficult to convert all states to properties. >>>> #2 can contain java state. But can't use TableSource-related features, >>>> like project & filter push down, partition support, etc.. >>>> >>>> Any idea about this? >>>> >>>> Best, >>>> Jingsong Lee >>>> >>>> >>>> ------------------------------------------------------------------ >>>> From:Dawid Wysakowicz <[hidden email]> >>>> Send Time:2019年9月4日(星期三) 22:20 >>>> To:dev <[hidden email]> >>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table >> module >>>> >>>> Hi all, >>>> As part of FLIP-30 a Catalog API was introduced that enables storing >>> table >>>> meta objects permanently. At the same time the majority of current APIs >>>> create temporary objects that cannot be serialized. We should clarify >> the >>>> creation of meta objects (tables, views, functions) in a unified way. >>>> Another current problem in the API is that all the temporary objects >> are >>>> stored in a special built-in catalog, which is not very intuitive for >>> many >>>> users, as they must be aware of that catalog to reference temporary >>> objects. >>>> Lastly, different APIs have different ways of providing object paths: >>>> >>>> String path…, >>>> String path, String pathContinued… >>>> String name >>>> We should choose one approach and unify it across all APIs. >>>> I suggest a FLIP to address the above issues. >>>> Looking forward to your opinions. >>>> FLIP link: >>>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>>> >> >> |
IIUC it's good to see that both serializable (tables description from DDL)
and unserializable (tables with DataStream underneath) tables are treated unify with CatalogTable. Can I also assume functions that either come from a function class (from DDL) or function objects (newed by user) will also treated unify with CatalogFunction? This will greatly simplify and unify current API level concepts and design. And it seems only one thing left, how do we deal with ConnectTableDescriptor? It's actually very similar with serializable CatalogTable, both carry some text properties which even are the same. Is there any chance we can further unify this to CatalogTable? object Best, Kurt On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: > Thanks Dawid for the design doc. > > In general, I’m +1 to the FLIP. > > > +1 to the single-string and parse way to express object path. > > +1 to deprecate registerTableSink & registerTableSource. > But I would suggest to provide an easy way to register a custom > source/sink before we drop them (this is another story). > Currently, it’s not easy to implement a custom connector descriptor. > > Best, > Jark > > > > 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: > > > > Hi JingsongLee, > > From my understanding they can. Underneath they will be CatalogTables. > The > > difference is the lifetime of the tables. Plus some of the user facing > > interfaces cannot be persisted e.g. datastream. Therefore we must have a > > separate methods for that. In the end the temporary tables are held in > > memory as CatalogTables. > > Best, > > Dawid > > > > On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > .invalid> > > wrote: > > > >> Hi dawid: > >> Can temporary tables achieve the same capabilities as catalog table? > >> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >> PartitionStatistics > >> like partition support: we have added some catalog equivalent interfaces > >> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >> Maybe it's not a good idea to add these interfaces to > >> TableSource/TableSink. What do you think? > >> > >> Best, > >> Jingsong Lee > >> > >> > >> ------------------------------------------------------------------ > >> From:Kurt Young <[hidden email]> > >> Send Time:2019年9月18日(星期三) 17:54 > >> To:dev <[hidden email]> > >> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >> module > >> > >> Hi all, > >> > >> Sorry to join this party late. Big +1 to this flip, especially for the > >> dropping > >> "registerTableSink & registerTableSource" part. These are indeed legacy > >> and we should try to unify them through CatalogTable after we introduce > >> the concept of Catalog. > >> > >> From my understanding, what we can registered should all be metadata, > >> TableSource/TableSink should only be the one who is responsible to do > >> the real work, i.e. reading and writing data according to the schema and > >> other information like computed column, partition, .e.g. > >> > >> Best, > >> Kurt > >> > >> > >> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] > >> .invalid> > >> wrote: > >> > >>> After some development and thinking, I have a general understanding. > >>> +1 to registering a source/sink does not fit into the SQL world. > >>> I am OK to have a deprecated registerTemporarySource/Sink to compatible > >>> with old ways. > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> > >>> ------------------------------------------------------------------ > >>> From:Timo Walther <[hidden email]> > >>> Send Time:2019年9月17日(星期二) 08:00 > >>> To:dev <[hidden email]> > >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>> module > >>> > >>> Hi Dawid, > >>> > >>> thanks for the design document. It fixes big concept gaps due to > >>> historical reasons with proper support for serializability and catalog > >>> support in mind. > >>> > >>> I would not mind a registerTemporarySource/Sink, but the problem that I > >>> see is that many people think that this is the recommended way of > >>> registering a table source/sink which is not true. We should guide > users > >>> to either use connect() or DDL API which can be validated and stored in > >>> catalog. > >>> > >>> Also from a concept perspective, registering a source/sink does not fit > >>> into the SQL world. SQL does not know about source/sinks but only about > >>> tables. If the responsibility of a TableSource/TableSink is just a pure > >>> physical data consumer/producer that is not connected to the actual > >>> logical table schema, we would need a possibility of defining time > >>> attributes and interpreting/converting a changelog. This should be done > >>> by the framework with information from the DDL/connect() and not be > >>> defined in every table source. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 09.09.19 14:16, JingsongLee wrote: > >>>> Hi dawid: > >>>> > >>>> It is difficult to describe specific examples. > >>>> Sometimes users will generate some java converters through some > >>>> Java code, or generate some Java classes through third-party > >>>> libraries. Of course, these can be best done through properties. > >>>> But this requires additional work from users.My suggestion is to > >>>> keep this Java instance class way that is user-friendly. > >>>> > >>>> Best, > >>>> Jingsong Lee > >>>> > >>>> > >>>> ------------------------------------------------------------------ > >>>> From:Dawid Wysakowicz <[hidden email]> > >>>> Send Time:2019年9月6日(星期五) 16:21 > >>>> To:dev <[hidden email]> > >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>> module > >>>> > >>>> Hi all, > >>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>> "some Connectors are difficult to convert all states to properties" > >>>> All the Flink provided connectors will definitely be expressible with > >>> properties (In the end you should be able to use them from DDL). I > think > >> if > >>> a TableSource is complex enough that it handles filter push down, > >> partition > >>> support etc. should rather be made available both from DDL & java/scala > >>> code. I'm happy to reconsider adding registerTemporaryTable(String > path, > >>> TableSource source) if you have some concrete examples in mind. > >>>> > >>>> > >>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >> introducing > >>> a new identifier representation to differentiate between resolved and > >>> unresolved identifiers) with the same concerns. We decided to suggest > the > >>> string & parsing logic because of usability. > >>>> tEnv.from("cat.db.table") > >>>> is shorter and easier to write than > >>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>> And also implicitly solves the problem what happens if a user (e.g. > >> used > >>> to other systems) uses that API in a following manner: > >>>> tEnv.from(Identifier.for("db.name") > >>>> I'm happy to revisit it if the general consensus is that it's better > to > >>> use the OO aproach. > >>>> Best, > >>>> Dawid > >>>> > >>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>> > >>>> Thanks to Dawid for starting the discussion and writeup. It looks > >> pretty > >>>> good to me except that I'm a little concerned about the object > >> reference > >>>> and string parsing in the code, which seems to an anti-pattern to OOP. > >>> Have > >>>> we considered using ObjectIdenitifier with optional catalog and db > >> parts, > >>>> esp. if we are worried about arguments of variable length or method > >>>> overloading? It's quite likely that the result of string parsing is an > >>>> ObjectIdentifier instance any way. > >>>> > >>>> Having string parsing logic in the code is a little dangerous as it > >>>> duplicates part of the DDL/DML parsing, and they can easily get out of > >>> sync. > >>>> > >>>> Thanks, > >>>> Xuefu > >>>> > >>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] > >>> .invalid> > >>>> wrote: > >>>> > >>>> > >>>> Thanks dawid, +1 for this approach. > >>>> > >>>> One concern is the removal of registerTableSink & registerTableSource > >>>> in TableEnvironment. It has two alternatives: > >>>> 1.the properties approach (DDL, descriptor). > >>>> 2.from/toDataStream. > >>>> > >>>> #1 can only be properties, not java states, and some Connectors > >>>> are difficult to convert all states to properties. > >>>> #2 can contain java state. But can't use TableSource-related features, > >>>> like project & filter push down, partition support, etc.. > >>>> > >>>> Any idea about this? > >>>> > >>>> Best, > >>>> Jingsong Lee > >>>> > >>>> > >>>> ------------------------------------------------------------------ > >>>> From:Dawid Wysakowicz <[hidden email]> > >>>> Send Time:2019年9月4日(星期三) 22:20 > >>>> To:dev <[hidden email]> > >>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > >> module > >>>> > >>>> Hi all, > >>>> As part of FLIP-30 a Catalog API was introduced that enables storing > >>> table > >>>> meta objects permanently. At the same time the majority of current > APIs > >>>> create temporary objects that cannot be serialized. We should clarify > >> the > >>>> creation of meta objects (tables, views, functions) in a unified way. > >>>> Another current problem in the API is that all the temporary objects > >> are > >>>> stored in a special built-in catalog, which is not very intuitive for > >>> many > >>>> users, as they must be aware of that catalog to reference temporary > >>> objects. > >>>> Lastly, different APIs have different ways of providing object paths: > >>>> > >>>> String path…, > >>>> String path, String pathContinued… > >>>> String name > >>>> We should choose one approach and unify it across all APIs. > >>>> I suggest a FLIP to address the above issues. > >>>> Looking forward to your opinions. > >>>> FLIP link: > >>>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >>>> > >> > >> > > |
Sorry for a very delayed response.
@Kurt Yes, this is the goal to have a function created like new Function(...) also be wrapped into CatalogFunction. This would have to be though a temporary function as we cannot represent that as a set of properties. Similar to the createTemporaryView(DataStream stream). As for the ConnectTableDescriptor I agree this is very similar to CatalogTable. I am not sure though if we should get rid of it. In the end I see it as a builder for a CatalogTable, which is a slightly more internal API, but we might revisit that some time in the future if we find that it makes more sense. @All I updated the FLIP page with some more details from the outcome of the discussions around FLIP-57. Please take a look. I would like to start a vote on this FLIP as soon as the vote on FLIP-57 goes through. Best, Dawid On 19/09/2019 09:24, Kurt Young wrote: > IIUC it's good to see that both serializable (tables description from DDL) > and unserializable (tables with DataStream underneath) tables are treated > unify with CatalogTable. > > Can I also assume functions that either come from a function class (from > DDL) > or function objects (newed by user) will also treated unify with > CatalogFunction? > > This will greatly simplify and unify current API level concepts and design. > > And it seems only one thing left, how do we deal with > ConnectTableDescriptor? > It's actually very similar with serializable CatalogTable, both carry some > text > properties which even are the same. Is there any chance we can further unify > this to CatalogTable? > > object > Best, > Kurt > > > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: > >> Thanks Dawid for the design doc. >> >> In general, I’m +1 to the FLIP. >> >> >> +1 to the single-string and parse way to express object path. >> >> +1 to deprecate registerTableSink & registerTableSource. >> But I would suggest to provide an easy way to register a custom >> source/sink before we drop them (this is another story). >> Currently, it’s not easy to implement a custom connector descriptor. >> >> Best, >> Jark >> >> >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: >>> >>> Hi JingsongLee, >>> From my understanding they can. Underneath they will be CatalogTables. >> The >>> difference is the lifetime of the tables. Plus some of the user facing >>> interfaces cannot be persisted e.g. datastream. Therefore we must have a >>> separate methods for that. In the end the temporary tables are held in >>> memory as CatalogTables. >>> Best, >>> Dawid >>> >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >> .invalid> >>> wrote: >>> >>>> Hi dawid: >>>> Can temporary tables achieve the same capabilities as catalog table? >>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>> PartitionStatistics >>>> like partition support: we have added some catalog equivalent interfaces >>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>> Maybe it's not a good idea to add these interfaces to >>>> TableSource/TableSink. What do you think? >>>> >>>> Best, >>>> Jingsong Lee >>>> >>>> >>>> ------------------------------------------------------------------ >>>> From:Kurt Young <[hidden email]> >>>> Send Time:2019年9月18日(星期三) 17:54 >>>> To:dev <[hidden email]> >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>> module >>>> >>>> Hi all, >>>> >>>> Sorry to join this party late. Big +1 to this flip, especially for the >>>> dropping >>>> "registerTableSink & registerTableSource" part. These are indeed legacy >>>> and we should try to unify them through CatalogTable after we introduce >>>> the concept of Catalog. >>>> >>>> From my understanding, what we can registered should all be metadata, >>>> TableSource/TableSink should only be the one who is responsible to do >>>> the real work, i.e. reading and writing data according to the schema and >>>> other information like computed column, partition, .e.g. >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] >>>> .invalid> >>>> wrote: >>>> >>>>> After some development and thinking, I have a general understanding. >>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>> I am OK to have a deprecated registerTemporarySource/Sink to compatible >>>>> with old ways. >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> From:Timo Walther <[hidden email]> >>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>> To:dev <[hidden email]> >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>> module >>>>> >>>>> Hi Dawid, >>>>> >>>>> thanks for the design document. It fixes big concept gaps due to >>>>> historical reasons with proper support for serializability and catalog >>>>> support in mind. >>>>> >>>>> I would not mind a registerTemporarySource/Sink, but the problem that I >>>>> see is that many people think that this is the recommended way of >>>>> registering a table source/sink which is not true. We should guide >> users >>>>> to either use connect() or DDL API which can be validated and stored in >>>>> catalog. >>>>> >>>>> Also from a concept perspective, registering a source/sink does not fit >>>>> into the SQL world. SQL does not know about source/sinks but only about >>>>> tables. If the responsibility of a TableSource/TableSink is just a pure >>>>> physical data consumer/producer that is not connected to the actual >>>>> logical table schema, we would need a possibility of defining time >>>>> attributes and interpreting/converting a changelog. This should be done >>>>> by the framework with information from the DDL/connect() and not be >>>>> defined in every table source. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> >>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>>> Hi dawid: >>>>>> >>>>>> It is difficult to describe specific examples. >>>>>> Sometimes users will generate some java converters through some >>>>>> Java code, or generate some Java classes through third-party >>>>>> libraries. Of course, these can be best done through properties. >>>>>> But this requires additional work from users.My suggestion is to >>>>>> keep this Java instance class way that is user-friendly. >>>>>> >>>>>> Best, >>>>>> Jingsong Lee >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>>> To:dev <[hidden email]> >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>> module >>>>>> Hi all, >>>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>>> "some Connectors are difficult to convert all states to properties" >>>>>> All the Flink provided connectors will definitely be expressible with >>>>> properties (In the end you should be able to use them from DDL). I >> think >>>> if >>>>> a TableSource is complex enough that it handles filter push down, >>>> partition >>>>> support etc. should rather be made available both from DDL & java/scala >>>>> code. I'm happy to reconsider adding registerTemporaryTable(String >> path, >>>>> TableSource source) if you have some concrete examples in mind. >>>>>> >>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>> introducing >>>>> a new identifier representation to differentiate between resolved and >>>>> unresolved identifiers) with the same concerns. We decided to suggest >> the >>>>> string & parsing logic because of usability. >>>>>> tEnv.from("cat.db.table") >>>>>> is shorter and easier to write than >>>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>>> And also implicitly solves the problem what happens if a user (e.g. >>>> used >>>>> to other systems) uses that API in a following manner: >>>>>> tEnv.from(Identifier.for("db.name") >>>>>> I'm happy to revisit it if the general consensus is that it's better >> to >>>>> use the OO aproach. >>>>>> Best, >>>>>> Dawid >>>>>> >>>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>>> >>>>>> Thanks to Dawid for starting the discussion and writeup. It looks >>>> pretty >>>>>> good to me except that I'm a little concerned about the object >>>> reference >>>>>> and string parsing in the code, which seems to an anti-pattern to OOP. >>>>> Have >>>>>> we considered using ObjectIdenitifier with optional catalog and db >>>> parts, >>>>>> esp. if we are worried about arguments of variable length or method >>>>>> overloading? It's quite likely that the result of string parsing is an >>>>>> ObjectIdentifier instance any way. >>>>>> >>>>>> Having string parsing logic in the code is a little dangerous as it >>>>>> duplicates part of the DDL/DML parsing, and they can easily get out of >>>>> sync. >>>>>> Thanks, >>>>>> Xuefu >>>>>> >>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] >>>>> .invalid> >>>>>> wrote: >>>>>> >>>>>> >>>>>> Thanks dawid, +1 for this approach. >>>>>> >>>>>> One concern is the removal of registerTableSink & registerTableSource >>>>>> in TableEnvironment. It has two alternatives: >>>>>> 1.the properties approach (DDL, descriptor). >>>>>> 2.from/toDataStream. >>>>>> >>>>>> #1 can only be properties, not java states, and some Connectors >>>>>> are difficult to convert all states to properties. >>>>>> #2 can contain java state. But can't use TableSource-related features, >>>>>> like project & filter push down, partition support, etc.. >>>>>> >>>>>> Any idea about this? >>>>>> >>>>>> Best, >>>>>> Jingsong Lee >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>>> To:dev <[hidden email]> >>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>> module >>>>>> Hi all, >>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing >>>>> table >>>>>> meta objects permanently. At the same time the majority of current >> APIs >>>>>> create temporary objects that cannot be serialized. We should clarify >>>> the >>>>>> creation of meta objects (tables, views, functions) in a unified way. >>>>>> Another current problem in the API is that all the temporary objects >>>> are >>>>>> stored in a special built-in catalog, which is not very intuitive for >>>>> many >>>>>> users, as they must be aware of that catalog to reference temporary >>>>> objects. >>>>>> Lastly, different APIs have different ways of providing object paths: >>>>>> >>>>>> String path…, >>>>>> String path, String pathContinued… >>>>>> String name >>>>>> We should choose one approach and unify it across all APIs. >>>>>> I suggest a FLIP to address the above issues. >>>>>> Looking forward to your opinions. >>>>>> FLIP link: >>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>>> >> signature.asc (849 bytes) Download Attachment |
Hi Dawid,
+1 for proposed changes On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]> wrote: > Sorry for a very delayed response. > > @Kurt Yes, this is the goal to have a function created like new > Function(...) also be wrapped into CatalogFunction. This would have to > be though a temporary function as we cannot represent that as a set of > properties. Similar to the createTemporaryView(DataStream stream). > > As for the ConnectTableDescriptor I agree this is very similar to > CatalogTable. I am not sure though if we should get rid of it. In the > end I see it as a builder for a CatalogTable, which is a slightly more > internal API, but we might revisit that some time in the future if we > find that it makes more sense. > > @All I updated the FLIP page with some more details from the outcome of > the discussions around FLIP-57. Please take a look. I would like to > start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > > Best, > > Dawid > > > On 19/09/2019 09:24, Kurt Young wrote: > > IIUC it's good to see that both serializable (tables description from > DDL) > > and unserializable (tables with DataStream underneath) tables are treated > > unify with CatalogTable. > > > > Can I also assume functions that either come from a function class (from > > DDL) > > or function objects (newed by user) will also treated unify with > > CatalogFunction? > > > > This will greatly simplify and unify current API level concepts and > design. > > > > And it seems only one thing left, how do we deal with > > ConnectTableDescriptor? > > It's actually very similar with serializable CatalogTable, both carry > some > > text > > properties which even are the same. Is there any chance we can further > unify > > this to CatalogTable? > > > > object > > Best, > > Kurt > > > > > > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: > > > >> Thanks Dawid for the design doc. > >> > >> In general, I’m +1 to the FLIP. > >> > >> > >> +1 to the single-string and parse way to express object path. > >> > >> +1 to deprecate registerTableSink & registerTableSource. > >> But I would suggest to provide an easy way to register a custom > >> source/sink before we drop them (this is another story). > >> Currently, it’s not easy to implement a custom connector descriptor. > >> > >> Best, > >> Jark > >> > >> > >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: > >>> > >>> Hi JingsongLee, > >>> From my understanding they can. Underneath they will be CatalogTables. > >> The > >>> difference is the lifetime of the tables. Plus some of the user facing > >>> interfaces cannot be persisted e.g. datastream. Therefore we must have > a > >>> separate methods for that. In the end the temporary tables are held in > >>> memory as CatalogTables. > >>> Best, > >>> Dawid > >>> > >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > >> .invalid> > >>> wrote: > >>> > >>>> Hi dawid: > >>>> Can temporary tables achieve the same capabilities as catalog table? > >>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>> PartitionStatistics > >>>> like partition support: we have added some catalog equivalent > interfaces > >>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>> Maybe it's not a good idea to add these interfaces to > >>>> TableSource/TableSink. What do you think? > >>>> > >>>> Best, > >>>> Jingsong Lee > >>>> > >>>> > >>>> ------------------------------------------------------------------ > >>>> From:Kurt Young <[hidden email]> > >>>> Send Time:2019年9月18日(星期三) 17:54 > >>>> To:dev <[hidden email]> > >>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>> module > >>>> > >>>> Hi all, > >>>> > >>>> Sorry to join this party late. Big +1 to this flip, especially for the > >>>> dropping > >>>> "registerTableSink & registerTableSource" part. These are indeed > legacy > >>>> and we should try to unify them through CatalogTable after we > introduce > >>>> the concept of Catalog. > >>>> > >>>> From my understanding, what we can registered should all be metadata, > >>>> TableSource/TableSink should only be the one who is responsible to do > >>>> the real work, i.e. reading and writing data according to the schema > and > >>>> other information like computed column, partition, .e.g. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] > >>>> .invalid> > >>>> wrote: > >>>> > >>>>> After some development and thinking, I have a general understanding. > >>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>> I am OK to have a deprecated registerTemporarySource/Sink to > compatible > >>>>> with old ways. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> From:Timo Walther <[hidden email]> > >>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>> To:dev <[hidden email]> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>>> module > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> thanks for the design document. It fixes big concept gaps due to > >>>>> historical reasons with proper support for serializability and > catalog > >>>>> support in mind. > >>>>> > >>>>> I would not mind a registerTemporarySource/Sink, but the problem > that I > >>>>> see is that many people think that this is the recommended way of > >>>>> registering a table source/sink which is not true. We should guide > >> users > >>>>> to either use connect() or DDL API which can be validated and stored > in > >>>>> catalog. > >>>>> > >>>>> Also from a concept perspective, registering a source/sink does not > fit > >>>>> into the SQL world. SQL does not know about source/sinks but only > about > >>>>> tables. If the responsibility of a TableSource/TableSink is just a > pure > >>>>> physical data consumer/producer that is not connected to the actual > >>>>> logical table schema, we would need a possibility of defining time > >>>>> attributes and interpreting/converting a changelog. This should be > done > >>>>> by the framework with information from the DDL/connect() and not be > >>>>> defined in every table source. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>> Hi dawid: > >>>>>> > >>>>>> It is difficult to describe specific examples. > >>>>>> Sometimes users will generate some java converters through some > >>>>>> Java code, or generate some Java classes through third-party > >>>>>> libraries. Of course, these can be best done through properties. > >>>>>> But this requires additional work from users.My suggestion is to > >>>>>> keep this Java instance class way that is user-friendly. > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>> To:dev <[hidden email]> > >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>> module > >>>>>> Hi all, > >>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>> "some Connectors are difficult to convert all states to properties" > >>>>>> All the Flink provided connectors will definitely be expressible > with > >>>>> properties (In the end you should be able to use them from DDL). I > >> think > >>>> if > >>>>> a TableSource is complex enough that it handles filter push down, > >>>> partition > >>>>> support etc. should rather be made available both from DDL & > java/scala > >>>>> code. I'm happy to reconsider adding registerTemporaryTable(String > >> path, > >>>>> TableSource source) if you have some concrete examples in mind. > >>>>>> > >>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>> introducing > >>>>> a new identifier representation to differentiate between resolved and > >>>>> unresolved identifiers) with the same concerns. We decided to suggest > >> the > >>>>> string & parsing logic because of usability. > >>>>>> tEnv.from("cat.db.table") > >>>>>> is shorter and easier to write than > >>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>> And also implicitly solves the problem what happens if a user (e.g. > >>>> used > >>>>> to other systems) uses that API in a following manner: > >>>>>> tEnv.from(Identifier.for("db.name") > >>>>>> I'm happy to revisit it if the general consensus is that it's better > >> to > >>>>> use the OO aproach. > >>>>>> Best, > >>>>>> Dawid > >>>>>> > >>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>> > >>>>>> Thanks to Dawid for starting the discussion and writeup. It looks > >>>> pretty > >>>>>> good to me except that I'm a little concerned about the object > >>>> reference > >>>>>> and string parsing in the code, which seems to an anti-pattern to > OOP. > >>>>> Have > >>>>>> we considered using ObjectIdenitifier with optional catalog and db > >>>> parts, > >>>>>> esp. if we are worried about arguments of variable length or method > >>>>>> overloading? It's quite likely that the result of string parsing is > an > >>>>>> ObjectIdentifier instance any way. > >>>>>> > >>>>>> Having string parsing logic in the code is a little dangerous as it > >>>>>> duplicates part of the DDL/DML parsing, and they can easily get out > of > >>>>> sync. > >>>>>> Thanks, > >>>>>> Xuefu > >>>>>> > >>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] > >>>>> .invalid> > >>>>>> wrote: > >>>>>> > >>>>>> > >>>>>> Thanks dawid, +1 for this approach. > >>>>>> > >>>>>> One concern is the removal of registerTableSink & > registerTableSource > >>>>>> in TableEnvironment. It has two alternatives: > >>>>>> 1.the properties approach (DDL, descriptor). > >>>>>> 2.from/toDataStream. > >>>>>> > >>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>> are difficult to convert all states to properties. > >>>>>> #2 can contain java state. But can't use TableSource-related > features, > >>>>>> like project & filter push down, partition support, etc.. > >>>>>> > >>>>>> Any idea about this? > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>> To:dev <[hidden email]> > >>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>> module > >>>>>> Hi all, > >>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing > >>>>> table > >>>>>> meta objects permanently. At the same time the majority of current > >> APIs > >>>>>> create temporary objects that cannot be serialized. We should > clarify > >>>> the > >>>>>> creation of meta objects (tables, views, functions) in a unified > way. > >>>>>> Another current problem in the API is that all the temporary objects > >>>> are > >>>>>> stored in a special built-in catalog, which is not very intuitive > for > >>>>> many > >>>>>> users, as they must be aware of that catalog to reference temporary > >>>>> objects. > >>>>>> Lastly, different APIs have different ways of providing object > paths: > >>>>>> > >>>>>> String path…, > >>>>>> String path, String pathContinued… > >>>>>> String name > >>>>>> We should choose one approach and unify it across all APIs. > >>>>>> I suggest a FLIP to address the above issues. > >>>>>> Looking forward to your opinions. > >>>>>> FLIP link: > >>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >>>> > >> > > |
Hi everyone,
thanks for the great discussion with a good outcome. I have one last comment about: void createTemporaryFunction(String path, UserDefinedFunction function); We should encourage users to register a class instead of an instance. We should enforce a default constructor in the future. If we need metadata or type inference, the planner can simply instantiate the class and call the corresponding getters. If people would like to parameterize a function, they can use global job parameters instead - which are available in the open() method. I suggest: void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass); This is also closer to the SQL DDL that is about to be implemented in: https://issues.apache.org/jira/browse/FLINK-7151 Thanks, Timo On 09.10.19 17:07, Bowen Li wrote: > Hi Dawid, > > +1 for proposed changes > > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]> > wrote: > >> Sorry for a very delayed response. >> >> @Kurt Yes, this is the goal to have a function created like new >> Function(...) also be wrapped into CatalogFunction. This would have to >> be though a temporary function as we cannot represent that as a set of >> properties. Similar to the createTemporaryView(DataStream stream). >> >> As for the ConnectTableDescriptor I agree this is very similar to >> CatalogTable. I am not sure though if we should get rid of it. In the >> end I see it as a builder for a CatalogTable, which is a slightly more >> internal API, but we might revisit that some time in the future if we >> find that it makes more sense. >> >> @All I updated the FLIP page with some more details from the outcome of >> the discussions around FLIP-57. Please take a look. I would like to >> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. >> >> Best, >> >> Dawid >> >> >> On 19/09/2019 09:24, Kurt Young wrote: >>> IIUC it's good to see that both serializable (tables description from >> DDL) >>> and unserializable (tables with DataStream underneath) tables are treated >>> unify with CatalogTable. >>> >>> Can I also assume functions that either come from a function class (from >>> DDL) >>> or function objects (newed by user) will also treated unify with >>> CatalogFunction? >>> >>> This will greatly simplify and unify current API level concepts and >> design. >>> And it seems only one thing left, how do we deal with >>> ConnectTableDescriptor? >>> It's actually very similar with serializable CatalogTable, both carry >> some >>> text >>> properties which even are the same. Is there any chance we can further >> unify >>> this to CatalogTable? >>> >>> object >>> Best, >>> Kurt >>> >>> >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: >>> >>>> Thanks Dawid for the design doc. >>>> >>>> In general, I’m +1 to the FLIP. >>>> >>>> >>>> +1 to the single-string and parse way to express object path. >>>> >>>> +1 to deprecate registerTableSink & registerTableSource. >>>> But I would suggest to provide an easy way to register a custom >>>> source/sink before we drop them (this is another story). >>>> Currently, it’s not easy to implement a custom connector descriptor. >>>> >>>> Best, >>>> Jark >>>> >>>> >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: >>>>> >>>>> Hi JingsongLee, >>>>> From my understanding they can. Underneath they will be CatalogTables. >>>> The >>>>> difference is the lifetime of the tables. Plus some of the user facing >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must have >> a >>>>> separate methods for that. In the end the temporary tables are held in >>>>> memory as CatalogTables. >>>>> Best, >>>>> Dawid >>>>> >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >>>> .invalid> >>>>> wrote: >>>>> >>>>>> Hi dawid: >>>>>> Can temporary tables achieve the same capabilities as catalog table? >>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>>>> PartitionStatistics >>>>>> like partition support: we have added some catalog equivalent >> interfaces >>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>>>> Maybe it's not a good idea to add these interfaces to >>>>>> TableSource/TableSink. What do you think? >>>>>> >>>>>> Best, >>>>>> Jingsong Lee >>>>>> >>>>>> >>>>>> ------------------------------------------------------------------ >>>>>> From:Kurt Young <[hidden email]> >>>>>> Send Time:2019年9月18日(星期三) 17:54 >>>>>> To:dev <[hidden email]> >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>>> module >>>>>> >>>>>> Hi all, >>>>>> >>>>>> Sorry to join this party late. Big +1 to this flip, especially for the >>>>>> dropping >>>>>> "registerTableSink & registerTableSource" part. These are indeed >> legacy >>>>>> and we should try to unify them through CatalogTable after we >> introduce >>>>>> the concept of Catalog. >>>>>> >>>>>> From my understanding, what we can registered should all be metadata, >>>>>> TableSource/TableSink should only be the one who is responsible to do >>>>>> the real work, i.e. reading and writing data according to the schema >> and >>>>>> other information like computed column, partition, .e.g. >>>>>> >>>>>> Best, >>>>>> Kurt >>>>>> >>>>>> >>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email] >>>>>> .invalid> >>>>>> wrote: >>>>>> >>>>>>> After some development and thinking, I have a general understanding. >>>>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to >> compatible >>>>>>> with old ways. >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>> From:Timo Walther <[hidden email]> >>>>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>>>> To:dev <[hidden email]> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>>>> module >>>>>>> >>>>>>> Hi Dawid, >>>>>>> >>>>>>> thanks for the design document. It fixes big concept gaps due to >>>>>>> historical reasons with proper support for serializability and >> catalog >>>>>>> support in mind. >>>>>>> >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem >> that I >>>>>>> see is that many people think that this is the recommended way of >>>>>>> registering a table source/sink which is not true. We should guide >>>> users >>>>>>> to either use connect() or DDL API which can be validated and stored >> in >>>>>>> catalog. >>>>>>> >>>>>>> Also from a concept perspective, registering a source/sink does not >> fit >>>>>>> into the SQL world. SQL does not know about source/sinks but only >> about >>>>>>> tables. If the responsibility of a TableSource/TableSink is just a >> pure >>>>>>> physical data consumer/producer that is not connected to the actual >>>>>>> logical table schema, we would need a possibility of defining time >>>>>>> attributes and interpreting/converting a changelog. This should be >> done >>>>>>> by the framework with information from the DDL/connect() and not be >>>>>>> defined in every table source. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>>>>> Hi dawid: >>>>>>>> >>>>>>>> It is difficult to describe specific examples. >>>>>>>> Sometimes users will generate some java converters through some >>>>>>>> Java code, or generate some Java classes through third-party >>>>>>>> libraries. Of course, these can be best done through properties. >>>>>>>> But this requires additional work from users.My suggestion is to >>>>>>>> keep this Java instance class way that is user-friendly. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jingsong Lee >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------ >>>>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>>>>> To:dev <[hidden email]> >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >> Table >>>>>>> module >>>>>>>> Hi all, >>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>>>>> "some Connectors are difficult to convert all states to properties" >>>>>>>> All the Flink provided connectors will definitely be expressible >> with >>>>>>> properties (In the end you should be able to use them from DDL). I >>>> think >>>>>> if >>>>>>> a TableSource is complex enough that it handles filter push down, >>>>>> partition >>>>>>> support etc. should rather be made available both from DDL & >> java/scala >>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String >>>> path, >>>>>>> TableSource source) if you have some concrete examples in mind. >>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>>>> introducing >>>>>>> a new identifier representation to differentiate between resolved and >>>>>>> unresolved identifiers) with the same concerns. We decided to suggest >>>> the >>>>>>> string & parsing logic because of usability. >>>>>>>> tEnv.from("cat.db.table") >>>>>>>> is shorter and easier to write than >>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>>>>> And also implicitly solves the problem what happens if a user (e.g. >>>>>> used >>>>>>> to other systems) uses that API in a following manner: >>>>>>>> tEnv.from(Identifier.for("db.name") >>>>>>>> I'm happy to revisit it if the general consensus is that it's better >>>> to >>>>>>> use the OO aproach. >>>>>>>> Best, >>>>>>>> Dawid >>>>>>>> >>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>>>>> >>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks >>>>>> pretty >>>>>>>> good to me except that I'm a little concerned about the object >>>>>> reference >>>>>>>> and string parsing in the code, which seems to an anti-pattern to >> OOP. >>>>>>> Have >>>>>>>> we considered using ObjectIdenitifier with optional catalog and db >>>>>> parts, >>>>>>>> esp. if we are worried about arguments of variable length or method >>>>>>>> overloading? It's quite likely that the result of string parsing is >> an >>>>>>>> ObjectIdentifier instance any way. >>>>>>>> >>>>>>>> Having string parsing logic in the code is a little dangerous as it >>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get out >> of >>>>>>> sync. >>>>>>>> Thanks, >>>>>>>> Xuefu >>>>>>>> >>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email] >>>>>>> .invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Thanks dawid, +1 for this approach. >>>>>>>> >>>>>>>> One concern is the removal of registerTableSink & >> registerTableSource >>>>>>>> in TableEnvironment. It has two alternatives: >>>>>>>> 1.the properties approach (DDL, descriptor). >>>>>>>> 2.from/toDataStream. >>>>>>>> >>>>>>>> #1 can only be properties, not java states, and some Connectors >>>>>>>> are difficult to convert all states to properties. >>>>>>>> #2 can contain java state. But can't use TableSource-related >> features, >>>>>>>> like project & filter push down, partition support, etc.. >>>>>>>> >>>>>>>> Any idea about this? >>>>>>>> >>>>>>>> Best, >>>>>>>> Jingsong Lee >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------ >>>>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>>>>> To:dev <[hidden email]> >>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>>> module >>>>>>>> Hi all, >>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables storing >>>>>>> table >>>>>>>> meta objects permanently. At the same time the majority of current >>>> APIs >>>>>>>> create temporary objects that cannot be serialized. We should >> clarify >>>>>> the >>>>>>>> creation of meta objects (tables, views, functions) in a unified >> way. >>>>>>>> Another current problem in the API is that all the temporary objects >>>>>> are >>>>>>>> stored in a special built-in catalog, which is not very intuitive >> for >>>>>>> many >>>>>>>> users, as they must be aware of that catalog to reference temporary >>>>>>> objects. >>>>>>>> Lastly, different APIs have different ways of providing object >> paths: >>>>>>>> String path…, >>>>>>>> String path, String pathContinued… >>>>>>>> String name >>>>>>>> We should choose one approach and unify it across all APIs. >>>>>>>> I suggest a FLIP to address the above issues. >>>>>>>> Looking forward to your opinions. >>>>>>>> FLIP link: >>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >> |
Regarding to ConnectTableDescriptor, if in the end it becomes a builder
of CatalogTable, I would be ok with that. But it doesn't look like a builder now, it's more like another form of TableSource/TableSink. Best, Kurt On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote: > Hi everyone, > > thanks for the great discussion with a good outcome. > > I have one last comment about: > > void createTemporaryFunction(String path, UserDefinedFunction function); > > We should encourage users to register a class instead of an instance. We > should enforce a default constructor in the future. If we need metadata > or type inference, the planner can simply instantiate the class and call > the corresponding getters. If people would like to parameterize a > function, they can use global job parameters instead - which are > available in the open() method. > > I suggest: > > void createTemporaryFunction(String path, Class<? extends > UserDefinedFunction> functionClass); > > This is also closer to the SQL DDL that is about to be implemented in: > https://issues.apache.org/jira/browse/FLINK-7151 > > Thanks, > Timo > > > On 09.10.19 17:07, Bowen Li wrote: > > Hi Dawid, > > > > +1 for proposed changes > > > > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email] > > > > wrote: > > > >> Sorry for a very delayed response. > >> > >> @Kurt Yes, this is the goal to have a function created like new > >> Function(...) also be wrapped into CatalogFunction. This would have to > >> be though a temporary function as we cannot represent that as a set of > >> properties. Similar to the createTemporaryView(DataStream stream). > >> > >> As for the ConnectTableDescriptor I agree this is very similar to > >> CatalogTable. I am not sure though if we should get rid of it. In the > >> end I see it as a builder for a CatalogTable, which is a slightly more > >> internal API, but we might revisit that some time in the future if we > >> find that it makes more sense. > >> > >> @All I updated the FLIP page with some more details from the outcome of > >> the discussions around FLIP-57. Please take a look. I would like to > >> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > >> > >> Best, > >> > >> Dawid > >> > >> > >> On 19/09/2019 09:24, Kurt Young wrote: > >>> IIUC it's good to see that both serializable (tables description from > >> DDL) > >>> and unserializable (tables with DataStream underneath) tables are > treated > >>> unify with CatalogTable. > >>> > >>> Can I also assume functions that either come from a function class > (from > >>> DDL) > >>> or function objects (newed by user) will also treated unify with > >>> CatalogFunction? > >>> > >>> This will greatly simplify and unify current API level concepts and > >> design. > >>> And it seems only one thing left, how do we deal with > >>> ConnectTableDescriptor? > >>> It's actually very similar with serializable CatalogTable, both carry > >> some > >>> text > >>> properties which even are the same. Is there any chance we can further > >> unify > >>> this to CatalogTable? > >>> > >>> object > >>> Best, > >>> Kurt > >>> > >>> > >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: > >>> > >>>> Thanks Dawid for the design doc. > >>>> > >>>> In general, I’m +1 to the FLIP. > >>>> > >>>> > >>>> +1 to the single-string and parse way to express object path. > >>>> > >>>> +1 to deprecate registerTableSink & registerTableSource. > >>>> But I would suggest to provide an easy way to register a custom > >>>> source/sink before we drop them (this is another story). > >>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>> > >>>> Best, > >>>> Jark > >>>> > >>>> > >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: > >>>>> > >>>>> Hi JingsongLee, > >>>>> From my understanding they can. Underneath they will be > CatalogTables. > >>>> The > >>>>> difference is the lifetime of the tables. Plus some of the user > facing > >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > have > >> a > >>>>> separate methods for that. In the end the temporary tables are held > in > >>>>> memory as CatalogTables. > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > >>>> .invalid> > >>>>> wrote: > >>>>> > >>>>>> Hi dawid: > >>>>>> Can temporary tables achieve the same capabilities as catalog table? > >>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>>> PartitionStatistics > >>>>>> like partition support: we have added some catalog equivalent > >> interfaces > >>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>>> Maybe it's not a good idea to add these interfaces to > >>>>>> TableSource/TableSink. What do you think? > >>>>>> > >>>>>> Best, > >>>>>> Jingsong Lee > >>>>>> > >>>>>> > >>>>>> ------------------------------------------------------------------ > >>>>>> From:Kurt Young <[hidden email]> > >>>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>>> To:dev <[hidden email]> > >>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>> module > >>>>>> > >>>>>> Hi all, > >>>>>> > >>>>>> Sorry to join this party late. Big +1 to this flip, especially for > the > >>>>>> dropping > >>>>>> "registerTableSink & registerTableSource" part. These are indeed > >> legacy > >>>>>> and we should try to unify them through CatalogTable after we > >> introduce > >>>>>> the concept of Catalog. > >>>>>> > >>>>>> From my understanding, what we can registered should all be > metadata, > >>>>>> TableSource/TableSink should only be the one who is responsible to > do > >>>>>> the real work, i.e. reading and writing data according to the schema > >> and > >>>>>> other information like computed column, partition, .e.g. > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > [hidden email] > >>>>>> .invalid> > >>>>>> wrote: > >>>>>> > >>>>>>> After some development and thinking, I have a general > understanding. > >>>>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >> compatible > >>>>>>> with old ways. > >>>>>>> > >>>>>>> Best, > >>>>>>> Jingsong Lee > >>>>>>> > >>>>>>> > >>>>>>> ------------------------------------------------------------------ > >>>>>>> From:Timo Walther <[hidden email]> > >>>>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>>>> To:dev <[hidden email]> > >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>>> module > >>>>>>> > >>>>>>> Hi Dawid, > >>>>>>> > >>>>>>> thanks for the design document. It fixes big concept gaps due to > >>>>>>> historical reasons with proper support for serializability and > >> catalog > >>>>>>> support in mind. > >>>>>>> > >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem > >> that I > >>>>>>> see is that many people think that this is the recommended way of > >>>>>>> registering a table source/sink which is not true. We should guide > >>>> users > >>>>>>> to either use connect() or DDL API which can be validated and > stored > >> in > >>>>>>> catalog. > >>>>>>> > >>>>>>> Also from a concept perspective, registering a source/sink does not > >> fit > >>>>>>> into the SQL world. SQL does not know about source/sinks but only > >> about > >>>>>>> tables. If the responsibility of a TableSource/TableSink is just a > >> pure > >>>>>>> physical data consumer/producer that is not connected to the actual > >>>>>>> logical table schema, we would need a possibility of defining time > >>>>>>> attributes and interpreting/converting a changelog. This should be > >> done > >>>>>>> by the framework with information from the DDL/connect() and not be > >>>>>>> defined in every table source. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> > >>>>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>>>> Hi dawid: > >>>>>>>> > >>>>>>>> It is difficult to describe specific examples. > >>>>>>>> Sometimes users will generate some java converters through some > >>>>>>>> Java code, or generate some Java classes through third-party > >>>>>>>> libraries. Of course, these can be best done through properties. > >>>>>>>> But this requires additional work from users.My suggestion is to > >>>>>>>> keep this Java instance class way that is user-friendly. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>>>> To:dev <[hidden email]> > >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>> module > >>>>>>>> Hi all, > >>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>>>> "some Connectors are difficult to convert all states to > properties" > >>>>>>>> All the Flink provided connectors will definitely be expressible > >> with > >>>>>>> properties (In the end you should be able to use them from DDL). I > >>>> think > >>>>>> if > >>>>>>> a TableSource is complex enough that it handles filter push down, > >>>>>> partition > >>>>>>> support etc. should rather be made available both from DDL & > >> java/scala > >>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String > >>>> path, > >>>>>>> TableSource source) if you have some concrete examples in mind. > >>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>>> introducing > >>>>>>> a new identifier representation to differentiate between resolved > and > >>>>>>> unresolved identifiers) with the same concerns. We decided to > suggest > >>>> the > >>>>>>> string & parsing logic because of usability. > >>>>>>>> tEnv.from("cat.db.table") > >>>>>>>> is shorter and easier to write than > >>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>>>> And also implicitly solves the problem what happens if a user > (e.g. > >>>>>> used > >>>>>>> to other systems) uses that API in a following manner: > >>>>>>>> tEnv.from(Identifier.for("db.name") > >>>>>>>> I'm happy to revisit it if the general consensus is that it's > better > >>>> to > >>>>>>> use the OO aproach. > >>>>>>>> Best, > >>>>>>>> Dawid > >>>>>>>> > >>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>>>> > >>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks > >>>>>> pretty > >>>>>>>> good to me except that I'm a little concerned about the object > >>>>>> reference > >>>>>>>> and string parsing in the code, which seems to an anti-pattern to > >> OOP. > >>>>>>> Have > >>>>>>>> we considered using ObjectIdenitifier with optional catalog and db > >>>>>> parts, > >>>>>>>> esp. if we are worried about arguments of variable length or > method > >>>>>>>> overloading? It's quite likely that the result of string parsing > is > >> an > >>>>>>>> ObjectIdentifier instance any way. > >>>>>>>> > >>>>>>>> Having string parsing logic in the code is a little dangerous as > it > >>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get > out > >> of > >>>>>>> sync. > >>>>>>>> Thanks, > >>>>>>>> Xuefu > >>>>>>>> > >>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > [hidden email] > >>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> Thanks dawid, +1 for this approach. > >>>>>>>> > >>>>>>>> One concern is the removal of registerTableSink & > >> registerTableSource > >>>>>>>> in TableEnvironment. It has two alternatives: > >>>>>>>> 1.the properties approach (DDL, descriptor). > >>>>>>>> 2.from/toDataStream. > >>>>>>>> > >>>>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>>>> are difficult to convert all states to properties. > >>>>>>>> #2 can contain java state. But can't use TableSource-related > >> features, > >>>>>>>> like project & filter push down, partition support, etc.. > >>>>>>>> > >>>>>>>> Any idea about this? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>>>> To:dev <[hidden email]> > >>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table > >>>>>> module > >>>>>>>> Hi all, > >>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables > storing > >>>>>>> table > >>>>>>>> meta objects permanently. At the same time the majority of current > >>>> APIs > >>>>>>>> create temporary objects that cannot be serialized. We should > >> clarify > >>>>>> the > >>>>>>>> creation of meta objects (tables, views, functions) in a unified > >> way. > >>>>>>>> Another current problem in the API is that all the temporary > objects > >>>>>> are > >>>>>>>> stored in a special built-in catalog, which is not very intuitive > >> for > >>>>>>> many > >>>>>>>> users, as they must be aware of that catalog to reference > temporary > >>>>>>> objects. > >>>>>>>> Lastly, different APIs have different ways of providing object > >> paths: > >>>>>>>> String path…, > >>>>>>>> String path, String pathContinued… > >>>>>>>> String name > >>>>>>>> We should choose one approach and unify it across all APIs. > >>>>>>>> I suggest a FLIP to address the above issues. > >>>>>>>> Looking forward to your opinions. > >>>>>>>> FLIP link: > >>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > > |
The purpose of ConnectTableDescriptor was to have a programmatic way of
expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree that it needs some rework in the future, once we have finalized the DDL to ensure that both concepts are in sync. Regards, Timo On 10.10.19 16:08, Kurt Young wrote: > Regarding to ConnectTableDescriptor, if in the end it becomes a builder > of CatalogTable, I would be ok with that. But it doesn't look like a builder > now, it's more like another form of TableSource/TableSink. > > Best, > Kurt > > > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote: > >> Hi everyone, >> >> thanks for the great discussion with a good outcome. >> >> I have one last comment about: >> >> void createTemporaryFunction(String path, UserDefinedFunction function); >> >> We should encourage users to register a class instead of an instance. We >> should enforce a default constructor in the future. If we need metadata >> or type inference, the planner can simply instantiate the class and call >> the corresponding getters. If people would like to parameterize a >> function, they can use global job parameters instead - which are >> available in the open() method. >> >> I suggest: >> >> void createTemporaryFunction(String path, Class<? extends >> UserDefinedFunction> functionClass); >> >> This is also closer to the SQL DDL that is about to be implemented in: >> https://issues.apache.org/jira/browse/FLINK-7151 >> >> Thanks, >> Timo >> >> >> On 09.10.19 17:07, Bowen Li wrote: >>> Hi Dawid, >>> >>> +1 for proposed changes >>> >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email] >>> >>> wrote: >>> >>>> Sorry for a very delayed response. >>>> >>>> @Kurt Yes, this is the goal to have a function created like new >>>> Function(...) also be wrapped into CatalogFunction. This would have to >>>> be though a temporary function as we cannot represent that as a set of >>>> properties. Similar to the createTemporaryView(DataStream stream). >>>> >>>> As for the ConnectTableDescriptor I agree this is very similar to >>>> CatalogTable. I am not sure though if we should get rid of it. In the >>>> end I see it as a builder for a CatalogTable, which is a slightly more >>>> internal API, but we might revisit that some time in the future if we >>>> find that it makes more sense. >>>> >>>> @All I updated the FLIP page with some more details from the outcome of >>>> the discussions around FLIP-57. Please take a look. I would like to >>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> >>>> On 19/09/2019 09:24, Kurt Young wrote: >>>>> IIUC it's good to see that both serializable (tables description from >>>> DDL) >>>>> and unserializable (tables with DataStream underneath) tables are >> treated >>>>> unify with CatalogTable. >>>>> >>>>> Can I also assume functions that either come from a function class >> (from >>>>> DDL) >>>>> or function objects (newed by user) will also treated unify with >>>>> CatalogFunction? >>>>> >>>>> This will greatly simplify and unify current API level concepts and >>>> design. >>>>> And it seems only one thing left, how do we deal with >>>>> ConnectTableDescriptor? >>>>> It's actually very similar with serializable CatalogTable, both carry >>>> some >>>>> text >>>>> properties which even are the same. Is there any chance we can further >>>> unify >>>>> this to CatalogTable? >>>>> >>>>> object >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: >>>>> >>>>>> Thanks Dawid for the design doc. >>>>>> >>>>>> In general, I’m +1 to the FLIP. >>>>>> >>>>>> >>>>>> +1 to the single-string and parse way to express object path. >>>>>> >>>>>> +1 to deprecate registerTableSink & registerTableSource. >>>>>> But I would suggest to provide an easy way to register a custom >>>>>> source/sink before we drop them (this is another story). >>>>>> Currently, it’s not easy to implement a custom connector descriptor. >>>>>> >>>>>> Best, >>>>>> Jark >>>>>> >>>>>> >>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> 写道: >>>>>>> >>>>>>> Hi JingsongLee, >>>>>>> From my understanding they can. Underneath they will be >> CatalogTables. >>>>>> The >>>>>>> difference is the lifetime of the tables. Plus some of the user >> facing >>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must >> have >>>> a >>>>>>> separate methods for that. In the end the temporary tables are held >> in >>>>>>> memory as CatalogTables. >>>>>>> Best, >>>>>>> Dawid >>>>>>> >>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >>>>>> .invalid> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi dawid: >>>>>>>> Can temporary tables achieve the same capabilities as catalog table? >>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>>>>>> PartitionStatistics >>>>>>>> like partition support: we have added some catalog equivalent >>>> interfaces >>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>>>>>> Maybe it's not a good idea to add these interfaces to >>>>>>>> TableSource/TableSink. What do you think? >>>>>>>> >>>>>>>> Best, >>>>>>>> Jingsong Lee >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------ >>>>>>>> From:Kurt Young <[hidden email]> >>>>>>>> Send Time:2019年9月18日(星期三) 17:54 >>>>>>>> To:dev <[hidden email]> >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >> Table >>>>>>>> module >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for >> the >>>>>>>> dropping >>>>>>>> "registerTableSink & registerTableSource" part. These are indeed >>>> legacy >>>>>>>> and we should try to unify them through CatalogTable after we >>>> introduce >>>>>>>> the concept of Catalog. >>>>>>>> >>>>>>>> From my understanding, what we can registered should all be >> metadata, >>>>>>>> TableSource/TableSink should only be the one who is responsible to >> do >>>>>>>> the real work, i.e. reading and writing data according to the schema >>>> and >>>>>>>> other information like computed column, partition, .e.g. >>>>>>>> >>>>>>>> Best, >>>>>>>> Kurt >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < >> [hidden email] >>>>>>>> .invalid> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> After some development and thinking, I have a general >> understanding. >>>>>>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to >>>> compatible >>>>>>>>> with old ways. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jingsong Lee >>>>>>>>> >>>>>>>>> >>>>>>>>> ------------------------------------------------------------------ >>>>>>>>> From:Timo Walther <[hidden email]> >>>>>>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>>>>>> To:dev <[hidden email]> >>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >> Table >>>>>>>>> module >>>>>>>>> >>>>>>>>> Hi Dawid, >>>>>>>>> >>>>>>>>> thanks for the design document. It fixes big concept gaps due to >>>>>>>>> historical reasons with proper support for serializability and >>>> catalog >>>>>>>>> support in mind. >>>>>>>>> >>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem >>>> that I >>>>>>>>> see is that many people think that this is the recommended way of >>>>>>>>> registering a table source/sink which is not true. We should guide >>>>>> users >>>>>>>>> to either use connect() or DDL API which can be validated and >> stored >>>> in >>>>>>>>> catalog. >>>>>>>>> >>>>>>>>> Also from a concept perspective, registering a source/sink does not >>>> fit >>>>>>>>> into the SQL world. SQL does not know about source/sinks but only >>>> about >>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just a >>>> pure >>>>>>>>> physical data consumer/producer that is not connected to the actual >>>>>>>>> logical table schema, we would need a possibility of defining time >>>>>>>>> attributes and interpreting/converting a changelog. This should be >>>> done >>>>>>>>> by the framework with information from the DDL/connect() and not be >>>>>>>>> defined in every table source. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> >>>>>>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>>>>>>> Hi dawid: >>>>>>>>>> >>>>>>>>>> It is difficult to describe specific examples. >>>>>>>>>> Sometimes users will generate some java converters through some >>>>>>>>>> Java code, or generate some Java classes through third-party >>>>>>>>>> libraries. Of course, these can be best done through properties. >>>>>>>>>> But this requires additional work from users.My suggestion is to >>>>>>>>>> keep this Java instance class way that is user-friendly. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jingsong Lee >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>> Table >>>>>>>>> module >>>>>>>>>> Hi all, >>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>>>>>>> "some Connectors are difficult to convert all states to >> properties" >>>>>>>>>> All the Flink provided connectors will definitely be expressible >>>> with >>>>>>>>> properties (In the end you should be able to use them from DDL). I >>>>>> think >>>>>>>> if >>>>>>>>> a TableSource is complex enough that it handles filter push down, >>>>>>>> partition >>>>>>>>> support etc. should rather be made available both from DDL & >>>> java/scala >>>>>>>>> code. I'm happy to reconsider adding registerTemporaryTable(String >>>>>> path, >>>>>>>>> TableSource source) if you have some concrete examples in mind. >>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>>>>>> introducing >>>>>>>>> a new identifier representation to differentiate between resolved >> and >>>>>>>>> unresolved identifiers) with the same concerns. We decided to >> suggest >>>>>> the >>>>>>>>> string & parsing logic because of usability. >>>>>>>>>> tEnv.from("cat.db.table") >>>>>>>>>> is shorter and easier to write than >>>>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>>>>>>> And also implicitly solves the problem what happens if a user >> (e.g. >>>>>>>> used >>>>>>>>> to other systems) uses that API in a following manner: >>>>>>>>>> tEnv.from(Identifier.for("db.name") >>>>>>>>>> I'm happy to revisit it if the general consensus is that it's >> better >>>>>> to >>>>>>>>> use the OO aproach. >>>>>>>>>> Best, >>>>>>>>>> Dawid >>>>>>>>>> >>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>>>>>>> >>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It looks >>>>>>>> pretty >>>>>>>>>> good to me except that I'm a little concerned about the object >>>>>>>> reference >>>>>>>>>> and string parsing in the code, which seems to an anti-pattern to >>>> OOP. >>>>>>>>> Have >>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and db >>>>>>>> parts, >>>>>>>>>> esp. if we are worried about arguments of variable length or >> method >>>>>>>>>> overloading? It's quite likely that the result of string parsing >> is >>>> an >>>>>>>>>> ObjectIdentifier instance any way. >>>>>>>>>> >>>>>>>>>> Having string parsing logic in the code is a little dangerous as >> it >>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get >> out >>>> of >>>>>>>>> sync. >>>>>>>>>> Thanks, >>>>>>>>>> Xuefu >>>>>>>>>> >>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < >> [hidden email] >>>>>>>>> .invalid> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Thanks dawid, +1 for this approach. >>>>>>>>>> >>>>>>>>>> One concern is the removal of registerTableSink & >>>> registerTableSource >>>>>>>>>> in TableEnvironment. It has two alternatives: >>>>>>>>>> 1.the properties approach (DDL, descriptor). >>>>>>>>>> 2.from/toDataStream. >>>>>>>>>> >>>>>>>>>> #1 can only be properties, not java states, and some Connectors >>>>>>>>>> are difficult to convert all states to properties. >>>>>>>>>> #2 can contain java state. But can't use TableSource-related >>>> features, >>>>>>>>>> like project & filter push down, partition support, etc.. >>>>>>>>>> >>>>>>>>>> Any idea about this? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jingsong Lee >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------ >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]> >>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>>>>>>> To:dev <[hidden email]> >>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table >>>>>>>> module >>>>>>>>>> Hi all, >>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables >> storing >>>>>>>>> table >>>>>>>>>> meta objects permanently. At the same time the majority of current >>>>>> APIs >>>>>>>>>> create temporary objects that cannot be serialized. We should >>>> clarify >>>>>>>> the >>>>>>>>>> creation of meta objects (tables, views, functions) in a unified >>>> way. >>>>>>>>>> Another current problem in the API is that all the temporary >> objects >>>>>>>> are >>>>>>>>>> stored in a special built-in catalog, which is not very intuitive >>>> for >>>>>>>>> many >>>>>>>>>> users, as they must be aware of that catalog to reference >> temporary >>>>>>>>> objects. >>>>>>>>>> Lastly, different APIs have different ways of providing object >>>> paths: >>>>>>>>>> String path…, >>>>>>>>>> String path, String pathContinued… >>>>>>>>>> String name >>>>>>>>>> We should choose one approach and unify it across all APIs. >>>>>>>>>> I suggest a FLIP to address the above issues. >>>>>>>>>> Looking forward to your opinions. >>>>>>>>>> FLIP link: >>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >> |
Thanks for the clarification Timo, that's sounds good to me. Let's continue
to discuss other things. Best, Kurt On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> wrote: > The purpose of ConnectTableDescriptor was to have a programmatic way of > expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > that it needs some rework in the future, once we have finalized the DDL > to ensure that both concepts are in sync. > > Regards, > Timo > > > On 10.10.19 16:08, Kurt Young wrote: > > Regarding to ConnectTableDescriptor, if in the end it becomes a builder > > of CatalogTable, I would be ok with that. But it doesn't look like a > builder > > now, it's more like another form of TableSource/TableSink. > > > > Best, > > Kurt > > > > > > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> wrote: > > > >> Hi everyone, > >> > >> thanks for the great discussion with a good outcome. > >> > >> I have one last comment about: > >> > >> void createTemporaryFunction(String path, UserDefinedFunction function); > >> > >> We should encourage users to register a class instead of an instance. We > >> should enforce a default constructor in the future. If we need metadata > >> or type inference, the planner can simply instantiate the class and call > >> the corresponding getters. If people would like to parameterize a > >> function, they can use global job parameters instead - which are > >> available in the open() method. > >> > >> I suggest: > >> > >> void createTemporaryFunction(String path, Class<? extends > >> UserDefinedFunction> functionClass); > >> > >> This is also closer to the SQL DDL that is about to be implemented in: > >> https://issues.apache.org/jira/browse/FLINK-7151 > >> > >> Thanks, > >> Timo > >> > >> > >> On 09.10.19 17:07, Bowen Li wrote: > >>> Hi Dawid, > >>> > >>> +1 for proposed changes > >>> > >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > [hidden email] > >>> > >>> wrote: > >>> > >>>> Sorry for a very delayed response. > >>>> > >>>> @Kurt Yes, this is the goal to have a function created like new > >>>> Function(...) also be wrapped into CatalogFunction. This would have to > >>>> be though a temporary function as we cannot represent that as a set of > >>>> properties. Similar to the createTemporaryView(DataStream stream). > >>>> > >>>> As for the ConnectTableDescriptor I agree this is very similar to > >>>> CatalogTable. I am not sure though if we should get rid of it. In the > >>>> end I see it as a builder for a CatalogTable, which is a slightly more > >>>> internal API, but we might revisit that some time in the future if we > >>>> find that it makes more sense. > >>>> > >>>> @All I updated the FLIP page with some more details from the outcome > of > >>>> the discussions around FLIP-57. Please take a look. I would like to > >>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > >>>> > >>>> Best, > >>>> > >>>> Dawid > >>>> > >>>> > >>>> On 19/09/2019 09:24, Kurt Young wrote: > >>>>> IIUC it's good to see that both serializable (tables description from > >>>> DDL) > >>>>> and unserializable (tables with DataStream underneath) tables are > >> treated > >>>>> unify with CatalogTable. > >>>>> > >>>>> Can I also assume functions that either come from a function class > >> (from > >>>>> DDL) > >>>>> or function objects (newed by user) will also treated unify with > >>>>> CatalogFunction? > >>>>> > >>>>> This will greatly simplify and unify current API level concepts and > >>>> design. > >>>>> And it seems only one thing left, how do we deal with > >>>>> ConnectTableDescriptor? > >>>>> It's actually very similar with serializable CatalogTable, both carry > >>>> some > >>>>> text > >>>>> properties which even are the same. Is there any chance we can > further > >>>> unify > >>>>> this to CatalogTable? > >>>>> > >>>>> object > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> wrote: > >>>>> > >>>>>> Thanks Dawid for the design doc. > >>>>>> > >>>>>> In general, I’m +1 to the FLIP. > >>>>>> > >>>>>> > >>>>>> +1 to the single-string and parse way to express object path. > >>>>>> > >>>>>> +1 to deprecate registerTableSink & registerTableSource. > >>>>>> But I would suggest to provide an easy way to register a custom > >>>>>> source/sink before we drop them (this is another story). > >>>>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> > 写道: > >>>>>>> > >>>>>>> Hi JingsongLee, > >>>>>>> From my understanding they can. Underneath they will be > >> CatalogTables. > >>>>>> The > >>>>>>> difference is the lifetime of the tables. Plus some of the user > >> facing > >>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > >> have > >>>> a > >>>>>>> separate methods for that. In the end the temporary tables are held > >> in > >>>>>>> memory as CatalogTables. > >>>>>>> Best, > >>>>>>> Dawid > >>>>>>> > >>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > >>>>>> .invalid> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi dawid: > >>>>>>>> Can temporary tables achieve the same capabilities as catalog > table? > >>>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>>>>> PartitionStatistics > >>>>>>>> like partition support: we have added some catalog equivalent > >>>> interfaces > >>>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>>>>> Maybe it's not a good idea to add these interfaces to > >>>>>>>> TableSource/TableSink. What do you think? > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jingsong Lee > >>>>>>>> > >>>>>>>> > >>>>>>>> ------------------------------------------------------------------ > >>>>>>>> From:Kurt Young <[hidden email]> > >>>>>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>>>>> To:dev <[hidden email]> > >>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>>> module > >>>>>>>> > >>>>>>>> Hi all, > >>>>>>>> > >>>>>>>> Sorry to join this party late. Big +1 to this flip, especially for > >> the > >>>>>>>> dropping > >>>>>>>> "registerTableSink & registerTableSource" part. These are indeed > >>>> legacy > >>>>>>>> and we should try to unify them through CatalogTable after we > >>>> introduce > >>>>>>>> the concept of Catalog. > >>>>>>>> > >>>>>>>> From my understanding, what we can registered should all be > >> metadata, > >>>>>>>> TableSource/TableSink should only be the one who is responsible to > >> do > >>>>>>>> the real work, i.e. reading and writing data according to the > schema > >>>> and > >>>>>>>> other information like computed column, partition, .e.g. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Kurt > >>>>>>>> > >>>>>>>> > >>>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > >> [hidden email] > >>>>>>>> .invalid> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> After some development and thinking, I have a general > >> understanding. > >>>>>>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >>>> compatible > >>>>>>>>> with old ways. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Jingsong Lee > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>> From:Timo Walther <[hidden email]> > >>>>>>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>>>>>> To:dev <[hidden email]> > >>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >> Table > >>>>>>>>> module > >>>>>>>>> > >>>>>>>>> Hi Dawid, > >>>>>>>>> > >>>>>>>>> thanks for the design document. It fixes big concept gaps due to > >>>>>>>>> historical reasons with proper support for serializability and > >>>> catalog > >>>>>>>>> support in mind. > >>>>>>>>> > >>>>>>>>> I would not mind a registerTemporarySource/Sink, but the problem > >>>> that I > >>>>>>>>> see is that many people think that this is the recommended way of > >>>>>>>>> registering a table source/sink which is not true. We should > guide > >>>>>> users > >>>>>>>>> to either use connect() or DDL API which can be validated and > >> stored > >>>> in > >>>>>>>>> catalog. > >>>>>>>>> > >>>>>>>>> Also from a concept perspective, registering a source/sink does > not > >>>> fit > >>>>>>>>> into the SQL world. SQL does not know about source/sinks but only > >>>> about > >>>>>>>>> tables. If the responsibility of a TableSource/TableSink is just > a > >>>> pure > >>>>>>>>> physical data consumer/producer that is not connected to the > actual > >>>>>>>>> logical table schema, we would need a possibility of defining > time > >>>>>>>>> attributes and interpreting/converting a changelog. This should > be > >>>> done > >>>>>>>>> by the framework with information from the DDL/connect() and not > be > >>>>>>>>> defined in every table source. > >>>>>>>>> > >>>>>>>>> Regards, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>>>>>>> Hi dawid: > >>>>>>>>>> > >>>>>>>>>> It is difficult to describe specific examples. > >>>>>>>>>> Sometimes users will generate some java converters through some > >>>>>>>>>> Java code, or generate some Java classes through third-party > >>>>>>>>>> libraries. Of course, these can be best done through > properties. > >>>>>>>>>> But this requires additional work from users.My suggestion is to > >>>>>>>>>> keep this Java instance class way that is user-friendly. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jingsong Lee > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>>>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>> Table > >>>>>>>>> module > >>>>>>>>>> Hi all, > >>>>>>>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>>>>>>> "some Connectors are difficult to convert all states to > >> properties" > >>>>>>>>>> All the Flink provided connectors will definitely be expressible > >>>> with > >>>>>>>>> properties (In the end you should be able to use them from DDL). > I > >>>>>> think > >>>>>>>> if > >>>>>>>>> a TableSource is complex enough that it handles filter push down, > >>>>>>>> partition > >>>>>>>>> support etc. should rather be made available both from DDL & > >>>> java/scala > >>>>>>>>> code. I'm happy to reconsider adding > registerTemporaryTable(String > >>>>>> path, > >>>>>>>>> TableSource source) if you have some concrete examples in mind. > >>>>>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>>>>> introducing > >>>>>>>>> a new identifier representation to differentiate between resolved > >> and > >>>>>>>>> unresolved identifiers) with the same concerns. We decided to > >> suggest > >>>>>> the > >>>>>>>>> string & parsing logic because of usability. > >>>>>>>>>> tEnv.from("cat.db.table") > >>>>>>>>>> is shorter and easier to write than > >>>>>>>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>>>>>>> And also implicitly solves the problem what happens if a user > >> (e.g. > >>>>>>>> used > >>>>>>>>> to other systems) uses that API in a following manner: > >>>>>>>>>> tEnv.from(Identifier.for("db.name") > >>>>>>>>>> I'm happy to revisit it if the general consensus is that it's > >> better > >>>>>> to > >>>>>>>>> use the OO aproach. > >>>>>>>>>> Best, > >>>>>>>>>> Dawid > >>>>>>>>>> > >>>>>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>>>>>>> > >>>>>>>>>> Thanks to Dawid for starting the discussion and writeup. It > looks > >>>>>>>> pretty > >>>>>>>>>> good to me except that I'm a little concerned about the object > >>>>>>>> reference > >>>>>>>>>> and string parsing in the code, which seems to an anti-pattern > to > >>>> OOP. > >>>>>>>>> Have > >>>>>>>>>> we considered using ObjectIdenitifier with optional catalog and > db > >>>>>>>> parts, > >>>>>>>>>> esp. if we are worried about arguments of variable length or > >> method > >>>>>>>>>> overloading? It's quite likely that the result of string parsing > >> is > >>>> an > >>>>>>>>>> ObjectIdentifier instance any way. > >>>>>>>>>> > >>>>>>>>>> Having string parsing logic in the code is a little dangerous as > >> it > >>>>>>>>>> duplicates part of the DDL/DML parsing, and they can easily get > >> out > >>>> of > >>>>>>>>> sync. > >>>>>>>>>> Thanks, > >>>>>>>>>> Xuefu > >>>>>>>>>> > >>>>>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > >> [hidden email] > >>>>>>>>> .invalid> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Thanks dawid, +1 for this approach. > >>>>>>>>>> > >>>>>>>>>> One concern is the removal of registerTableSink & > >>>> registerTableSource > >>>>>>>>>> in TableEnvironment. It has two alternatives: > >>>>>>>>>> 1.the properties approach (DDL, descriptor). > >>>>>>>>>> 2.from/toDataStream. > >>>>>>>>>> > >>>>>>>>>> #1 can only be properties, not java states, and some Connectors > >>>>>>>>>> are difficult to convert all states to properties. > >>>>>>>>>> #2 can contain java state. But can't use TableSource-related > >>>> features, > >>>>>>>>>> like project & filter push down, partition support, etc.. > >>>>>>>>>> > >>>>>>>>>> Any idea about this? > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jingsong Lee > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > ------------------------------------------------------------------ > >>>>>>>>>> From:Dawid Wysakowicz <[hidden email]> > >>>>>>>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>>>>>>> To:dev <[hidden email]> > >>>>>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > Table > >>>>>>>> module > >>>>>>>>>> Hi all, > >>>>>>>>>> As part of FLIP-30 a Catalog API was introduced that enables > >> storing > >>>>>>>>> table > >>>>>>>>>> meta objects permanently. At the same time the majority of > current > >>>>>> APIs > >>>>>>>>>> create temporary objects that cannot be serialized. We should > >>>> clarify > >>>>>>>> the > >>>>>>>>>> creation of meta objects (tables, views, functions) in a unified > >>>> way. > >>>>>>>>>> Another current problem in the API is that all the temporary > >> objects > >>>>>>>> are > >>>>>>>>>> stored in a special built-in catalog, which is not very > intuitive > >>>> for > >>>>>>>>> many > >>>>>>>>>> users, as they must be aware of that catalog to reference > >> temporary > >>>>>>>>> objects. > >>>>>>>>>> Lastly, different APIs have different ways of providing object > >>>> paths: > >>>>>>>>>> String path…, > >>>>>>>>>> String path, String pathContinued… > >>>>>>>>>> String name > >>>>>>>>>> We should choose one approach and unify it across all APIs. > >>>>>>>>>> I suggest a FLIP to address the above issues. > >>>>>>>>>> Looking forward to your opinions. > >>>>>>>>>> FLIP link: > >>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > > |
Thanks for the comments. @Timo I think your suggestion makes sense. I changed the signature to void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass); This will mean that after we make QueryOperations string serializable and we expose the type inference in functions we will have only a single object that cannot be persisted in a catalog and thus have only a "Temporary" method: void createTemporaryView(String path, DataStream stream); I updated the FLIP page accordingly. I also listed explicitly methods that we need to deprecate and added a note that the implementation of the methods concerning UserDefinedFunctions has to be postponed until we have type inference in place. As the voting period for FLIP-57 ended, I will start the VOTE thread for FLIP-64 tomorrow morning, unless there are some objections until then. Best, Dawid
On 10/10/2019 16:16, Kurt Young wrote:
Thanks for the clarification Timo, that's sounds good to me. Let's continue to discuss other things. Best, Kurt On Thu, Oct 10, 2019 at 4:14 PM Timo Walther [hidden email] wrote:The purpose of ConnectTableDescriptor was to have a programmatic way of expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree that it needs some rework in the future, once we have finalized the DDL to ensure that both concepts are in sync. Regards, Timo On 10.10.19 16:08, Kurt Young wrote:Regarding to ConnectTableDescriptor, if in the end it becomes a builder of CatalogTable, I would be ok with that. But it doesn't look like abuildernow, it's more like another form of TableSource/TableSink. Best, Kurt On Thu, Oct 10, 2019 at 3:44 PM Timo Walther [hidden email] wrote:Hi everyone, thanks for the great discussion with a good outcome. I have one last comment about: void createTemporaryFunction(String path, UserDefinedFunction function); We should encourage users to register a class instead of an instance. We should enforce a default constructor in the future. If we need metadata or type inference, the planner can simply instantiate the class and call the corresponding getters. If people would like to parameterize a function, they can use global job parameters instead - which are available in the open() method. I suggest: void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass); This is also closer to the SQL DDL that is about to be implemented in: https://issues.apache.org/jira/browse/FLINK-7151 Thanks, Timo On 09.10.19 17:07, Bowen Li wrote:Hi Dawid, +1 for proposed changes On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz <[hidden email]wrote:Sorry for a very delayed response. @Kurt Yes, this is the goal to have a function created like new Function(...) also be wrapped into CatalogFunction. This would have to be though a temporary function as we cannot represent that as a set of properties. Similar to the createTemporaryView(DataStream stream). As for the ConnectTableDescriptor I agree this is very similar to CatalogTable. I am not sure though if we should get rid of it. In the end I see it as a builder for a CatalogTable, which is a slightly more internal API, but we might revisit that some time in the future if we find that it makes more sense. @All I updated the FLIP page with some more details from the outcomeofthe discussions around FLIP-57. Please take a look. I would like to start a vote on this FLIP as soon as the vote on FLIP-57 goes through. Best, Dawid On 19/09/2019 09:24, Kurt Young wrote:IIUC it's good to see that both serializable (tables description fromDDL)and unserializable (tables with DataStream underneath) tables aretreatedunify with CatalogTable. Can I also assume functions that either come from a function class(fromDDL) or function objects (newed by user) will also treated unify with CatalogFunction? This will greatly simplify and unify current API level concepts anddesign.And it seems only one thing left, how do we deal with ConnectTableDescriptor? It's actually very similar with serializable CatalogTable, both carrysometext properties which even are the same. Is there any chance we canfurtherunifythis to CatalogTable? object Best, Kurt On Thu, Sep 19, 2019 at 3:13 PM Jark Wu [hidden email] wrote:Thanks Dawid for the design doc. In general, I’m +1 to the FLIP. +1 to the single-string and parse way to express object path. +1 to deprecate registerTableSink & registerTableSource. But I would suggest to provide an easy way to register a custom source/sink before we drop them (this is another story). Currently, it’s not easy to implement a custom connector descriptor. Best, Jark在 2019年9月19日,11:37,Dawid Wysakowicz [hidden email]写道:Hi JingsongLee, From my understanding they can. Underneath they will beCatalogTables.Thedifference is the lifetime of the tables. Plus some of the userfacinginterfaces cannot be persisted e.g. datastream. Therefore we musthaveaseparate methods for that. In the end the temporary tables are heldinmemory as CatalogTables. Best, Dawid On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email].invalid>wrote:Hi dawid: Can temporary tables achieve the same capabilities as catalogtable?like statistics: CatalogTableStatistics, CatalogColumnStatistics, PartitionStatistics like partition support: we have added some catalog equivalentinterfaceson TableSource/TableSink: getPartitions, getPartitionFieldNames Maybe it's not a good idea to add these interfaces to TableSource/TableSink. What do you think? Best, Jingsong Lee ------------------------------------------------------------------ From:Kurt Young [hidden email] Send Time:2019年9月18日(星期三) 17:54 To:dev [hidden email] Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects inTablemodule Hi all, Sorry to join this party late. Big +1 to this flip, especially forthedropping "registerTableSink & registerTableSource" part. These are indeedlegacyand we should try to unify them through CatalogTable after weintroducethe concept of Catalog. From my understanding, what we can registered should all bemetadata,TableSource/TableSink should only be the one who is responsible todothe real work, i.e. reading and writing data according to theschemaandother information like computed column, partition, .e.g. Best, Kurt On Wed, Sep 18, 2019 at 5:14 PM JingsongLee <[hidden email].invalid> wrote:After some development and thinking, I have a generalunderstanding.+1 to registering a source/sink does not fit into the SQL world. I am OK to have a deprecated registerTemporarySource/Sink tocompatiblewith old ways. Best, Jingsong Lee------------------------------------------------------------------From:Timo Walther [hidden email] Send Time:2019年9月17日(星期二) 08:00 To:dev [hidden email] Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects inTablemodule Hi Dawid, thanks for the design document. It fixes big concept gaps due to historical reasons with proper support for serializability andcatalogsupport in mind. I would not mind a registerTemporarySource/Sink, but the problemthat Isee is that many people think that this is the recommended way of registering a table source/sink which is not true. We shouldguideusersto either use connect() or DDL API which can be validated andstoredincatalog. Also from a concept perspective, registering a source/sink doesnotfitinto the SQL world. SQL does not know about source/sinks but onlyabouttables. If the responsibility of a TableSource/TableSink is justapurephysical data consumer/producer that is not connected to theactuallogical table schema, we would need a possibility of definingtimeattributes and interpreting/converting a changelog. This shouldbedoneby the framework with information from the DDL/connect() and notbedefined in every table source. Regards, Timo On 09.09.19 14:16, JingsongLee wrote:Hi dawid: It is difficult to describe specific examples. Sometimes users will generate some java converters through some Java code, or generate some Java classes through third-party libraries. Of course, these can be best done throughproperties.But this requires additional work from users.My suggestion is to keep this Java instance class way that is user-friendly. Best, Jingsong Lee------------------------------------------------------------------From:Dawid Wysakowicz [hidden email] Send Time:2019年9月6日(星期五) 16:21 To:dev [hidden email] Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects inTablemoduleHi all, @Jingsong Could you elaborate a bit more what do you mean by "some Connectors are difficult to convert all states toproperties"All the Flink provided connectors will definitely be expressiblewithproperties (In the end you should be able to use them from DDL).Ithinkifa TableSource is complex enough that it handles filter push down,partitionsupport etc. should rather be made available both from DDL &java/scalacode. I'm happy to reconsider addingregisterTemporaryTable(Stringpath,TableSource source) if you have some concrete examples in mind.@Xuefu: We also considered the ObjectIdentifier (or actuallyintroducinga new identifier representation to differentiate between resolvedandunresolved identifiers) with the same concerns. We decided tosuggestthestring & parsing logic because of usability.tEnv.from("cat.db.table") is shorter and easier to write than tEnv.from(Identifier.for("cat", "db", "name") And also implicitly solves the problem what happens if a user(e.g.usedto other systems) uses that API in a following manner:tEnv.from(Identifier.for("db.name") I'm happy to revisit it if the general consensus is that it'sbettertouse the OO aproach.Best, Dawid On 06/09/2019 10:00, Xuefu Z wrote: Thanks to Dawid for starting the discussion and writeup. Itlooksprettygood to me except that I'm a little concerned about the objectreferenceand string parsing in the code, which seems to an anti-patterntoOOP.Havewe considered using ObjectIdenitifier with optional catalog anddbparts,esp. if we are worried about arguments of variable length ormethodoverloading? It's quite likely that the result of string parsingisanObjectIdentifier instance any way. Having string parsing logic in the code is a little dangerous asitduplicates part of the DDL/DML parsing, and they can easily getoutofsync.Thanks, Xuefu On Fri, Sep 6, 2019 at 1:57 PM JingsongLee <[hidden email].invalid>wrote: Thanks dawid, +1 for this approach. One concern is the removal of registerTableSink ®isterTableSourcein TableEnvironment. It has two alternatives: 1.the properties approach (DDL, descriptor). 2.from/toDataStream. #1 can only be properties, not java states, and some Connectors are difficult to convert all states to properties. #2 can contain java state. But can't use TableSource-relatedfeatures,like project & filter push down, partition support, etc.. Any idea about this? Best, Jingsong Lee------------------------------------------------------------------From:Dawid Wysakowicz [hidden email] Send Time:2019年9月4日(星期三) 22:20 To:dev [hidden email] Subject:[DISCUSS] FLIP-64: Support for Temporary Objects inTablemoduleHi all, As part of FLIP-30 a Catalog API was introduced that enablesstoringtablemeta objects permanently. At the same time the majority ofcurrentAPIscreate temporary objects that cannot be serialized. We shouldclarifythecreation of meta objects (tables, views, functions) in a unifiedway.Another current problem in the API is that all the temporaryobjectsarestored in a special built-in catalog, which is not veryintuitiveformanyusers, as they must be aware of that catalog to referencetemporaryobjects.Lastly, different APIs have different ways of providing objectpaths:String path…, String path, String pathContinued… String name We should choose one approach and unify it across all APIs. I suggest a FLIP to address the above issues. Looking forward to your opinions. FLIP link:https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module signature.asc (849 bytes) Download Attachment |
Free forum by Nabble | Edit this page |