Regarding to createTemporaryFunction, could we just keep both choices? I
agree we should encourage users to use the class, and it's the only choice when user using DDL. But for some Table API & SQL programs, I think it's actually more nature to use an object instead of use the classes. But this just depends on personal choices. One use case came to mind which might be affected by this is if we want to support some anonymous lambda functions as UDF. It will need such create function with objects support. Best, Kurt On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz <[hidden email]> wrote: > Thanks for the comments. > > @Timo I think your suggestion makes sense. I changed the signature to > > void createTemporaryFunction(String path, Class<? extends UserDefinedFunction> functionClass); > > This will mean that after we make QueryOperations string serializable and > we expose the type inference in functions we will have only a single object > that cannot be persisted in a catalog and thus have only a "Temporary" > method: > > void createTemporaryView(String path, DataStream stream); > > I updated the FLIP page accordingly. I also listed explicitly methods that > we need to deprecate and added a note that the implementation of the > methods concerning UserDefinedFunctions has to be postponed until we have > type inference in place. > > As the voting period for FLIP-57 ended, I will start the VOTE thread for > FLIP-64 tomorrow morning, unless there are some objections until then. > > Best, > > Dawid > > > On 10/10/2019 16:16, Kurt Young wrote: > > Thanks for the clarification Timo, that's sounds good to me. Let's continue > to discuss other things. > > Best, > Kurt > > > On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> <[hidden email]> wrote: > > > The purpose of ConnectTableDescriptor was to have a programmatic way of > expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > that it needs some rework in the future, once we have finalized the DDL > to ensure that both concepts are in sync. > > Regards, > Timo > > > On 10.10.19 16:08, Kurt Young wrote: > > Regarding to ConnectTableDescriptor, if in the end it becomes a builder > of CatalogTable, I would be ok with that. But it doesn't look like a > > builder > > now, it's more like another form of TableSource/TableSink. > > Best, > Kurt > > > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> <[hidden email]> wrote: > > > Hi everyone, > > thanks for the great discussion with a good outcome. > > I have one last comment about: > > void createTemporaryFunction(String path, UserDefinedFunction function); > > We should encourage users to register a class instead of an instance. We > should enforce a default constructor in the future. If we need metadata > or type inference, the planner can simply instantiate the class and call > the corresponding getters. If people would like to parameterize a > function, they can use global job parameters instead - which are > available in the open() method. > > I suggest: > > void createTemporaryFunction(String path, Class<? extends > UserDefinedFunction> functionClass); > > This is also closer to the SQL DDL that is about to be implemented in:https://issues.apache.org/jira/browse/FLINK-7151 > > Thanks, > Timo > > > On 09.10.19 17:07, Bowen Li wrote: > > Hi Dawid, > > +1 for proposed changes > > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > > [hidden email] > > wrote: > > > Sorry for a very delayed response. > > @Kurt Yes, this is the goal to have a function created like new > Function(...) also be wrapped into CatalogFunction. This would have to > be though a temporary function as we cannot represent that as a set of > properties. Similar to the createTemporaryView(DataStream stream). > > As for the ConnectTableDescriptor I agree this is very similar to > CatalogTable. I am not sure though if we should get rid of it. In the > end I see it as a builder for a CatalogTable, which is a slightly more > internal API, but we might revisit that some time in the future if we > find that it makes more sense. > > @All I updated the FLIP page with some more details from the outcome > > of > > the discussions around FLIP-57. Please take a look. I would like to > start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > > Best, > > Dawid > > > On 19/09/2019 09:24, Kurt Young wrote: > > IIUC it's good to see that both serializable (tables description from > > DDL) > > and unserializable (tables with DataStream underneath) tables are > > treated > > unify with CatalogTable. > > Can I also assume functions that either come from a function class > > (from > > DDL) > or function objects (newed by user) will also treated unify with > CatalogFunction? > > This will greatly simplify and unify current API level concepts and > > design. > > And it seems only one thing left, how do we deal with > ConnectTableDescriptor? > It's actually very similar with serializable CatalogTable, both carry > > some > > text > properties which even are the same. Is there any chance we can > > further > > unify > > this to CatalogTable? > > object > Best, > Kurt > > > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> <[hidden email]> wrote: > > > Thanks Dawid for the design doc. > > In general, I’m +1 to the FLIP. > > > +1 to the single-string and parse way to express object path. > > +1 to deprecate registerTableSink & registerTableSource. > But I would suggest to provide an easy way to register a custom > source/sink before we drop them (this is another story). > Currently, it’s not easy to implement a custom connector descriptor. > > Best, > Jark > > > > 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> <[hidden email]> > > 写道: > > Hi JingsongLee, > From my understanding they can. Underneath they will be > > CatalogTables. > > The > > difference is the lifetime of the tables. Plus some of the user > > facing > > interfaces cannot be persisted e.g. datastream. Therefore we must > > have > > a > > separate methods for that. In the end the temporary tables are held > > in > > memory as CatalogTables. > Best, > Dawid > > On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > > .invalid> > > wrote: > > > Hi dawid: > Can temporary tables achieve the same capabilities as catalog > > table? > > like statistics: CatalogTableStatistics, CatalogColumnStatistics, > PartitionStatistics > like partition support: we have added some catalog equivalent > > interfaces > > on TableSource/TableSink: getPartitions, getPartitionFieldNames > Maybe it's not a good idea to add these interfaces to > TableSource/TableSink. What do you think? > > Best, > Jingsong Lee > > > ------------------------------------------------------------------ > From:Kurt Young <[hidden email]> <[hidden email]> > Send Time:2019年9月18日(星期三) 17:54 > To:dev <[hidden email]> <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > Table > > module > > Hi all, > > Sorry to join this party late. Big +1 to this flip, especially for > > the > > dropping > "registerTableSink & registerTableSource" part. These are indeed > > legacy > > and we should try to unify them through CatalogTable after we > > introduce > > the concept of Catalog. > > From my understanding, what we can registered should all be > > metadata, > > TableSource/TableSink should only be the one who is responsible to > > do > > the real work, i.e. reading and writing data according to the > > schema > > and > > other information like computed column, partition, .e.g. > > Best, > Kurt > > > On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > > [hidden email] > > .invalid> > wrote: > > > After some development and thinking, I have a general > > understanding. > > +1 to registering a source/sink does not fit into the SQL world. > I am OK to have a deprecated registerTemporarySource/Sink to > > compatible > > with old ways. > > Best, > Jingsong Lee > > > > > ------------------------------------------------------------------ > > From:Timo Walther <[hidden email]> <[hidden email]> > Send Time:2019年9月17日(星期二) 08:00 > To:dev <[hidden email]> <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > Table > > module > > Hi Dawid, > > thanks for the design document. It fixes big concept gaps due to > historical reasons with proper support for serializability and > > catalog > > support in mind. > > I would not mind a registerTemporarySource/Sink, but the problem > > that I > > see is that many people think that this is the recommended way of > registering a table source/sink which is not true. We should > > guide > > users > > to either use connect() or DDL API which can be validated and > > stored > > in > > catalog. > > Also from a concept perspective, registering a source/sink does > > not > > fit > > into the SQL world. SQL does not know about source/sinks but only > > about > > tables. If the responsibility of a TableSource/TableSink is just > > a > > pure > > physical data consumer/producer that is not connected to the > > actual > > logical table schema, we would need a possibility of defining > > time > > attributes and interpreting/converting a changelog. This should > > be > > done > > by the framework with information from the DDL/connect() and not > > be > > defined in every table source. > > Regards, > Timo > > > On 09.09.19 14:16, JingsongLee wrote: > > Hi dawid: > > It is difficult to describe specific examples. > Sometimes users will generate some java converters through some > Java code, or generate some Java classes through third-party > libraries. Of course, these can be best done through > > properties. > > But this requires additional work from users.My suggestion is to > keep this Java instance class way that is user-friendly. > > Best, > Jingsong Lee > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> <[hidden email]> > Send Time:2019年9月6日(星期五) 16:21 > To:dev <[hidden email]> <[hidden email]> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > Table > > module > > Hi all, > @Jingsong Could you elaborate a bit more what do you mean by > "some Connectors are difficult to convert all states to > > properties" > > All the Flink provided connectors will definitely be expressible > > with > > properties (In the end you should be able to use them from DDL). > > I > > think > > if > > a TableSource is complex enough that it handles filter push down, > > partition > > support etc. should rather be made available both from DDL & > > java/scala > > code. I'm happy to reconsider adding > > registerTemporaryTable(String > > path, > > TableSource source) if you have some concrete examples in mind. > > @Xuefu: We also considered the ObjectIdentifier (or actually > > introducing > > a new identifier representation to differentiate between resolved > > and > > unresolved identifiers) with the same concerns. We decided to > > suggest > > the > > string & parsing logic because of usability. > > tEnv.from("cat.db.table") > is shorter and easier to write than > tEnv.from(Identifier.for("cat", "db", "name") > And also implicitly solves the problem what happens if a user > > (e.g. > > used > > to other systems) uses that API in a following manner: > > tEnv.from(Identifier.for("db.name") > I'm happy to revisit it if the general consensus is that it's > > better > > to > > use the OO aproach. > > Best, > Dawid > > On 06/09/2019 10:00, Xuefu Z wrote: > > Thanks to Dawid for starting the discussion and writeup. It > > looks > > pretty > > good to me except that I'm a little concerned about the object > > reference > > and string parsing in the code, which seems to an anti-pattern > > to > > OOP. > > Have > > we considered using ObjectIdenitifier with optional catalog and > > db > > parts, > > esp. if we are worried about arguments of variable length or > > method > > overloading? It's quite likely that the result of string parsing > > is > > an > > ObjectIdentifier instance any way. > > Having string parsing logic in the code is a little dangerous as > > it > > duplicates part of the DDL/DML parsing, and they can easily get > > out > > of > > sync. > > Thanks, > Xuefu > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > > [hidden email] > > .invalid> > > wrote: > > > Thanks dawid, +1 for this approach. > > One concern is the removal of registerTableSink & > > registerTableSource > > in TableEnvironment. It has two alternatives: > 1.the properties approach (DDL, descriptor). > 2.from/toDataStream. > > #1 can only be properties, not java states, and some Connectors > are difficult to convert all states to properties. > #2 can contain java state. But can't use TableSource-related > > features, > > like project & filter push down, partition support, etc.. > > Any idea about this? > > Best, > Jingsong Lee > > > > > ------------------------------------------------------------------ > > From:Dawid Wysakowicz <[hidden email]> <[hidden email]> > Send Time:2019年9月4日(星期三) 22:20 > To:dev <[hidden email]> <[hidden email]> > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > > Table > > module > > Hi all, > As part of FLIP-30 a Catalog API was introduced that enables > > storing > > table > > meta objects permanently. At the same time the majority of > > current > > APIs > > create temporary objects that cannot be serialized. We should > > clarify > > the > > creation of meta objects (tables, views, functions) in a unified > > way. > > Another current problem in the API is that all the temporary > > objects > > are > > stored in a special built-in catalog, which is not very > > intuitive > > for > > many > > users, as they must be aware of that catalog to reference > > temporary > > objects. > > Lastly, different APIs have different ways of providing object > > paths: > > String path…, > String path, String pathContinued… > String name > We should choose one approach and unify it across all APIs. > I suggest a FLIP to address the above issues. > Looking forward to your opinions. > FLIP link: > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > |
Hi, Timo
Regarding to createTemporaryFunction, I agree with Kurt, we should encourage the class way, but should keep the instance way. The reason is that this is an important feature to support parameterize function as you said, but this can't be addressed by global job parameters and open() methods. Because users may want to instantiate more than one instance, e.g. `val bf1 = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they are used in one query. The another reason is that this can't make the function call string serializable, because in the Scala Table API path, users can still pass in a function instance, e.g. table.select(bf1('a)) Best, Jark On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: > Regarding to createTemporaryFunction, could we just keep both choices? I > agree > we should encourage users to use the class, and it's the only choice when > user > using DDL. But for some Table API & SQL programs, I think it's actually > more nature > to use an object instead of use the classes. But this just depends on > personal > choices. One use case came to mind which might be affected by this is if we > want > to support some anonymous lambda functions as UDF. It will need such create > function > with objects support. > > Best, > Kurt > > > On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz <[hidden email]> > wrote: > > > Thanks for the comments. > > > > @Timo I think your suggestion makes sense. I changed the signature to > > > > void createTemporaryFunction(String path, Class<? extends > UserDefinedFunction> functionClass); > > > > This will mean that after we make QueryOperations string serializable and > > we expose the type inference in functions we will have only a single > object > > that cannot be persisted in a catalog and thus have only a "Temporary" > > method: > > > > void createTemporaryView(String path, DataStream stream); > > > > I updated the FLIP page accordingly. I also listed explicitly methods > that > > we need to deprecate and added a note that the implementation of the > > methods concerning UserDefinedFunctions has to be postponed until we have > > type inference in place. > > > > As the voting period for FLIP-57 ended, I will start the VOTE thread for > > FLIP-64 tomorrow morning, unless there are some objections until then. > > > > Best, > > > > Dawid > > > > > > On 10/10/2019 16:16, Kurt Young wrote: > > > > Thanks for the clarification Timo, that's sounds good to me. Let's > continue > > to discuss other things. > > > > Best, > > Kurt > > > > > > On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < > [hidden email]> wrote: > > > > > > The purpose of ConnectTableDescriptor was to have a programmatic way of > > expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > > that it needs some rework in the future, once we have finalized the DDL > > to ensure that both concepts are in sync. > > > > Regards, > > Timo > > > > > > On 10.10.19 16:08, Kurt Young wrote: > > > > Regarding to ConnectTableDescriptor, if in the end it becomes a builder > > of CatalogTable, I would be ok with that. But it doesn't look like a > > > > builder > > > > now, it's more like another form of TableSource/TableSink. > > > > Best, > > Kurt > > > > > > On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < > [hidden email]> wrote: > > > > > > Hi everyone, > > > > thanks for the great discussion with a good outcome. > > > > I have one last comment about: > > > > void createTemporaryFunction(String path, UserDefinedFunction function); > > > > We should encourage users to register a class instead of an instance. We > > should enforce a default constructor in the future. If we need metadata > > or type inference, the planner can simply instantiate the class and call > > the corresponding getters. If people would like to parameterize a > > function, they can use global job parameters instead - which are > > available in the open() method. > > > > I suggest: > > > > void createTemporaryFunction(String path, Class<? extends > > UserDefinedFunction> functionClass); > > > > This is also closer to the SQL DDL that is about to be implemented in: > https://issues.apache.org/jira/browse/FLINK-7151 > > > > Thanks, > > Timo > > > > > > On 09.10.19 17:07, Bowen Li wrote: > > > > Hi Dawid, > > > > +1 for proposed changes > > > > On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > > > > [hidden email] > > > > wrote: > > > > > > Sorry for a very delayed response. > > > > @Kurt Yes, this is the goal to have a function created like new > > Function(...) also be wrapped into CatalogFunction. This would have to > > be though a temporary function as we cannot represent that as a set of > > properties. Similar to the createTemporaryView(DataStream stream). > > > > As for the ConnectTableDescriptor I agree this is very similar to > > CatalogTable. I am not sure though if we should get rid of it. In the > > end I see it as a builder for a CatalogTable, which is a slightly more > > internal API, but we might revisit that some time in the future if we > > find that it makes more sense. > > > > @All I updated the FLIP page with some more details from the outcome > > > > of > > > > the discussions around FLIP-57. Please take a look. I would like to > > start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > > > > Best, > > > > Dawid > > > > > > On 19/09/2019 09:24, Kurt Young wrote: > > > > IIUC it's good to see that both serializable (tables description from > > > > DDL) > > > > and unserializable (tables with DataStream underneath) tables are > > > > treated > > > > unify with CatalogTable. > > > > Can I also assume functions that either come from a function class > > > > (from > > > > DDL) > > or function objects (newed by user) will also treated unify with > > CatalogFunction? > > > > This will greatly simplify and unify current API level concepts and > > > > design. > > > > And it seems only one thing left, how do we deal with > > ConnectTableDescriptor? > > It's actually very similar with serializable CatalogTable, both carry > > > > some > > > > text > > properties which even are the same. Is there any chance we can > > > > further > > > > unify > > > > this to CatalogTable? > > > > object > > Best, > > Kurt > > > > > > On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < > [hidden email]> wrote: > > > > > > Thanks Dawid for the design doc. > > > > In general, I’m +1 to the FLIP. > > > > > > +1 to the single-string and parse way to express object path. > > > > +1 to deprecate registerTableSink & registerTableSource. > > But I would suggest to provide an easy way to register a custom > > source/sink before we drop them (this is another story). > > Currently, it’s not easy to implement a custom connector descriptor. > > > > Best, > > Jark > > > > > > > > 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < > [hidden email]> > > > > 写道: > > > > Hi JingsongLee, > > From my understanding they can. Underneath they will be > > > > CatalogTables. > > > > The > > > > difference is the lifetime of the tables. Plus some of the user > > > > facing > > > > interfaces cannot be persisted e.g. datastream. Therefore we must > > > > have > > > > a > > > > separate methods for that. In the end the temporary tables are held > > > > in > > > > memory as CatalogTables. > > Best, > > Dawid > > > > On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > > > > .invalid> > > > > wrote: > > > > > > Hi dawid: > > Can temporary tables achieve the same capabilities as catalog > > > > table? > > > > like statistics: CatalogTableStatistics, CatalogColumnStatistics, > > PartitionStatistics > > like partition support: we have added some catalog equivalent > > > > interfaces > > > > on TableSource/TableSink: getPartitions, getPartitionFieldNames > > Maybe it's not a good idea to add these interfaces to > > TableSource/TableSink. What do you think? > > > > Best, > > Jingsong Lee > > > > > > ------------------------------------------------------------------ > > From:Kurt Young <[hidden email]> <[hidden email]> > > Send Time:2019年9月18日(星期三) 17:54 > > To:dev <[hidden email]> <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > > > Table > > > > module > > > > Hi all, > > > > Sorry to join this party late. Big +1 to this flip, especially for > > > > the > > > > dropping > > "registerTableSink & registerTableSource" part. These are indeed > > > > legacy > > > > and we should try to unify them through CatalogTable after we > > > > introduce > > > > the concept of Catalog. > > > > From my understanding, what we can registered should all be > > > > metadata, > > > > TableSource/TableSink should only be the one who is responsible to > > > > do > > > > the real work, i.e. reading and writing data according to the > > > > schema > > > > and > > > > other information like computed column, partition, .e.g. > > > > Best, > > Kurt > > > > > > On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > > > > [hidden email] > > > > .invalid> > > wrote: > > > > > > After some development and thinking, I have a general > > > > understanding. > > > > +1 to registering a source/sink does not fit into the SQL world. > > I am OK to have a deprecated registerTemporarySource/Sink to > > > > compatible > > > > with old ways. > > > > Best, > > Jingsong Lee > > > > > > > > > > ------------------------------------------------------------------ > > > > From:Timo Walther <[hidden email]> <[hidden email]> > > Send Time:2019年9月17日(星期二) 08:00 > > To:dev <[hidden email]> <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > > > Table > > > > module > > > > Hi Dawid, > > > > thanks for the design document. It fixes big concept gaps due to > > historical reasons with proper support for serializability and > > > > catalog > > > > support in mind. > > > > I would not mind a registerTemporarySource/Sink, but the problem > > > > that I > > > > see is that many people think that this is the recommended way of > > registering a table source/sink which is not true. We should > > > > guide > > > > users > > > > to either use connect() or DDL API which can be validated and > > > > stored > > > > in > > > > catalog. > > > > Also from a concept perspective, registering a source/sink does > > > > not > > > > fit > > > > into the SQL world. SQL does not know about source/sinks but only > > > > about > > > > tables. If the responsibility of a TableSource/TableSink is just > > > > a > > > > pure > > > > physical data consumer/producer that is not connected to the > > > > actual > > > > logical table schema, we would need a possibility of defining > > > > time > > > > attributes and interpreting/converting a changelog. This should > > > > be > > > > done > > > > by the framework with information from the DDL/connect() and not > > > > be > > > > defined in every table source. > > > > Regards, > > Timo > > > > > > On 09.09.19 14:16, JingsongLee wrote: > > > > Hi dawid: > > > > It is difficult to describe specific examples. > > Sometimes users will generate some java converters through some > > Java code, or generate some Java classes through third-party > > libraries. Of course, these can be best done through > > > > properties. > > > > But this requires additional work from users.My suggestion is to > > keep this Java instance class way that is user-friendly. > > > > Best, > > Jingsong Lee > > > > > > > > > > ------------------------------------------------------------------ > > > > From:Dawid Wysakowicz <[hidden email]> <[hidden email]> > > Send Time:2019年9月6日(星期五) 16:21 > > To:dev <[hidden email]> <[hidden email]> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > > > > Table > > > > module > > > > Hi all, > > @Jingsong Could you elaborate a bit more what do you mean by > > "some Connectors are difficult to convert all states to > > > > properties" > > > > All the Flink provided connectors will definitely be expressible > > > > with > > > > properties (In the end you should be able to use them from DDL). > > > > I > > > > think > > > > if > > > > a TableSource is complex enough that it handles filter push down, > > > > partition > > > > support etc. should rather be made available both from DDL & > > > > java/scala > > > > code. I'm happy to reconsider adding > > > > registerTemporaryTable(String > > > > path, > > > > TableSource source) if you have some concrete examples in mind. > > > > @Xuefu: We also considered the ObjectIdentifier (or actually > > > > introducing > > > > a new identifier representation to differentiate between resolved > > > > and > > > > unresolved identifiers) with the same concerns. We decided to > > > > suggest > > > > the > > > > string & parsing logic because of usability. > > > > tEnv.from("cat.db.table") > > is shorter and easier to write than > > tEnv.from(Identifier.for("cat", "db", "name") > > And also implicitly solves the problem what happens if a user > > > > (e.g. > > > > used > > > > to other systems) uses that API in a following manner: > > > > tEnv.from(Identifier.for("db.name") > > I'm happy to revisit it if the general consensus is that it's > > > > better > > > > to > > > > use the OO aproach. > > > > Best, > > Dawid > > > > On 06/09/2019 10:00, Xuefu Z wrote: > > > > Thanks to Dawid for starting the discussion and writeup. It > > > > looks > > > > pretty > > > > good to me except that I'm a little concerned about the object > > > > reference > > > > and string parsing in the code, which seems to an anti-pattern > > > > to > > > > OOP. > > > > Have > > > > we considered using ObjectIdenitifier with optional catalog and > > > > db > > > > parts, > > > > esp. if we are worried about arguments of variable length or > > > > method > > > > overloading? It's quite likely that the result of string parsing > > > > is > > > > an > > > > ObjectIdentifier instance any way. > > > > Having string parsing logic in the code is a little dangerous as > > > > it > > > > duplicates part of the DDL/DML parsing, and they can easily get > > > > out > > > > of > > > > sync. > > > > Thanks, > > Xuefu > > > > On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > > > > [hidden email] > > > > .invalid> > > > > wrote: > > > > > > Thanks dawid, +1 for this approach. > > > > One concern is the removal of registerTableSink & > > > > registerTableSource > > > > in TableEnvironment. It has two alternatives: > > 1.the properties approach (DDL, descriptor). > > 2.from/toDataStream. > > > > #1 can only be properties, not java states, and some Connectors > > are difficult to convert all states to properties. > > #2 can contain java state. But can't use TableSource-related > > > > features, > > > > like project & filter push down, partition support, etc.. > > > > Any idea about this? > > > > Best, > > Jingsong Lee > > > > > > > > > > ------------------------------------------------------------------ > > > > From:Dawid Wysakowicz <[hidden email]> <[hidden email]> > > Send Time:2019年9月4日(星期三) 22:20 > > To:dev <[hidden email]> <[hidden email]> > > Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > > > > Table > > > > module > > > > Hi all, > > As part of FLIP-30 a Catalog API was introduced that enables > > > > storing > > > > table > > > > meta objects permanently. At the same time the majority of > > > > current > > > > APIs > > > > create temporary objects that cannot be serialized. We should > > > > clarify > > > > the > > > > creation of meta objects (tables, views, functions) in a unified > > > > way. > > > > Another current problem in the API is that all the temporary > > > > objects > > > > are > > > > stored in a special built-in catalog, which is not very > > > > intuitive > > > > for > > > > many > > > > users, as they must be aware of that catalog to reference > > > > temporary > > > > objects. > > > > Lastly, different APIs have different ways of providing object > > > > paths: > > > > String path…, > > String path, String pathContinued… > > String name > > We should choose one approach and unify it across all APIs. > > I suggest a FLIP to address the above issues. > > Looking forward to your opinions. > > FLIP link: > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > > > > > |
Thanks for your feedback Kurt and Jark.
I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. Parameters should be passed when calling the function instead. Users can use the query itself to parametize the function: `SELECT bf(0.0001, a), bf(0.005, a)` If at all, we can think about a `CREATE FUNCTION bf1 WITH (precision=0.001)` DDL syntax. But with registering instances we might shoot ourselves in the foot in the future. Usually, function exist as classes, it seems unnatural to me that users need to instantiate the function first before registering it. What do think? Thanks, Timo On 11.10.19 05:10, Jark Wu wrote: > Hi, Timo > > Regarding to createTemporaryFunction, I agree with Kurt, we should > encourage the class way, but should keep the instance way. > The reason is that this is an important feature to support parameterize > function as you said, but this can't be addressed by global job parameters > and open() methods. > Because users may want to instantiate more than one instance, e.g. `val bf1 > = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they are > used in one query. > > The another reason is that this can't make the function call string > serializable, because in the Scala Table API path, users can still pass in > a function instance, e.g. table.select(bf1('a)) > > Best, > Jark > > On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: > >> Regarding to createTemporaryFunction, could we just keep both choices? I >> agree >> we should encourage users to use the class, and it's the only choice when >> user >> using DDL. But for some Table API & SQL programs, I think it's actually >> more nature >> to use an object instead of use the classes. But this just depends on >> personal >> choices. One use case came to mind which might be affected by this is if we >> want >> to support some anonymous lambda functions as UDF. It will need such create >> function >> with objects support. >> >> Best, >> Kurt >> >> >> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz <[hidden email]> >> wrote: >> >>> Thanks for the comments. >>> >>> @Timo I think your suggestion makes sense. I changed the signature to >>> >>> void createTemporaryFunction(String path, Class<? extends >> UserDefinedFunction> functionClass); >>> This will mean that after we make QueryOperations string serializable and >>> we expose the type inference in functions we will have only a single >> object >>> that cannot be persisted in a catalog and thus have only a "Temporary" >>> method: >>> >>> void createTemporaryView(String path, DataStream stream); >>> >>> I updated the FLIP page accordingly. I also listed explicitly methods >> that >>> we need to deprecate and added a note that the implementation of the >>> methods concerning UserDefinedFunctions has to be postponed until we have >>> type inference in place. >>> >>> As the voting period for FLIP-57 ended, I will start the VOTE thread for >>> FLIP-64 tomorrow morning, unless there are some objections until then. >>> >>> Best, >>> >>> Dawid >>> >>> >>> On 10/10/2019 16:16, Kurt Young wrote: >>> >>> Thanks for the clarification Timo, that's sounds good to me. Let's >> continue >>> to discuss other things. >>> >>> Best, >>> Kurt >>> >>> >>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < >> [hidden email]> wrote: >>> >>> The purpose of ConnectTableDescriptor was to have a programmatic way of >>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree >>> that it needs some rework in the future, once we have finalized the DDL >>> to ensure that both concepts are in sync. >>> >>> Regards, >>> Timo >>> >>> >>> On 10.10.19 16:08, Kurt Young wrote: >>> >>> Regarding to ConnectTableDescriptor, if in the end it becomes a builder >>> of CatalogTable, I would be ok with that. But it doesn't look like a >>> >>> builder >>> >>> now, it's more like another form of TableSource/TableSink. >>> >>> Best, >>> Kurt >>> >>> >>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < >> [hidden email]> wrote: >>> >>> Hi everyone, >>> >>> thanks for the great discussion with a good outcome. >>> >>> I have one last comment about: >>> >>> void createTemporaryFunction(String path, UserDefinedFunction function); >>> >>> We should encourage users to register a class instead of an instance. We >>> should enforce a default constructor in the future. If we need metadata >>> or type inference, the planner can simply instantiate the class and call >>> the corresponding getters. If people would like to parameterize a >>> function, they can use global job parameters instead - which are >>> available in the open() method. >>> >>> I suggest: >>> >>> void createTemporaryFunction(String path, Class<? extends >>> UserDefinedFunction> functionClass); >>> >>> This is also closer to the SQL DDL that is about to be implemented in: >> https://issues.apache.org/jira/browse/FLINK-7151 >>> Thanks, >>> Timo >>> >>> >>> On 09.10.19 17:07, Bowen Li wrote: >>> >>> Hi Dawid, >>> >>> +1 for proposed changes >>> >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < >>> >>> [hidden email] >>> >>> wrote: >>> >>> >>> Sorry for a very delayed response. >>> >>> @Kurt Yes, this is the goal to have a function created like new >>> Function(...) also be wrapped into CatalogFunction. This would have to >>> be though a temporary function as we cannot represent that as a set of >>> properties. Similar to the createTemporaryView(DataStream stream). >>> >>> As for the ConnectTableDescriptor I agree this is very similar to >>> CatalogTable. I am not sure though if we should get rid of it. In the >>> end I see it as a builder for a CatalogTable, which is a slightly more >>> internal API, but we might revisit that some time in the future if we >>> find that it makes more sense. >>> >>> @All I updated the FLIP page with some more details from the outcome >>> >>> of >>> >>> the discussions around FLIP-57. Please take a look. I would like to >>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. >>> >>> Best, >>> >>> Dawid >>> >>> >>> On 19/09/2019 09:24, Kurt Young wrote: >>> >>> IIUC it's good to see that both serializable (tables description from >>> >>> DDL) >>> >>> and unserializable (tables with DataStream underneath) tables are >>> >>> treated >>> >>> unify with CatalogTable. >>> >>> Can I also assume functions that either come from a function class >>> >>> (from >>> >>> DDL) >>> or function objects (newed by user) will also treated unify with >>> CatalogFunction? >>> >>> This will greatly simplify and unify current API level concepts and >>> >>> design. >>> >>> And it seems only one thing left, how do we deal with >>> ConnectTableDescriptor? >>> It's actually very similar with serializable CatalogTable, both carry >>> >>> some >>> >>> text >>> properties which even are the same. Is there any chance we can >>> >>> further >>> >>> unify >>> >>> this to CatalogTable? >>> >>> object >>> Best, >>> Kurt >>> >>> >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < >> [hidden email]> wrote: >>> >>> Thanks Dawid for the design doc. >>> >>> In general, I’m +1 to the FLIP. >>> >>> >>> +1 to the single-string and parse way to express object path. >>> >>> +1 to deprecate registerTableSink & registerTableSource. >>> But I would suggest to provide an easy way to register a custom >>> source/sink before we drop them (this is another story). >>> Currently, it’s not easy to implement a custom connector descriptor. >>> >>> Best, >>> Jark >>> >>> >>> >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < >> [hidden email]> >>> 写道: >>> >>> Hi JingsongLee, >>> From my understanding they can. Underneath they will be >>> >>> CatalogTables. >>> >>> The >>> >>> difference is the lifetime of the tables. Plus some of the user >>> >>> facing >>> >>> interfaces cannot be persisted e.g. datastream. Therefore we must >>> >>> have >>> >>> a >>> >>> separate methods for that. In the end the temporary tables are held >>> >>> in >>> >>> memory as CatalogTables. >>> Best, >>> Dawid >>> >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >>> >>> .invalid> >>> >>> wrote: >>> >>> >>> Hi dawid: >>> Can temporary tables achieve the same capabilities as catalog >>> >>> table? >>> >>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>> PartitionStatistics >>> like partition support: we have added some catalog equivalent >>> >>> interfaces >>> >>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>> Maybe it's not a good idea to add these interfaces to >>> TableSource/TableSink. What do you think? >>> >>> Best, >>> Jingsong Lee >>> >>> >>> ------------------------------------------------------------------ >>> From:Kurt Young <[hidden email]> <[hidden email]> >>> Send Time:2019年9月18日(星期三) 17:54 >>> To:dev <[hidden email]> <[hidden email]> >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>> >>> Table >>> >>> module >>> >>> Hi all, >>> >>> Sorry to join this party late. Big +1 to this flip, especially for >>> >>> the >>> >>> dropping >>> "registerTableSink & registerTableSource" part. These are indeed >>> >>> legacy >>> >>> and we should try to unify them through CatalogTable after we >>> >>> introduce >>> >>> the concept of Catalog. >>> >>> From my understanding, what we can registered should all be >>> >>> metadata, >>> >>> TableSource/TableSink should only be the one who is responsible to >>> >>> do >>> >>> the real work, i.e. reading and writing data according to the >>> >>> schema >>> >>> and >>> >>> other information like computed column, partition, .e.g. >>> >>> Best, >>> Kurt >>> >>> >>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < >>> >>> [hidden email] >>> >>> .invalid> >>> wrote: >>> >>> >>> After some development and thinking, I have a general >>> >>> understanding. >>> >>> +1 to registering a source/sink does not fit into the SQL world. >>> I am OK to have a deprecated registerTemporarySource/Sink to >>> >>> compatible >>> >>> with old ways. >>> >>> Best, >>> Jingsong Lee >>> >>> >>> >>> >>> ------------------------------------------------------------------ >>> >>> From:Timo Walther <[hidden email]> <[hidden email]> >>> Send Time:2019年9月17日(星期二) 08:00 >>> To:dev <[hidden email]> <[hidden email]> >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>> >>> Table >>> >>> module >>> >>> Hi Dawid, >>> >>> thanks for the design document. It fixes big concept gaps due to >>> historical reasons with proper support for serializability and >>> >>> catalog >>> >>> support in mind. >>> >>> I would not mind a registerTemporarySource/Sink, but the problem >>> >>> that I >>> >>> see is that many people think that this is the recommended way of >>> registering a table source/sink which is not true. We should >>> >>> guide >>> >>> users >>> >>> to either use connect() or DDL API which can be validated and >>> >>> stored >>> >>> in >>> >>> catalog. >>> >>> Also from a concept perspective, registering a source/sink does >>> >>> not >>> >>> fit >>> >>> into the SQL world. SQL does not know about source/sinks but only >>> >>> about >>> >>> tables. If the responsibility of a TableSource/TableSink is just >>> >>> a >>> >>> pure >>> >>> physical data consumer/producer that is not connected to the >>> >>> actual >>> >>> logical table schema, we would need a possibility of defining >>> >>> time >>> >>> attributes and interpreting/converting a changelog. This should >>> >>> be >>> >>> done >>> >>> by the framework with information from the DDL/connect() and not >>> >>> be >>> >>> defined in every table source. >>> >>> Regards, >>> Timo >>> >>> >>> On 09.09.19 14:16, JingsongLee wrote: >>> >>> Hi dawid: >>> >>> It is difficult to describe specific examples. >>> Sometimes users will generate some java converters through some >>> Java code, or generate some Java classes through third-party >>> libraries. Of course, these can be best done through >>> >>> properties. >>> >>> But this requires additional work from users.My suggestion is to >>> keep this Java instance class way that is user-friendly. >>> >>> Best, >>> Jingsong Lee >>> >>> >>> >>> >>> ------------------------------------------------------------------ >>> >>> From:Dawid Wysakowicz <[hidden email]> <[hidden email]> >>> Send Time:2019年9月6日(星期五) 16:21 >>> To:dev <[hidden email]> <[hidden email]> >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>> >>> Table >>> >>> module >>> >>> Hi all, >>> @Jingsong Could you elaborate a bit more what do you mean by >>> "some Connectors are difficult to convert all states to >>> >>> properties" >>> >>> All the Flink provided connectors will definitely be expressible >>> >>> with >>> >>> properties (In the end you should be able to use them from DDL). >>> >>> I >>> >>> think >>> >>> if >>> >>> a TableSource is complex enough that it handles filter push down, >>> >>> partition >>> >>> support etc. should rather be made available both from DDL & >>> >>> java/scala >>> >>> code. I'm happy to reconsider adding >>> >>> registerTemporaryTable(String >>> >>> path, >>> >>> TableSource source) if you have some concrete examples in mind. >>> >>> @Xuefu: We also considered the ObjectIdentifier (or actually >>> >>> introducing >>> >>> a new identifier representation to differentiate between resolved >>> >>> and >>> >>> unresolved identifiers) with the same concerns. We decided to >>> >>> suggest >>> >>> the >>> >>> string & parsing logic because of usability. >>> >>> tEnv.from("cat.db.table") >>> is shorter and easier to write than >>> tEnv.from(Identifier.for("cat", "db", "name") >>> And also implicitly solves the problem what happens if a user >>> >>> (e.g. >>> >>> used >>> >>> to other systems) uses that API in a following manner: >>> >>> tEnv.from(Identifier.for("db.name") >>> I'm happy to revisit it if the general consensus is that it's >>> >>> better >>> >>> to >>> >>> use the OO aproach. >>> >>> Best, >>> Dawid >>> >>> On 06/09/2019 10:00, Xuefu Z wrote: >>> >>> Thanks to Dawid for starting the discussion and writeup. It >>> >>> looks >>> >>> pretty >>> >>> good to me except that I'm a little concerned about the object >>> >>> reference >>> >>> and string parsing in the code, which seems to an anti-pattern >>> >>> to >>> >>> OOP. >>> >>> Have >>> >>> we considered using ObjectIdenitifier with optional catalog and >>> >>> db >>> >>> parts, >>> >>> esp. if we are worried about arguments of variable length or >>> >>> method >>> >>> overloading? It's quite likely that the result of string parsing >>> >>> is >>> >>> an >>> >>> ObjectIdentifier instance any way. >>> >>> Having string parsing logic in the code is a little dangerous as >>> >>> it >>> >>> duplicates part of the DDL/DML parsing, and they can easily get >>> >>> out >>> >>> of >>> >>> sync. >>> >>> Thanks, >>> Xuefu >>> >>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < >>> >>> [hidden email] >>> >>> .invalid> >>> >>> wrote: >>> >>> >>> Thanks dawid, +1 for this approach. >>> >>> One concern is the removal of registerTableSink & >>> >>> registerTableSource >>> >>> in TableEnvironment. It has two alternatives: >>> 1.the properties approach (DDL, descriptor). >>> 2.from/toDataStream. >>> >>> #1 can only be properties, not java states, and some Connectors >>> are difficult to convert all states to properties. >>> #2 can contain java state. But can't use TableSource-related >>> >>> features, >>> >>> like project & filter push down, partition support, etc.. >>> >>> Any idea about this? >>> >>> Best, >>> Jingsong Lee >>> >>> >>> >>> >>> ------------------------------------------------------------------ >>> >>> From:Dawid Wysakowicz <[hidden email]> <[hidden email]> >>> Send Time:2019年9月4日(星期三) 22:20 >>> To:dev <[hidden email]> <[hidden email]> >>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in >>> >>> Table >>> >>> module >>> >>> Hi all, >>> As part of FLIP-30 a Catalog API was introduced that enables >>> >>> storing >>> >>> table >>> >>> meta objects permanently. At the same time the majority of >>> >>> current >>> >>> APIs >>> >>> create temporary objects that cannot be serialized. We should >>> >>> clarify >>> >>> the >>> >>> creation of meta objects (tables, views, functions) in a unified >>> >>> way. >>> >>> Another current problem in the API is that all the temporary >>> >>> objects >>> >>> are >>> >>> stored in a special built-in catalog, which is not very >>> >>> intuitive >>> >>> for >>> >>> many >>> >>> users, as they must be aware of that catalog to reference >>> >>> temporary >>> >>> objects. >>> >>> Lastly, different APIs have different ways of providing object >>> >>> paths: >>> >>> String path…, >>> String path, String pathContinued… >>> String name >>> We should choose one approach and unify it across all APIs. >>> I suggest a FLIP to address the above issues. >>> Looking forward to your opinions. >>> FLIP link: >>> >>> >>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >>> |
Hi Timo,
The reason to put the parameter in constructor instead of `eval` is that 1) the initialization for the parameters may take long 2) there might be a lot of parameters, it's verbose to specify them in every where it's used. Having some syntax like `CREATE FUNCTION bf1 WITH (precision=0.001)` is great. For catalog functions, I totally agree with that. But for temporal function, why do we have to enforce it is a class? From my understanding, temporal objects are stored in memory and will not be serialized to catalogs. If we want to prohibit function instances, are we also planning to drop the support of `table.select(bf1('a))` in Scala Table API? Regards, Jark On Fri, 11 Oct 2019 at 15:34, Timo Walther <[hidden email]> wrote: > Thanks for your feedback Kurt and Jark. > > I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, > `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. > > Parameters should be passed when calling the function instead. > > Users can use the query itself to parametize the function: `SELECT > bf(0.0001, a), bf(0.005, a)` > > > If at all, we can think about a `CREATE FUNCTION bf1 WITH > (precision=0.001)` DDL syntax. But with registering instances we might > shoot ourselves in the foot in the future. > > Usually, function exist as classes, it seems unnatural to me that users > need to instantiate the function first before registering it. > > > What do think? > > Thanks, > Timo > > On 11.10.19 05:10, Jark Wu wrote: > > Hi, Timo > > > > Regarding to createTemporaryFunction, I agree with Kurt, we should > > encourage the class way, but should keep the instance way. > > The reason is that this is an important feature to support parameterize > > function as you said, but this can't be addressed by global job > parameters > > and open() methods. > > Because users may want to instantiate more than one instance, e.g. `val > bf1 > > = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they > are > > used in one query. > > > > The another reason is that this can't make the function call string > > serializable, because in the Scala Table API path, users can still pass > in > > a function instance, e.g. table.select(bf1('a)) > > > > Best, > > Jark > > > > On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: > > > >> Regarding to createTemporaryFunction, could we just keep both choices? I > >> agree > >> we should encourage users to use the class, and it's the only choice > when > >> user > >> using DDL. But for some Table API & SQL programs, I think it's actually > >> more nature > >> to use an object instead of use the classes. But this just depends on > >> personal > >> choices. One use case came to mind which might be affected by this is > if we > >> want > >> to support some anonymous lambda functions as UDF. It will need such > create > >> function > >> with objects support. > >> > >> Best, > >> Kurt > >> > >> > >> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz < > [hidden email]> > >> wrote: > >> > >>> Thanks for the comments. > >>> > >>> @Timo I think your suggestion makes sense. I changed the signature to > >>> > >>> void createTemporaryFunction(String path, Class<? extends > >> UserDefinedFunction> functionClass); > >>> This will mean that after we make QueryOperations string serializable > and > >>> we expose the type inference in functions we will have only a single > >> object > >>> that cannot be persisted in a catalog and thus have only a "Temporary" > >>> method: > >>> > >>> void createTemporaryView(String path, DataStream stream); > >>> > >>> I updated the FLIP page accordingly. I also listed explicitly methods > >> that > >>> we need to deprecate and added a note that the implementation of the > >>> methods concerning UserDefinedFunctions has to be postponed until we > have > >>> type inference in place. > >>> > >>> As the voting period for FLIP-57 ended, I will start the VOTE thread > for > >>> FLIP-64 tomorrow morning, unless there are some objections until then. > >>> > >>> Best, > >>> > >>> Dawid > >>> > >>> > >>> On 10/10/2019 16:16, Kurt Young wrote: > >>> > >>> Thanks for the clarification Timo, that's sounds good to me. Let's > >> continue > >>> to discuss other things. > >>> > >>> Best, > >>> Kurt > >>> > >>> > >>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < > >> [hidden email]> wrote: > >>> > >>> The purpose of ConnectTableDescriptor was to have a programmatic way of > >>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > >>> that it needs some rework in the future, once we have finalized the DDL > >>> to ensure that both concepts are in sync. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 10.10.19 16:08, Kurt Young wrote: > >>> > >>> Regarding to ConnectTableDescriptor, if in the end it becomes a builder > >>> of CatalogTable, I would be ok with that. But it doesn't look like a > >>> > >>> builder > >>> > >>> now, it's more like another form of TableSource/TableSink. > >>> > >>> Best, > >>> Kurt > >>> > >>> > >>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < > >> [hidden email]> wrote: > >>> > >>> Hi everyone, > >>> > >>> thanks for the great discussion with a good outcome. > >>> > >>> I have one last comment about: > >>> > >>> void createTemporaryFunction(String path, UserDefinedFunction > function); > >>> > >>> We should encourage users to register a class instead of an instance. > We > >>> should enforce a default constructor in the future. If we need metadata > >>> or type inference, the planner can simply instantiate the class and > call > >>> the corresponding getters. If people would like to parameterize a > >>> function, they can use global job parameters instead - which are > >>> available in the open() method. > >>> > >>> I suggest: > >>> > >>> void createTemporaryFunction(String path, Class<? extends > >>> UserDefinedFunction> functionClass); > >>> > >>> This is also closer to the SQL DDL that is about to be implemented in: > >> https://issues.apache.org/jira/browse/FLINK-7151 > >>> Thanks, > >>> Timo > >>> > >>> > >>> On 09.10.19 17:07, Bowen Li wrote: > >>> > >>> Hi Dawid, > >>> > >>> +1 for proposed changes > >>> > >>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > >>> > >>> [hidden email] > >>> > >>> wrote: > >>> > >>> > >>> Sorry for a very delayed response. > >>> > >>> @Kurt Yes, this is the goal to have a function created like new > >>> Function(...) also be wrapped into CatalogFunction. This would have to > >>> be though a temporary function as we cannot represent that as a set of > >>> properties. Similar to the createTemporaryView(DataStream stream). > >>> > >>> As for the ConnectTableDescriptor I agree this is very similar to > >>> CatalogTable. I am not sure though if we should get rid of it. In the > >>> end I see it as a builder for a CatalogTable, which is a slightly more > >>> internal API, but we might revisit that some time in the future if we > >>> find that it makes more sense. > >>> > >>> @All I updated the FLIP page with some more details from the outcome > >>> > >>> of > >>> > >>> the discussions around FLIP-57. Please take a look. I would like to > >>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. > >>> > >>> Best, > >>> > >>> Dawid > >>> > >>> > >>> On 19/09/2019 09:24, Kurt Young wrote: > >>> > >>> IIUC it's good to see that both serializable (tables description from > >>> > >>> DDL) > >>> > >>> and unserializable (tables with DataStream underneath) tables are > >>> > >>> treated > >>> > >>> unify with CatalogTable. > >>> > >>> Can I also assume functions that either come from a function class > >>> > >>> (from > >>> > >>> DDL) > >>> or function objects (newed by user) will also treated unify with > >>> CatalogFunction? > >>> > >>> This will greatly simplify and unify current API level concepts and > >>> > >>> design. > >>> > >>> And it seems only one thing left, how do we deal with > >>> ConnectTableDescriptor? > >>> It's actually very similar with serializable CatalogTable, both carry > >>> > >>> some > >>> > >>> text > >>> properties which even are the same. Is there any chance we can > >>> > >>> further > >>> > >>> unify > >>> > >>> this to CatalogTable? > >>> > >>> object > >>> Best, > >>> Kurt > >>> > >>> > >>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < > >> [hidden email]> wrote: > >>> > >>> Thanks Dawid for the design doc. > >>> > >>> In general, I’m +1 to the FLIP. > >>> > >>> > >>> +1 to the single-string and parse way to express object path. > >>> > >>> +1 to deprecate registerTableSink & registerTableSource. > >>> But I would suggest to provide an easy way to register a custom > >>> source/sink before we drop them (this is another story). > >>> Currently, it’s not easy to implement a custom connector descriptor. > >>> > >>> Best, > >>> Jark > >>> > >>> > >>> > >>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < > >> [hidden email]> > >>> 写道: > >>> > >>> Hi JingsongLee, > >>> From my understanding they can. Underneath they will be > >>> > >>> CatalogTables. > >>> > >>> The > >>> > >>> difference is the lifetime of the tables. Plus some of the user > >>> > >>> facing > >>> > >>> interfaces cannot be persisted e.g. datastream. Therefore we must > >>> > >>> have > >>> > >>> a > >>> > >>> separate methods for that. In the end the temporary tables are held > >>> > >>> in > >>> > >>> memory as CatalogTables. > >>> Best, > >>> Dawid > >>> > >>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > >>> > >>> .invalid> > >>> > >>> wrote: > >>> > >>> > >>> Hi dawid: > >>> Can temporary tables achieve the same capabilities as catalog > >>> > >>> table? > >>> > >>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>> PartitionStatistics > >>> like partition support: we have added some catalog equivalent > >>> > >>> interfaces > >>> > >>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>> Maybe it's not a good idea to add these interfaces to > >>> TableSource/TableSink. What do you think? > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> > >>> ------------------------------------------------------------------ > >>> From:Kurt Young <[hidden email]> <[hidden email]> > >>> Send Time:2019年9月18日(星期三) 17:54 > >>> To:dev <[hidden email]> <[hidden email]> > >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>> > >>> Table > >>> > >>> module > >>> > >>> Hi all, > >>> > >>> Sorry to join this party late. Big +1 to this flip, especially for > >>> > >>> the > >>> > >>> dropping > >>> "registerTableSink & registerTableSource" part. These are indeed > >>> > >>> legacy > >>> > >>> and we should try to unify them through CatalogTable after we > >>> > >>> introduce > >>> > >>> the concept of Catalog. > >>> > >>> From my understanding, what we can registered should all be > >>> > >>> metadata, > >>> > >>> TableSource/TableSink should only be the one who is responsible to > >>> > >>> do > >>> > >>> the real work, i.e. reading and writing data according to the > >>> > >>> schema > >>> > >>> and > >>> > >>> other information like computed column, partition, .e.g. > >>> > >>> Best, > >>> Kurt > >>> > >>> > >>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > >>> > >>> [hidden email] > >>> > >>> .invalid> > >>> wrote: > >>> > >>> > >>> After some development and thinking, I have a general > >>> > >>> understanding. > >>> > >>> +1 to registering a source/sink does not fit into the SQL world. > >>> I am OK to have a deprecated registerTemporarySource/Sink to > >>> > >>> compatible > >>> > >>> with old ways. > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> > >>> > >>> > >>> ------------------------------------------------------------------ > >>> > >>> From:Timo Walther <[hidden email]> <[hidden email]> > >>> Send Time:2019年9月17日(星期二) 08:00 > >>> To:dev <[hidden email]> <[hidden email]> > >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>> > >>> Table > >>> > >>> module > >>> > >>> Hi Dawid, > >>> > >>> thanks for the design document. It fixes big concept gaps due to > >>> historical reasons with proper support for serializability and > >>> > >>> catalog > >>> > >>> support in mind. > >>> > >>> I would not mind a registerTemporarySource/Sink, but the problem > >>> > >>> that I > >>> > >>> see is that many people think that this is the recommended way of > >>> registering a table source/sink which is not true. We should > >>> > >>> guide > >>> > >>> users > >>> > >>> to either use connect() or DDL API which can be validated and > >>> > >>> stored > >>> > >>> in > >>> > >>> catalog. > >>> > >>> Also from a concept perspective, registering a source/sink does > >>> > >>> not > >>> > >>> fit > >>> > >>> into the SQL world. SQL does not know about source/sinks but only > >>> > >>> about > >>> > >>> tables. If the responsibility of a TableSource/TableSink is just > >>> > >>> a > >>> > >>> pure > >>> > >>> physical data consumer/producer that is not connected to the > >>> > >>> actual > >>> > >>> logical table schema, we would need a possibility of defining > >>> > >>> time > >>> > >>> attributes and interpreting/converting a changelog. This should > >>> > >>> be > >>> > >>> done > >>> > >>> by the framework with information from the DDL/connect() and not > >>> > >>> be > >>> > >>> defined in every table source. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 09.09.19 14:16, JingsongLee wrote: > >>> > >>> Hi dawid: > >>> > >>> It is difficult to describe specific examples. > >>> Sometimes users will generate some java converters through some > >>> Java code, or generate some Java classes through third-party > >>> libraries. Of course, these can be best done through > >>> > >>> properties. > >>> > >>> But this requires additional work from users.My suggestion is to > >>> keep this Java instance class way that is user-friendly. > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> > >>> > >>> > >>> ------------------------------------------------------------------ > >>> > >>> From:Dawid Wysakowicz <[hidden email]> <[hidden email] > > > >>> Send Time:2019年9月6日(星期五) 16:21 > >>> To:dev <[hidden email]> <[hidden email]> > >>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>> > >>> Table > >>> > >>> module > >>> > >>> Hi all, > >>> @Jingsong Could you elaborate a bit more what do you mean by > >>> "some Connectors are difficult to convert all states to > >>> > >>> properties" > >>> > >>> All the Flink provided connectors will definitely be expressible > >>> > >>> with > >>> > >>> properties (In the end you should be able to use them from DDL). > >>> > >>> I > >>> > >>> think > >>> > >>> if > >>> > >>> a TableSource is complex enough that it handles filter push down, > >>> > >>> partition > >>> > >>> support etc. should rather be made available both from DDL & > >>> > >>> java/scala > >>> > >>> code. I'm happy to reconsider adding > >>> > >>> registerTemporaryTable(String > >>> > >>> path, > >>> > >>> TableSource source) if you have some concrete examples in mind. > >>> > >>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>> > >>> introducing > >>> > >>> a new identifier representation to differentiate between resolved > >>> > >>> and > >>> > >>> unresolved identifiers) with the same concerns. We decided to > >>> > >>> suggest > >>> > >>> the > >>> > >>> string & parsing logic because of usability. > >>> > >>> tEnv.from("cat.db.table") > >>> is shorter and easier to write than > >>> tEnv.from(Identifier.for("cat", "db", "name") > >>> And also implicitly solves the problem what happens if a user > >>> > >>> (e.g. > >>> > >>> used > >>> > >>> to other systems) uses that API in a following manner: > >>> > >>> tEnv.from(Identifier.for("db.name") > >>> I'm happy to revisit it if the general consensus is that it's > >>> > >>> better > >>> > >>> to > >>> > >>> use the OO aproach. > >>> > >>> Best, > >>> Dawid > >>> > >>> On 06/09/2019 10:00, Xuefu Z wrote: > >>> > >>> Thanks to Dawid for starting the discussion and writeup. It > >>> > >>> looks > >>> > >>> pretty > >>> > >>> good to me except that I'm a little concerned about the object > >>> > >>> reference > >>> > >>> and string parsing in the code, which seems to an anti-pattern > >>> > >>> to > >>> > >>> OOP. > >>> > >>> Have > >>> > >>> we considered using ObjectIdenitifier with optional catalog and > >>> > >>> db > >>> > >>> parts, > >>> > >>> esp. if we are worried about arguments of variable length or > >>> > >>> method > >>> > >>> overloading? It's quite likely that the result of string parsing > >>> > >>> is > >>> > >>> an > >>> > >>> ObjectIdentifier instance any way. > >>> > >>> Having string parsing logic in the code is a little dangerous as > >>> > >>> it > >>> > >>> duplicates part of the DDL/DML parsing, and they can easily get > >>> > >>> out > >>> > >>> of > >>> > >>> sync. > >>> > >>> Thanks, > >>> Xuefu > >>> > >>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > >>> > >>> [hidden email] > >>> > >>> .invalid> > >>> > >>> wrote: > >>> > >>> > >>> Thanks dawid, +1 for this approach. > >>> > >>> One concern is the removal of registerTableSink & > >>> > >>> registerTableSource > >>> > >>> in TableEnvironment. It has two alternatives: > >>> 1.the properties approach (DDL, descriptor). > >>> 2.from/toDataStream. > >>> > >>> #1 can only be properties, not java states, and some Connectors > >>> are difficult to convert all states to properties. > >>> #2 can contain java state. But can't use TableSource-related > >>> > >>> features, > >>> > >>> like project & filter push down, partition support, etc.. > >>> > >>> Any idea about this? > >>> > >>> Best, > >>> Jingsong Lee > >>> > >>> > >>> > >>> > >>> ------------------------------------------------------------------ > >>> > >>> From:Dawid Wysakowicz <[hidden email]> <[hidden email] > > > >>> Send Time:2019年9月4日(星期三) 22:20 > >>> To:dev <[hidden email]> <[hidden email]> > >>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > >>> > >>> Table > >>> > >>> module > >>> > >>> Hi all, > >>> As part of FLIP-30 a Catalog API was introduced that enables > >>> > >>> storing > >>> > >>> table > >>> > >>> meta objects permanently. At the same time the majority of > >>> > >>> current > >>> > >>> APIs > >>> > >>> create temporary objects that cannot be serialized. We should > >>> > >>> clarify > >>> > >>> the > >>> > >>> creation of meta objects (tables, views, functions) in a unified > >>> > >>> way. > >>> > >>> Another current problem in the API is that all the temporary > >>> > >>> objects > >>> > >>> are > >>> > >>> stored in a special built-in catalog, which is not very > >>> > >>> intuitive > >>> > >>> for > >>> > >>> many > >>> > >>> users, as they must be aware of that catalog to reference > >>> > >>> temporary > >>> > >>> objects. > >>> > >>> Lastly, different APIs have different ways of providing object > >>> > >>> paths: > >>> > >>> String path…, > >>> String path, String pathContinued… > >>> String name > >>> We should choose one approach and unify it across all APIs. > >>> I suggest a FLIP to address the above issues. > >>> Looking forward to your opinions. > >>> FLIP link: > >>> > >>> > >>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >>> > > |
Hi Jark,
great to see that there is consensus on something like `CREATE FUNCTION bf1 WITH (precision=0.001)`. In the near future, there should be no difference between a catalog function and a temporary function. This should simplify the stack a lot. How about we postpone this topic after FLIP-65? I understand your points but would also like to have similar semantics for SQL and Table API. We can just skip the function part for now and also wait for the FLIP around the function DDL (FLINK-7151). I would suggest to mention both class and instance in Dawid's FLIP for now. Regards, Timo On 11.10.19 10:32, Jark Wu wrote: > Hi Timo, > > The reason to put the parameter in constructor instead of `eval` is that > 1) the initialization for the parameters may take long > 2) there might be a lot of parameters, it's verbose to specify them in > every where it's used. > > Having some syntax like `CREATE FUNCTION bf1 WITH (precision=0.001)` is > great. > For catalog functions, I totally agree with that. But for temporal > function, why do we have to enforce it is a class? > >From my understanding, temporal objects are stored in memory and will not > be serialized to catalogs. > > If we want to prohibit function instances, are we also planning to drop the > support of `table.select(bf1('a))` in Scala Table API? > > Regards, > Jark > > > On Fri, 11 Oct 2019 at 15:34, Timo Walther <[hidden email]> wrote: > >> Thanks for your feedback Kurt and Jark. >> >> I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, >> `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. >> >> Parameters should be passed when calling the function instead. >> >> Users can use the query itself to parametize the function: `SELECT >> bf(0.0001, a), bf(0.005, a)` >> >> >> If at all, we can think about a `CREATE FUNCTION bf1 WITH >> (precision=0.001)` DDL syntax. But with registering instances we might >> shoot ourselves in the foot in the future. >> >> Usually, function exist as classes, it seems unnatural to me that users >> need to instantiate the function first before registering it. >> >> >> What do think? >> >> Thanks, >> Timo >> >> On 11.10.19 05:10, Jark Wu wrote: >>> Hi, Timo >>> >>> Regarding to createTemporaryFunction, I agree with Kurt, we should >>> encourage the class way, but should keep the instance way. >>> The reason is that this is an important feature to support parameterize >>> function as you said, but this can't be addressed by global job >> parameters >>> and open() methods. >>> Because users may want to instantiate more than one instance, e.g. `val >> bf1 >>> = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they >> are >>> used in one query. >>> >>> The another reason is that this can't make the function call string >>> serializable, because in the Scala Table API path, users can still pass >> in >>> a function instance, e.g. table.select(bf1('a)) >>> >>> Best, >>> Jark >>> >>> On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: >>> >>>> Regarding to createTemporaryFunction, could we just keep both choices? I >>>> agree >>>> we should encourage users to use the class, and it's the only choice >> when >>>> user >>>> using DDL. But for some Table API & SQL programs, I think it's actually >>>> more nature >>>> to use an object instead of use the classes. But this just depends on >>>> personal >>>> choices. One use case came to mind which might be affected by this is >> if we >>>> want >>>> to support some anonymous lambda functions as UDF. It will need such >> create >>>> function >>>> with objects support. >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz < >> [hidden email]> >>>> wrote: >>>> >>>>> Thanks for the comments. >>>>> >>>>> @Timo I think your suggestion makes sense. I changed the signature to >>>>> >>>>> void createTemporaryFunction(String path, Class<? extends >>>> UserDefinedFunction> functionClass); >>>>> This will mean that after we make QueryOperations string serializable >> and >>>>> we expose the type inference in functions we will have only a single >>>> object >>>>> that cannot be persisted in a catalog and thus have only a "Temporary" >>>>> method: >>>>> >>>>> void createTemporaryView(String path, DataStream stream); >>>>> >>>>> I updated the FLIP page accordingly. I also listed explicitly methods >>>> that >>>>> we need to deprecate and added a note that the implementation of the >>>>> methods concerning UserDefinedFunctions has to be postponed until we >> have >>>>> type inference in place. >>>>> >>>>> As the voting period for FLIP-57 ended, I will start the VOTE thread >> for >>>>> FLIP-64 tomorrow morning, unless there are some objections until then. >>>>> >>>>> Best, >>>>> >>>>> Dawid >>>>> >>>>> >>>>> On 10/10/2019 16:16, Kurt Young wrote: >>>>> >>>>> Thanks for the clarification Timo, that's sounds good to me. Let's >>>> continue >>>>> to discuss other things. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < >>>> [hidden email]> wrote: >>>>> The purpose of ConnectTableDescriptor was to have a programmatic way of >>>>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree >>>>> that it needs some rework in the future, once we have finalized the DDL >>>>> to ensure that both concepts are in sync. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> >>>>> On 10.10.19 16:08, Kurt Young wrote: >>>>> >>>>> Regarding to ConnectTableDescriptor, if in the end it becomes a builder >>>>> of CatalogTable, I would be ok with that. But it doesn't look like a >>>>> >>>>> builder >>>>> >>>>> now, it's more like another form of TableSource/TableSink. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < >>>> [hidden email]> wrote: >>>>> Hi everyone, >>>>> >>>>> thanks for the great discussion with a good outcome. >>>>> >>>>> I have one last comment about: >>>>> >>>>> void createTemporaryFunction(String path, UserDefinedFunction >> function); >>>>> We should encourage users to register a class instead of an instance. >> We >>>>> should enforce a default constructor in the future. If we need metadata >>>>> or type inference, the planner can simply instantiate the class and >> call >>>>> the corresponding getters. If people would like to parameterize a >>>>> function, they can use global job parameters instead - which are >>>>> available in the open() method. >>>>> >>>>> I suggest: >>>>> >>>>> void createTemporaryFunction(String path, Class<? extends >>>>> UserDefinedFunction> functionClass); >>>>> >>>>> This is also closer to the SQL DDL that is about to be implemented in: >>>> https://issues.apache.org/jira/browse/FLINK-7151 >>>>> Thanks, >>>>> Timo >>>>> >>>>> >>>>> On 09.10.19 17:07, Bowen Li wrote: >>>>> >>>>> Hi Dawid, >>>>> >>>>> +1 for proposed changes >>>>> >>>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < >>>>> >>>>> [hidden email] >>>>> >>>>> wrote: >>>>> >>>>> >>>>> Sorry for a very delayed response. >>>>> >>>>> @Kurt Yes, this is the goal to have a function created like new >>>>> Function(...) also be wrapped into CatalogFunction. This would have to >>>>> be though a temporary function as we cannot represent that as a set of >>>>> properties. Similar to the createTemporaryView(DataStream stream). >>>>> >>>>> As for the ConnectTableDescriptor I agree this is very similar to >>>>> CatalogTable. I am not sure though if we should get rid of it. In the >>>>> end I see it as a builder for a CatalogTable, which is a slightly more >>>>> internal API, but we might revisit that some time in the future if we >>>>> find that it makes more sense. >>>>> >>>>> @All I updated the FLIP page with some more details from the outcome >>>>> >>>>> of >>>>> >>>>> the discussions around FLIP-57. Please take a look. I would like to >>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes through. >>>>> >>>>> Best, >>>>> >>>>> Dawid >>>>> >>>>> >>>>> On 19/09/2019 09:24, Kurt Young wrote: >>>>> >>>>> IIUC it's good to see that both serializable (tables description from >>>>> >>>>> DDL) >>>>> >>>>> and unserializable (tables with DataStream underneath) tables are >>>>> >>>>> treated >>>>> >>>>> unify with CatalogTable. >>>>> >>>>> Can I also assume functions that either come from a function class >>>>> >>>>> (from >>>>> >>>>> DDL) >>>>> or function objects (newed by user) will also treated unify with >>>>> CatalogFunction? >>>>> >>>>> This will greatly simplify and unify current API level concepts and >>>>> >>>>> design. >>>>> >>>>> And it seems only one thing left, how do we deal with >>>>> ConnectTableDescriptor? >>>>> It's actually very similar with serializable CatalogTable, both carry >>>>> >>>>> some >>>>> >>>>> text >>>>> properties which even are the same. Is there any chance we can >>>>> >>>>> further >>>>> >>>>> unify >>>>> >>>>> this to CatalogTable? >>>>> >>>>> object >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < >>>> [hidden email]> wrote: >>>>> Thanks Dawid for the design doc. >>>>> >>>>> In general, I’m +1 to the FLIP. >>>>> >>>>> >>>>> +1 to the single-string and parse way to express object path. >>>>> >>>>> +1 to deprecate registerTableSink & registerTableSource. >>>>> But I would suggest to provide an easy way to register a custom >>>>> source/sink before we drop them (this is another story). >>>>> Currently, it’s not easy to implement a custom connector descriptor. >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> >>>>> >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < >>>> [hidden email]> >>>>> 写道: >>>>> >>>>> Hi JingsongLee, >>>>> From my understanding they can. Underneath they will be >>>>> >>>>> CatalogTables. >>>>> >>>>> The >>>>> >>>>> difference is the lifetime of the tables. Plus some of the user >>>>> >>>>> facing >>>>> >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must >>>>> >>>>> have >>>>> >>>>> a >>>>> >>>>> separate methods for that. In the end the temporary tables are held >>>>> >>>>> in >>>>> >>>>> memory as CatalogTables. >>>>> Best, >>>>> Dawid >>>>> >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >>>>> >>>>> .invalid> >>>>> >>>>> wrote: >>>>> >>>>> >>>>> Hi dawid: >>>>> Can temporary tables achieve the same capabilities as catalog >>>>> >>>>> table? >>>>> >>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>>> PartitionStatistics >>>>> like partition support: we have added some catalog equivalent >>>>> >>>>> interfaces >>>>> >>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>>> Maybe it's not a good idea to add these interfaces to >>>>> TableSource/TableSink. What do you think? >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> From:Kurt Young <[hidden email]> <[hidden email]> >>>>> Send Time:2019年9月18日(星期三) 17:54 >>>>> To:dev <[hidden email]> <[hidden email]> >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>> >>>>> Table >>>>> >>>>> module >>>>> >>>>> Hi all, >>>>> >>>>> Sorry to join this party late. Big +1 to this flip, especially for >>>>> >>>>> the >>>>> >>>>> dropping >>>>> "registerTableSink & registerTableSource" part. These are indeed >>>>> >>>>> legacy >>>>> >>>>> and we should try to unify them through CatalogTable after we >>>>> >>>>> introduce >>>>> >>>>> the concept of Catalog. >>>>> >>>>> From my understanding, what we can registered should all be >>>>> >>>>> metadata, >>>>> >>>>> TableSource/TableSink should only be the one who is responsible to >>>>> >>>>> do >>>>> >>>>> the real work, i.e. reading and writing data according to the >>>>> >>>>> schema >>>>> >>>>> and >>>>> >>>>> other information like computed column, partition, .e.g. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < >>>>> >>>>> [hidden email] >>>>> >>>>> .invalid> >>>>> wrote: >>>>> >>>>> >>>>> After some development and thinking, I have a general >>>>> >>>>> understanding. >>>>> >>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>> I am OK to have a deprecated registerTemporarySource/Sink to >>>>> >>>>> compatible >>>>> >>>>> with old ways. >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> >>>>> From:Timo Walther <[hidden email]> <[hidden email]> >>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>> To:dev <[hidden email]> <[hidden email]> >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>> >>>>> Table >>>>> >>>>> module >>>>> >>>>> Hi Dawid, >>>>> >>>>> thanks for the design document. It fixes big concept gaps due to >>>>> historical reasons with proper support for serializability and >>>>> >>>>> catalog >>>>> >>>>> support in mind. >>>>> >>>>> I would not mind a registerTemporarySource/Sink, but the problem >>>>> >>>>> that I >>>>> >>>>> see is that many people think that this is the recommended way of >>>>> registering a table source/sink which is not true. We should >>>>> >>>>> guide >>>>> >>>>> users >>>>> >>>>> to either use connect() or DDL API which can be validated and >>>>> >>>>> stored >>>>> >>>>> in >>>>> >>>>> catalog. >>>>> >>>>> Also from a concept perspective, registering a source/sink does >>>>> >>>>> not >>>>> >>>>> fit >>>>> >>>>> into the SQL world. SQL does not know about source/sinks but only >>>>> >>>>> about >>>>> >>>>> tables. If the responsibility of a TableSource/TableSink is just >>>>> >>>>> a >>>>> >>>>> pure >>>>> >>>>> physical data consumer/producer that is not connected to the >>>>> >>>>> actual >>>>> >>>>> logical table schema, we would need a possibility of defining >>>>> >>>>> time >>>>> >>>>> attributes and interpreting/converting a changelog. This should >>>>> >>>>> be >>>>> >>>>> done >>>>> >>>>> by the framework with information from the DDL/connect() and not >>>>> >>>>> be >>>>> >>>>> defined in every table source. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> >>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>> >>>>> Hi dawid: >>>>> >>>>> It is difficult to describe specific examples. >>>>> Sometimes users will generate some java converters through some >>>>> Java code, or generate some Java classes through third-party >>>>> libraries. Of course, these can be best done through >>>>> >>>>> properties. >>>>> >>>>> But this requires additional work from users.My suggestion is to >>>>> keep this Java instance class way that is user-friendly. >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> >>>>> From:Dawid Wysakowicz <[hidden email]> <[hidden email] >>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>> To:dev <[hidden email]> <[hidden email]> >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>> >>>>> Table >>>>> >>>>> module >>>>> >>>>> Hi all, >>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>> "some Connectors are difficult to convert all states to >>>>> >>>>> properties" >>>>> >>>>> All the Flink provided connectors will definitely be expressible >>>>> >>>>> with >>>>> >>>>> properties (In the end you should be able to use them from DDL). >>>>> >>>>> I >>>>> >>>>> think >>>>> >>>>> if >>>>> >>>>> a TableSource is complex enough that it handles filter push down, >>>>> >>>>> partition >>>>> >>>>> support etc. should rather be made available both from DDL & >>>>> >>>>> java/scala >>>>> >>>>> code. I'm happy to reconsider adding >>>>> >>>>> registerTemporaryTable(String >>>>> >>>>> path, >>>>> >>>>> TableSource source) if you have some concrete examples in mind. >>>>> >>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>>> >>>>> introducing >>>>> >>>>> a new identifier representation to differentiate between resolved >>>>> >>>>> and >>>>> >>>>> unresolved identifiers) with the same concerns. We decided to >>>>> >>>>> suggest >>>>> >>>>> the >>>>> >>>>> string & parsing logic because of usability. >>>>> >>>>> tEnv.from("cat.db.table") >>>>> is shorter and easier to write than >>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>> And also implicitly solves the problem what happens if a user >>>>> >>>>> (e.g. >>>>> >>>>> used >>>>> >>>>> to other systems) uses that API in a following manner: >>>>> >>>>> tEnv.from(Identifier.for("db.name") >>>>> I'm happy to revisit it if the general consensus is that it's >>>>> >>>>> better >>>>> >>>>> to >>>>> >>>>> use the OO aproach. >>>>> >>>>> Best, >>>>> Dawid >>>>> >>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>> >>>>> Thanks to Dawid for starting the discussion and writeup. It >>>>> >>>>> looks >>>>> >>>>> pretty >>>>> >>>>> good to me except that I'm a little concerned about the object >>>>> >>>>> reference >>>>> >>>>> and string parsing in the code, which seems to an anti-pattern >>>>> >>>>> to >>>>> >>>>> OOP. >>>>> >>>>> Have >>>>> >>>>> we considered using ObjectIdenitifier with optional catalog and >>>>> >>>>> db >>>>> >>>>> parts, >>>>> >>>>> esp. if we are worried about arguments of variable length or >>>>> >>>>> method >>>>> >>>>> overloading? It's quite likely that the result of string parsing >>>>> >>>>> is >>>>> >>>>> an >>>>> >>>>> ObjectIdentifier instance any way. >>>>> >>>>> Having string parsing logic in the code is a little dangerous as >>>>> >>>>> it >>>>> >>>>> duplicates part of the DDL/DML parsing, and they can easily get >>>>> >>>>> out >>>>> >>>>> of >>>>> >>>>> sync. >>>>> >>>>> Thanks, >>>>> Xuefu >>>>> >>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < >>>>> >>>>> [hidden email] >>>>> >>>>> .invalid> >>>>> >>>>> wrote: >>>>> >>>>> >>>>> Thanks dawid, +1 for this approach. >>>>> >>>>> One concern is the removal of registerTableSink & >>>>> >>>>> registerTableSource >>>>> >>>>> in TableEnvironment. It has two alternatives: >>>>> 1.the properties approach (DDL, descriptor). >>>>> 2.from/toDataStream. >>>>> >>>>> #1 can only be properties, not java states, and some Connectors >>>>> are difficult to convert all states to properties. >>>>> #2 can contain java state. But can't use TableSource-related >>>>> >>>>> features, >>>>> >>>>> like project & filter push down, partition support, etc.. >>>>> >>>>> Any idea about this? >>>>> >>>>> Best, >>>>> Jingsong Lee >>>>> >>>>> >>>>> >>>>> >>>>> ------------------------------------------------------------------ >>>>> >>>>> From:Dawid Wysakowicz <[hidden email]> <[hidden email] >>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>> To:dev <[hidden email]> <[hidden email]> >>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in >>>>> >>>>> Table >>>>> >>>>> module >>>>> >>>>> Hi all, >>>>> As part of FLIP-30 a Catalog API was introduced that enables >>>>> >>>>> storing >>>>> >>>>> table >>>>> >>>>> meta objects permanently. At the same time the majority of >>>>> >>>>> current >>>>> >>>>> APIs >>>>> >>>>> create temporary objects that cannot be serialized. We should >>>>> >>>>> clarify >>>>> >>>>> the >>>>> >>>>> creation of meta objects (tables, views, functions) in a unified >>>>> >>>>> way. >>>>> >>>>> Another current problem in the API is that all the temporary >>>>> >>>>> objects >>>>> >>>>> are >>>>> >>>>> stored in a special built-in catalog, which is not very >>>>> >>>>> intuitive >>>>> >>>>> for >>>>> >>>>> many >>>>> >>>>> users, as they must be aware of that catalog to reference >>>>> >>>>> temporary >>>>> >>>>> objects. >>>>> >>>>> Lastly, different APIs have different ways of providing object >>>>> >>>>> paths: >>>>> >>>>> String path…, >>>>> String path, String pathContinued… >>>>> String name >>>>> We should choose one approach and unify it across all APIs. >>>>> I suggest a FLIP to address the above issues. >>>>> Looking forward to your opinions. >>>>> FLIP link: >>>>> >>>>> >>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >> |
Hi Timo,
Sounds good to me. Thanks, Jark On Fri, 11 Oct 2019 at 17:37, Timo Walther <[hidden email]> wrote: > Hi Jark, > > great to see that there is consensus on something like `CREATE FUNCTION > bf1 WITH (precision=0.001)`. > > In the near future, there should be no difference between a catalog > function and a temporary function. This should simplify the stack a lot. > > How about we postpone this topic after FLIP-65? I understand your points > but would also like to have similar semantics for SQL and Table API. We > can just skip the function part for now and also wait for the FLIP > around the function DDL (FLINK-7151). I would suggest to mention both > class and instance in Dawid's FLIP for now. > > Regards, > Timo > > On 11.10.19 10:32, Jark Wu wrote: > > Hi Timo, > > > > The reason to put the parameter in constructor instead of `eval` is that > > 1) the initialization for the parameters may take long > > 2) there might be a lot of parameters, it's verbose to specify them in > > every where it's used. > > > > Having some syntax like `CREATE FUNCTION bf1 WITH (precision=0.001)` is > > great. > > For catalog functions, I totally agree with that. But for temporal > > function, why do we have to enforce it is a class? > > >From my understanding, temporal objects are stored in memory and will > not > > be serialized to catalogs. > > > > If we want to prohibit function instances, are we also planning to drop > the > > support of `table.select(bf1('a))` in Scala Table API? > > > > Regards, > > Jark > > > > > > On Fri, 11 Oct 2019 at 15:34, Timo Walther <[hidden email]> wrote: > > > >> Thanks for your feedback Kurt and Jark. > >> > >> I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, > >> `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. > >> > >> Parameters should be passed when calling the function instead. > >> > >> Users can use the query itself to parametize the function: `SELECT > >> bf(0.0001, a), bf(0.005, a)` > >> > >> > >> If at all, we can think about a `CREATE FUNCTION bf1 WITH > >> (precision=0.001)` DDL syntax. But with registering instances we might > >> shoot ourselves in the foot in the future. > >> > >> Usually, function exist as classes, it seems unnatural to me that users > >> need to instantiate the function first before registering it. > >> > >> > >> What do think? > >> > >> Thanks, > >> Timo > >> > >> On 11.10.19 05:10, Jark Wu wrote: > >>> Hi, Timo > >>> > >>> Regarding to createTemporaryFunction, I agree with Kurt, we should > >>> encourage the class way, but should keep the instance way. > >>> The reason is that this is an important feature to support parameterize > >>> function as you said, but this can't be addressed by global job > >> parameters > >>> and open() methods. > >>> Because users may want to instantiate more than one instance, e.g. `val > >> bf1 > >>> = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they > >> are > >>> used in one query. > >>> > >>> The another reason is that this can't make the function call string > >>> serializable, because in the Scala Table API path, users can still pass > >> in > >>> a function instance, e.g. table.select(bf1('a)) > >>> > >>> Best, > >>> Jark > >>> > >>> On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: > >>> > >>>> Regarding to createTemporaryFunction, could we just keep both > choices? I > >>>> agree > >>>> we should encourage users to use the class, and it's the only choice > >> when > >>>> user > >>>> using DDL. But for some Table API & SQL programs, I think it's > actually > >>>> more nature > >>>> to use an object instead of use the classes. But this just depends on > >>>> personal > >>>> choices. One use case came to mind which might be affected by this is > >> if we > >>>> want > >>>> to support some anonymous lambda functions as UDF. It will need such > >> create > >>>> function > >>>> with objects support. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz < > >> [hidden email]> > >>>> wrote: > >>>> > >>>>> Thanks for the comments. > >>>>> > >>>>> @Timo I think your suggestion makes sense. I changed the signature to > >>>>> > >>>>> void createTemporaryFunction(String path, Class<? extends > >>>> UserDefinedFunction> functionClass); > >>>>> This will mean that after we make QueryOperations string serializable > >> and > >>>>> we expose the type inference in functions we will have only a single > >>>> object > >>>>> that cannot be persisted in a catalog and thus have only a > "Temporary" > >>>>> method: > >>>>> > >>>>> void createTemporaryView(String path, DataStream stream); > >>>>> > >>>>> I updated the FLIP page accordingly. I also listed explicitly methods > >>>> that > >>>>> we need to deprecate and added a note that the implementation of the > >>>>> methods concerning UserDefinedFunctions has to be postponed until we > >> have > >>>>> type inference in place. > >>>>> > >>>>> As the voting period for FLIP-57 ended, I will start the VOTE thread > >> for > >>>>> FLIP-64 tomorrow morning, unless there are some objections until > then. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 10/10/2019 16:16, Kurt Young wrote: > >>>>> > >>>>> Thanks for the clarification Timo, that's sounds good to me. Let's > >>>> continue > >>>>> to discuss other things. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < > >>>> [hidden email]> wrote: > >>>>> The purpose of ConnectTableDescriptor was to have a programmatic way > of > >>>>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree > >>>>> that it needs some rework in the future, once we have finalized the > DDL > >>>>> to ensure that both concepts are in sync. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 10.10.19 16:08, Kurt Young wrote: > >>>>> > >>>>> Regarding to ConnectTableDescriptor, if in the end it becomes a > builder > >>>>> of CatalogTable, I would be ok with that. But it doesn't look like a > >>>>> > >>>>> builder > >>>>> > >>>>> now, it's more like another form of TableSource/TableSink. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < > >>>> [hidden email]> wrote: > >>>>> Hi everyone, > >>>>> > >>>>> thanks for the great discussion with a good outcome. > >>>>> > >>>>> I have one last comment about: > >>>>> > >>>>> void createTemporaryFunction(String path, UserDefinedFunction > >> function); > >>>>> We should encourage users to register a class instead of an instance. > >> We > >>>>> should enforce a default constructor in the future. If we need > metadata > >>>>> or type inference, the planner can simply instantiate the class and > >> call > >>>>> the corresponding getters. If people would like to parameterize a > >>>>> function, they can use global job parameters instead - which are > >>>>> available in the open() method. > >>>>> > >>>>> I suggest: > >>>>> > >>>>> void createTemporaryFunction(String path, Class<? extends > >>>>> UserDefinedFunction> functionClass); > >>>>> > >>>>> This is also closer to the SQL DDL that is about to be implemented > in: > >>>> https://issues.apache.org/jira/browse/FLINK-7151 > >>>>> Thanks, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.10.19 17:07, Bowen Li wrote: > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> +1 for proposed changes > >>>>> > >>>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < > >>>>> > >>>>> [hidden email] > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Sorry for a very delayed response. > >>>>> > >>>>> @Kurt Yes, this is the goal to have a function created like new > >>>>> Function(...) also be wrapped into CatalogFunction. This would have > to > >>>>> be though a temporary function as we cannot represent that as a set > of > >>>>> properties. Similar to the createTemporaryView(DataStream stream). > >>>>> > >>>>> As for the ConnectTableDescriptor I agree this is very similar to > >>>>> CatalogTable. I am not sure though if we should get rid of it. In the > >>>>> end I see it as a builder for a CatalogTable, which is a slightly > more > >>>>> internal API, but we might revisit that some time in the future if we > >>>>> find that it makes more sense. > >>>>> > >>>>> @All I updated the FLIP page with some more details from the outcome > >>>>> > >>>>> of > >>>>> > >>>>> the discussions around FLIP-57. Please take a look. I would like to > >>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes > through. > >>>>> > >>>>> Best, > >>>>> > >>>>> Dawid > >>>>> > >>>>> > >>>>> On 19/09/2019 09:24, Kurt Young wrote: > >>>>> > >>>>> IIUC it's good to see that both serializable (tables description from > >>>>> > >>>>> DDL) > >>>>> > >>>>> and unserializable (tables with DataStream underneath) tables are > >>>>> > >>>>> treated > >>>>> > >>>>> unify with CatalogTable. > >>>>> > >>>>> Can I also assume functions that either come from a function class > >>>>> > >>>>> (from > >>>>> > >>>>> DDL) > >>>>> or function objects (newed by user) will also treated unify with > >>>>> CatalogFunction? > >>>>> > >>>>> This will greatly simplify and unify current API level concepts and > >>>>> > >>>>> design. > >>>>> > >>>>> And it seems only one thing left, how do we deal with > >>>>> ConnectTableDescriptor? > >>>>> It's actually very similar with serializable CatalogTable, both carry > >>>>> > >>>>> some > >>>>> > >>>>> text > >>>>> properties which even are the same. Is there any chance we can > >>>>> > >>>>> further > >>>>> > >>>>> unify > >>>>> > >>>>> this to CatalogTable? > >>>>> > >>>>> object > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < > >>>> [hidden email]> wrote: > >>>>> Thanks Dawid for the design doc. > >>>>> > >>>>> In general, I’m +1 to the FLIP. > >>>>> > >>>>> > >>>>> +1 to the single-string and parse way to express object path. > >>>>> > >>>>> +1 to deprecate registerTableSink & registerTableSource. > >>>>> But I would suggest to provide an easy way to register a custom > >>>>> source/sink before we drop them (this is another story). > >>>>> Currently, it’s not easy to implement a custom connector descriptor. > >>>>> > >>>>> Best, > >>>>> Jark > >>>>> > >>>>> > >>>>> > >>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < > >>>> [hidden email]> > >>>>> 写道: > >>>>> > >>>>> Hi JingsongLee, > >>>>> From my understanding they can. Underneath they will be > >>>>> > >>>>> CatalogTables. > >>>>> > >>>>> The > >>>>> > >>>>> difference is the lifetime of the tables. Plus some of the user > >>>>> > >>>>> facing > >>>>> > >>>>> interfaces cannot be persisted e.g. datastream. Therefore we must > >>>>> > >>>>> have > >>>>> > >>>>> a > >>>>> > >>>>> separate methods for that. In the end the temporary tables are held > >>>>> > >>>>> in > >>>>> > >>>>> memory as CatalogTables. > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] > >>>>> > >>>>> .invalid> > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Hi dawid: > >>>>> Can temporary tables achieve the same capabilities as catalog > >>>>> > >>>>> table? > >>>>> > >>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, > >>>>> PartitionStatistics > >>>>> like partition support: we have added some catalog equivalent > >>>>> > >>>>> interfaces > >>>>> > >>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames > >>>>> Maybe it's not a good idea to add these interfaces to > >>>>> TableSource/TableSink. What do you think? > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> From:Kurt Young <[hidden email]> <[hidden email]> > >>>>> Send Time:2019年9月18日(星期三) 17:54 > >>>>> To:dev <[hidden email]> <[hidden email]> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> > >>>>> Sorry to join this party late. Big +1 to this flip, especially for > >>>>> > >>>>> the > >>>>> > >>>>> dropping > >>>>> "registerTableSink & registerTableSource" part. These are indeed > >>>>> > >>>>> legacy > >>>>> > >>>>> and we should try to unify them through CatalogTable after we > >>>>> > >>>>> introduce > >>>>> > >>>>> the concept of Catalog. > >>>>> > >>>>> From my understanding, what we can registered should all be > >>>>> > >>>>> metadata, > >>>>> > >>>>> TableSource/TableSink should only be the one who is responsible to > >>>>> > >>>>> do > >>>>> > >>>>> the real work, i.e. reading and writing data according to the > >>>>> > >>>>> schema > >>>>> > >>>>> and > >>>>> > >>>>> other information like computed column, partition, .e.g. > >>>>> > >>>>> Best, > >>>>> Kurt > >>>>> > >>>>> > >>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < > >>>>> > >>>>> [hidden email] > >>>>> > >>>>> .invalid> > >>>>> wrote: > >>>>> > >>>>> > >>>>> After some development and thinking, I have a general > >>>>> > >>>>> understanding. > >>>>> > >>>>> +1 to registering a source/sink does not fit into the SQL world. > >>>>> I am OK to have a deprecated registerTemporarySource/Sink to > >>>>> > >>>>> compatible > >>>>> > >>>>> with old ways. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Timo Walther <[hidden email]> <[hidden email]> > >>>>> Send Time:2019年9月17日(星期二) 08:00 > >>>>> To:dev <[hidden email]> <[hidden email]> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi Dawid, > >>>>> > >>>>> thanks for the design document. It fixes big concept gaps due to > >>>>> historical reasons with proper support for serializability and > >>>>> > >>>>> catalog > >>>>> > >>>>> support in mind. > >>>>> > >>>>> I would not mind a registerTemporarySource/Sink, but the problem > >>>>> > >>>>> that I > >>>>> > >>>>> see is that many people think that this is the recommended way of > >>>>> registering a table source/sink which is not true. We should > >>>>> > >>>>> guide > >>>>> > >>>>> users > >>>>> > >>>>> to either use connect() or DDL API which can be validated and > >>>>> > >>>>> stored > >>>>> > >>>>> in > >>>>> > >>>>> catalog. > >>>>> > >>>>> Also from a concept perspective, registering a source/sink does > >>>>> > >>>>> not > >>>>> > >>>>> fit > >>>>> > >>>>> into the SQL world. SQL does not know about source/sinks but only > >>>>> > >>>>> about > >>>>> > >>>>> tables. If the responsibility of a TableSource/TableSink is just > >>>>> > >>>>> a > >>>>> > >>>>> pure > >>>>> > >>>>> physical data consumer/producer that is not connected to the > >>>>> > >>>>> actual > >>>>> > >>>>> logical table schema, we would need a possibility of defining > >>>>> > >>>>> time > >>>>> > >>>>> attributes and interpreting/converting a changelog. This should > >>>>> > >>>>> be > >>>>> > >>>>> done > >>>>> > >>>>> by the framework with information from the DDL/connect() and not > >>>>> > >>>>> be > >>>>> > >>>>> defined in every table source. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 09.09.19 14:16, JingsongLee wrote: > >>>>> > >>>>> Hi dawid: > >>>>> > >>>>> It is difficult to describe specific examples. > >>>>> Sometimes users will generate some java converters through some > >>>>> Java code, or generate some Java classes through third-party > >>>>> libraries. Of course, these can be best done through > >>>>> > >>>>> properties. > >>>>> > >>>>> But this requires additional work from users.My suggestion is to > >>>>> keep this Java instance class way that is user-friendly. > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Dawid Wysakowicz <[hidden email]> < > [hidden email] > >>>>> Send Time:2019年9月6日(星期五) 16:21 > >>>>> To:dev <[hidden email]> <[hidden email]> > >>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> @Jingsong Could you elaborate a bit more what do you mean by > >>>>> "some Connectors are difficult to convert all states to > >>>>> > >>>>> properties" > >>>>> > >>>>> All the Flink provided connectors will definitely be expressible > >>>>> > >>>>> with > >>>>> > >>>>> properties (In the end you should be able to use them from DDL). > >>>>> > >>>>> I > >>>>> > >>>>> think > >>>>> > >>>>> if > >>>>> > >>>>> a TableSource is complex enough that it handles filter push down, > >>>>> > >>>>> partition > >>>>> > >>>>> support etc. should rather be made available both from DDL & > >>>>> > >>>>> java/scala > >>>>> > >>>>> code. I'm happy to reconsider adding > >>>>> > >>>>> registerTemporaryTable(String > >>>>> > >>>>> path, > >>>>> > >>>>> TableSource source) if you have some concrete examples in mind. > >>>>> > >>>>> @Xuefu: We also considered the ObjectIdentifier (or actually > >>>>> > >>>>> introducing > >>>>> > >>>>> a new identifier representation to differentiate between resolved > >>>>> > >>>>> and > >>>>> > >>>>> unresolved identifiers) with the same concerns. We decided to > >>>>> > >>>>> suggest > >>>>> > >>>>> the > >>>>> > >>>>> string & parsing logic because of usability. > >>>>> > >>>>> tEnv.from("cat.db.table") > >>>>> is shorter and easier to write than > >>>>> tEnv.from(Identifier.for("cat", "db", "name") > >>>>> And also implicitly solves the problem what happens if a user > >>>>> > >>>>> (e.g. > >>>>> > >>>>> used > >>>>> > >>>>> to other systems) uses that API in a following manner: > >>>>> > >>>>> tEnv.from(Identifier.for("db.name") > >>>>> I'm happy to revisit it if the general consensus is that it's > >>>>> > >>>>> better > >>>>> > >>>>> to > >>>>> > >>>>> use the OO aproach. > >>>>> > >>>>> Best, > >>>>> Dawid > >>>>> > >>>>> On 06/09/2019 10:00, Xuefu Z wrote: > >>>>> > >>>>> Thanks to Dawid for starting the discussion and writeup. It > >>>>> > >>>>> looks > >>>>> > >>>>> pretty > >>>>> > >>>>> good to me except that I'm a little concerned about the object > >>>>> > >>>>> reference > >>>>> > >>>>> and string parsing in the code, which seems to an anti-pattern > >>>>> > >>>>> to > >>>>> > >>>>> OOP. > >>>>> > >>>>> Have > >>>>> > >>>>> we considered using ObjectIdenitifier with optional catalog and > >>>>> > >>>>> db > >>>>> > >>>>> parts, > >>>>> > >>>>> esp. if we are worried about arguments of variable length or > >>>>> > >>>>> method > >>>>> > >>>>> overloading? It's quite likely that the result of string parsing > >>>>> > >>>>> is > >>>>> > >>>>> an > >>>>> > >>>>> ObjectIdentifier instance any way. > >>>>> > >>>>> Having string parsing logic in the code is a little dangerous as > >>>>> > >>>>> it > >>>>> > >>>>> duplicates part of the DDL/DML parsing, and they can easily get > >>>>> > >>>>> out > >>>>> > >>>>> of > >>>>> > >>>>> sync. > >>>>> > >>>>> Thanks, > >>>>> Xuefu > >>>>> > >>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < > >>>>> > >>>>> [hidden email] > >>>>> > >>>>> .invalid> > >>>>> > >>>>> wrote: > >>>>> > >>>>> > >>>>> Thanks dawid, +1 for this approach. > >>>>> > >>>>> One concern is the removal of registerTableSink & > >>>>> > >>>>> registerTableSource > >>>>> > >>>>> in TableEnvironment. It has two alternatives: > >>>>> 1.the properties approach (DDL, descriptor). > >>>>> 2.from/toDataStream. > >>>>> > >>>>> #1 can only be properties, not java states, and some Connectors > >>>>> are difficult to convert all states to properties. > >>>>> #2 can contain java state. But can't use TableSource-related > >>>>> > >>>>> features, > >>>>> > >>>>> like project & filter push down, partition support, etc.. > >>>>> > >>>>> Any idea about this? > >>>>> > >>>>> Best, > >>>>> Jingsong Lee > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> ------------------------------------------------------------------ > >>>>> > >>>>> From:Dawid Wysakowicz <[hidden email]> < > [hidden email] > >>>>> Send Time:2019年9月4日(星期三) 22:20 > >>>>> To:dev <[hidden email]> <[hidden email]> > >>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in > >>>>> > >>>>> Table > >>>>> > >>>>> module > >>>>> > >>>>> Hi all, > >>>>> As part of FLIP-30 a Catalog API was introduced that enables > >>>>> > >>>>> storing > >>>>> > >>>>> table > >>>>> > >>>>> meta objects permanently. At the same time the majority of > >>>>> > >>>>> current > >>>>> > >>>>> APIs > >>>>> > >>>>> create temporary objects that cannot be serialized. We should > >>>>> > >>>>> clarify > >>>>> > >>>>> the > >>>>> > >>>>> creation of meta objects (tables, views, functions) in a unified > >>>>> > >>>>> way. > >>>>> > >>>>> Another current problem in the API is that all the temporary > >>>>> > >>>>> objects > >>>>> > >>>>> are > >>>>> > >>>>> stored in a special built-in catalog, which is not very > >>>>> > >>>>> intuitive > >>>>> > >>>>> for > >>>>> > >>>>> many > >>>>> > >>>>> users, as they must be aware of that catalog to reference > >>>>> > >>>>> temporary > >>>>> > >>>>> objects. > >>>>> > >>>>> Lastly, different APIs have different ways of providing object > >>>>> > >>>>> paths: > >>>>> > >>>>> String path…, > >>>>> String path, String pathContinued… > >>>>> String name > >>>>> We should choose one approach and unify it across all APIs. > >>>>> I suggest a FLIP to address the above issues. > >>>>> Looking forward to your opinions. > >>>>> FLIP link: > >>>>> > >>>>> > >>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module > >> > > |
Thank you all for the discussion. I added a note that we might revisit
the signature with an instance. I started a separate voting thread for the current state of the FLIP. I will appreciate your votes. Best, Dawid On 11/10/2019 13:21, Jark Wu wrote: > Hi Timo, > > Sounds good to me. > > Thanks, > Jark > > On Fri, 11 Oct 2019 at 17:37, Timo Walther <[hidden email]> wrote: > >> Hi Jark, >> >> great to see that there is consensus on something like `CREATE FUNCTION >> bf1 WITH (precision=0.001)`. >> >> In the near future, there should be no difference between a catalog >> function and a temporary function. This should simplify the stack a lot. >> >> How about we postpone this topic after FLIP-65? I understand your points >> but would also like to have similar semantics for SQL and Table API. We >> can just skip the function part for now and also wait for the FLIP >> around the function DDL (FLINK-7151). I would suggest to mention both >> class and instance in Dawid's FLIP for now. >> >> Regards, >> Timo >> >> On 11.10.19 10:32, Jark Wu wrote: >>> Hi Timo, >>> >>> The reason to put the parameter in constructor instead of `eval` is that >>> 1) the initialization for the parameters may take long >>> 2) there might be a lot of parameters, it's verbose to specify them in >>> every where it's used. >>> >>> Having some syntax like `CREATE FUNCTION bf1 WITH (precision=0.001)` is >>> great. >>> For catalog functions, I totally agree with that. But for temporal >>> function, why do we have to enforce it is a class? >>> >From my understanding, temporal objects are stored in memory and will >> not >>> be serialized to catalogs. >>> >>> If we want to prohibit function instances, are we also planning to drop >> the >>> support of `table.select(bf1('a))` in Scala Table API? >>> >>> Regards, >>> Jark >>> >>> >>> On Fri, 11 Oct 2019 at 15:34, Timo Walther <[hidden email]> wrote: >>> >>>> Thanks for your feedback Kurt and Jark. >>>> >>>> I'm wondering why we allow setting `val bf1 = new BloomFilter(0.001)`, >>>> `val bf2 = new BloomFilter(0.005)`. This is not very SQL-like. >>>> >>>> Parameters should be passed when calling the function instead. >>>> >>>> Users can use the query itself to parametize the function: `SELECT >>>> bf(0.0001, a), bf(0.005, a)` >>>> >>>> >>>> If at all, we can think about a `CREATE FUNCTION bf1 WITH >>>> (precision=0.001)` DDL syntax. But with registering instances we might >>>> shoot ourselves in the foot in the future. >>>> >>>> Usually, function exist as classes, it seems unnatural to me that users >>>> need to instantiate the function first before registering it. >>>> >>>> >>>> What do think? >>>> >>>> Thanks, >>>> Timo >>>> >>>> On 11.10.19 05:10, Jark Wu wrote: >>>>> Hi, Timo >>>>> >>>>> Regarding to createTemporaryFunction, I agree with Kurt, we should >>>>> encourage the class way, but should keep the instance way. >>>>> The reason is that this is an important feature to support parameterize >>>>> function as you said, but this can't be addressed by global job >>>> parameters >>>>> and open() methods. >>>>> Because users may want to instantiate more than one instance, e.g. `val >>>> bf1 >>>>> = new BloomFilter(0.001)`, `val bf2 = new BloomFilter(0.005)`, and they >>>> are >>>>> used in one query. >>>>> >>>>> The another reason is that this can't make the function call string >>>>> serializable, because in the Scala Table API path, users can still pass >>>> in >>>>> a function instance, e.g. table.select(bf1('a)) >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> On Thu, 10 Oct 2019 at 23:09, Kurt Young <[hidden email]> wrote: >>>>> >>>>>> Regarding to createTemporaryFunction, could we just keep both >> choices? I >>>>>> agree >>>>>> we should encourage users to use the class, and it's the only choice >>>> when >>>>>> user >>>>>> using DDL. But for some Table API & SQL programs, I think it's >> actually >>>>>> more nature >>>>>> to use an object instead of use the classes. But this just depends on >>>>>> personal >>>>>> choices. One use case came to mind which might be affected by this is >>>> if we >>>>>> want >>>>>> to support some anonymous lambda functions as UDF. It will need such >>>> create >>>>>> function >>>>>> with objects support. >>>>>> >>>>>> Best, >>>>>> Kurt >>>>>> >>>>>> >>>>>> On Thu, Oct 10, 2019 at 4:51 PM Dawid Wysakowicz < >>>> [hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Thanks for the comments. >>>>>>> >>>>>>> @Timo I think your suggestion makes sense. I changed the signature to >>>>>>> >>>>>>> void createTemporaryFunction(String path, Class<? extends >>>>>> UserDefinedFunction> functionClass); >>>>>>> This will mean that after we make QueryOperations string serializable >>>> and >>>>>>> we expose the type inference in functions we will have only a single >>>>>> object >>>>>>> that cannot be persisted in a catalog and thus have only a >> "Temporary" >>>>>>> method: >>>>>>> >>>>>>> void createTemporaryView(String path, DataStream stream); >>>>>>> >>>>>>> I updated the FLIP page accordingly. I also listed explicitly methods >>>>>> that >>>>>>> we need to deprecate and added a note that the implementation of the >>>>>>> methods concerning UserDefinedFunctions has to be postponed until we >>>> have >>>>>>> type inference in place. >>>>>>> >>>>>>> As the voting period for FLIP-57 ended, I will start the VOTE thread >>>> for >>>>>>> FLIP-64 tomorrow morning, unless there are some objections until >> then. >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> >>>>>>> On 10/10/2019 16:16, Kurt Young wrote: >>>>>>> >>>>>>> Thanks for the clarification Timo, that's sounds good to me. Let's >>>>>> continue >>>>>>> to discuss other things. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Thu, Oct 10, 2019 at 4:14 PM Timo Walther <[hidden email]> < >>>>>> [hidden email]> wrote: >>>>>>> The purpose of ConnectTableDescriptor was to have a programmatic way >> of >>>>>>> expressing `CREATE TABLE` with JavaDocs and IDE support. But I agree >>>>>>> that it needs some rework in the future, once we have finalized the >> DDL >>>>>>> to ensure that both concepts are in sync. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 10.10.19 16:08, Kurt Young wrote: >>>>>>> >>>>>>> Regarding to ConnectTableDescriptor, if in the end it becomes a >> builder >>>>>>> of CatalogTable, I would be ok with that. But it doesn't look like a >>>>>>> >>>>>>> builder >>>>>>> >>>>>>> now, it's more like another form of TableSource/TableSink. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Thu, Oct 10, 2019 at 3:44 PM Timo Walther <[hidden email]> < >>>>>> [hidden email]> wrote: >>>>>>> Hi everyone, >>>>>>> >>>>>>> thanks for the great discussion with a good outcome. >>>>>>> >>>>>>> I have one last comment about: >>>>>>> >>>>>>> void createTemporaryFunction(String path, UserDefinedFunction >>>> function); >>>>>>> We should encourage users to register a class instead of an instance. >>>> We >>>>>>> should enforce a default constructor in the future. If we need >> metadata >>>>>>> or type inference, the planner can simply instantiate the class and >>>> call >>>>>>> the corresponding getters. If people would like to parameterize a >>>>>>> function, they can use global job parameters instead - which are >>>>>>> available in the open() method. >>>>>>> >>>>>>> I suggest: >>>>>>> >>>>>>> void createTemporaryFunction(String path, Class<? extends >>>>>>> UserDefinedFunction> functionClass); >>>>>>> >>>>>>> This is also closer to the SQL DDL that is about to be implemented >> in: >>>>>> https://issues.apache.org/jira/browse/FLINK-7151 >>>>>>> Thanks, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 09.10.19 17:07, Bowen Li wrote: >>>>>>> >>>>>>> Hi Dawid, >>>>>>> >>>>>>> +1 for proposed changes >>>>>>> >>>>>>> On Wed, Oct 9, 2019 at 12:15 PM Dawid Wysakowicz < >>>>>>> >>>>>>> [hidden email] >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Sorry for a very delayed response. >>>>>>> >>>>>>> @Kurt Yes, this is the goal to have a function created like new >>>>>>> Function(...) also be wrapped into CatalogFunction. This would have >> to >>>>>>> be though a temporary function as we cannot represent that as a set >> of >>>>>>> properties. Similar to the createTemporaryView(DataStream stream). >>>>>>> >>>>>>> As for the ConnectTableDescriptor I agree this is very similar to >>>>>>> CatalogTable. I am not sure though if we should get rid of it. In the >>>>>>> end I see it as a builder for a CatalogTable, which is a slightly >> more >>>>>>> internal API, but we might revisit that some time in the future if we >>>>>>> find that it makes more sense. >>>>>>> >>>>>>> @All I updated the FLIP page with some more details from the outcome >>>>>>> >>>>>>> of >>>>>>> >>>>>>> the discussions around FLIP-57. Please take a look. I would like to >>>>>>> start a vote on this FLIP as soon as the vote on FLIP-57 goes >> through. >>>>>>> Best, >>>>>>> >>>>>>> Dawid >>>>>>> >>>>>>> >>>>>>> On 19/09/2019 09:24, Kurt Young wrote: >>>>>>> >>>>>>> IIUC it's good to see that both serializable (tables description from >>>>>>> >>>>>>> DDL) >>>>>>> >>>>>>> and unserializable (tables with DataStream underneath) tables are >>>>>>> >>>>>>> treated >>>>>>> >>>>>>> unify with CatalogTable. >>>>>>> >>>>>>> Can I also assume functions that either come from a function class >>>>>>> >>>>>>> (from >>>>>>> >>>>>>> DDL) >>>>>>> or function objects (newed by user) will also treated unify with >>>>>>> CatalogFunction? >>>>>>> >>>>>>> This will greatly simplify and unify current API level concepts and >>>>>>> >>>>>>> design. >>>>>>> >>>>>>> And it seems only one thing left, how do we deal with >>>>>>> ConnectTableDescriptor? >>>>>>> It's actually very similar with serializable CatalogTable, both carry >>>>>>> >>>>>>> some >>>>>>> >>>>>>> text >>>>>>> properties which even are the same. Is there any chance we can >>>>>>> >>>>>>> further >>>>>>> >>>>>>> unify >>>>>>> >>>>>>> this to CatalogTable? >>>>>>> >>>>>>> object >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Thu, Sep 19, 2019 at 3:13 PM Jark Wu <[hidden email]> < >>>>>> [hidden email]> wrote: >>>>>>> Thanks Dawid for the design doc. >>>>>>> >>>>>>> In general, I’m +1 to the FLIP. >>>>>>> >>>>>>> >>>>>>> +1 to the single-string and parse way to express object path. >>>>>>> >>>>>>> +1 to deprecate registerTableSink & registerTableSource. >>>>>>> But I would suggest to provide an easy way to register a custom >>>>>>> source/sink before we drop them (this is another story). >>>>>>> Currently, it’s not easy to implement a custom connector descriptor. >>>>>>> >>>>>>> Best, >>>>>>> Jark >>>>>>> >>>>>>> >>>>>>> >>>>>>> 在 2019年9月19日,11:37,Dawid Wysakowicz <[hidden email]> < >>>>>> [hidden email]> >>>>>>> 写道: >>>>>>> >>>>>>> Hi JingsongLee, >>>>>>> From my understanding they can. Underneath they will be >>>>>>> >>>>>>> CatalogTables. >>>>>>> >>>>>>> The >>>>>>> >>>>>>> difference is the lifetime of the tables. Plus some of the user >>>>>>> >>>>>>> facing >>>>>>> >>>>>>> interfaces cannot be persisted e.g. datastream. Therefore we must >>>>>>> >>>>>>> have >>>>>>> >>>>>>> a >>>>>>> >>>>>>> separate methods for that. In the end the temporary tables are held >>>>>>> >>>>>>> in >>>>>>> >>>>>>> memory as CatalogTables. >>>>>>> Best, >>>>>>> Dawid >>>>>>> >>>>>>> On Thu, 19 Sep 2019, 10:08 JingsongLee, <[hidden email] >>>>>>> >>>>>>> .invalid> >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Hi dawid: >>>>>>> Can temporary tables achieve the same capabilities as catalog >>>>>>> >>>>>>> table? >>>>>>> >>>>>>> like statistics: CatalogTableStatistics, CatalogColumnStatistics, >>>>>>> PartitionStatistics >>>>>>> like partition support: we have added some catalog equivalent >>>>>>> >>>>>>> interfaces >>>>>>> >>>>>>> on TableSource/TableSink: getPartitions, getPartitionFieldNames >>>>>>> Maybe it's not a good idea to add these interfaces to >>>>>>> TableSource/TableSink. What do you think? >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>> From:Kurt Young <[hidden email]> <[hidden email]> >>>>>>> Send Time:2019年9月18日(星期三) 17:54 >>>>>>> To:dev <[hidden email]> <[hidden email]> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>>>> >>>>>>> Table >>>>>>> >>>>>>> module >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> Sorry to join this party late. Big +1 to this flip, especially for >>>>>>> >>>>>>> the >>>>>>> >>>>>>> dropping >>>>>>> "registerTableSink & registerTableSource" part. These are indeed >>>>>>> >>>>>>> legacy >>>>>>> >>>>>>> and we should try to unify them through CatalogTable after we >>>>>>> >>>>>>> introduce >>>>>>> >>>>>>> the concept of Catalog. >>>>>>> >>>>>>> From my understanding, what we can registered should all be >>>>>>> >>>>>>> metadata, >>>>>>> >>>>>>> TableSource/TableSink should only be the one who is responsible to >>>>>>> >>>>>>> do >>>>>>> >>>>>>> the real work, i.e. reading and writing data according to the >>>>>>> >>>>>>> schema >>>>>>> >>>>>>> and >>>>>>> >>>>>>> other information like computed column, partition, .e.g. >>>>>>> >>>>>>> Best, >>>>>>> Kurt >>>>>>> >>>>>>> >>>>>>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee < >>>>>>> >>>>>>> [hidden email] >>>>>>> >>>>>>> .invalid> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> After some development and thinking, I have a general >>>>>>> >>>>>>> understanding. >>>>>>> >>>>>>> +1 to registering a source/sink does not fit into the SQL world. >>>>>>> I am OK to have a deprecated registerTemporarySource/Sink to >>>>>>> >>>>>>> compatible >>>>>>> >>>>>>> with old ways. >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>> >>>>>>> From:Timo Walther <[hidden email]> <[hidden email]> >>>>>>> Send Time:2019年9月17日(星期二) 08:00 >>>>>>> To:dev <[hidden email]> <[hidden email]> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>>>> >>>>>>> Table >>>>>>> >>>>>>> module >>>>>>> >>>>>>> Hi Dawid, >>>>>>> >>>>>>> thanks for the design document. It fixes big concept gaps due to >>>>>>> historical reasons with proper support for serializability and >>>>>>> >>>>>>> catalog >>>>>>> >>>>>>> support in mind. >>>>>>> >>>>>>> I would not mind a registerTemporarySource/Sink, but the problem >>>>>>> >>>>>>> that I >>>>>>> >>>>>>> see is that many people think that this is the recommended way of >>>>>>> registering a table source/sink which is not true. We should >>>>>>> >>>>>>> guide >>>>>>> >>>>>>> users >>>>>>> >>>>>>> to either use connect() or DDL API which can be validated and >>>>>>> >>>>>>> stored >>>>>>> >>>>>>> in >>>>>>> >>>>>>> catalog. >>>>>>> >>>>>>> Also from a concept perspective, registering a source/sink does >>>>>>> >>>>>>> not >>>>>>> >>>>>>> fit >>>>>>> >>>>>>> into the SQL world. SQL does not know about source/sinks but only >>>>>>> >>>>>>> about >>>>>>> >>>>>>> tables. If the responsibility of a TableSource/TableSink is just >>>>>>> >>>>>>> a >>>>>>> >>>>>>> pure >>>>>>> >>>>>>> physical data consumer/producer that is not connected to the >>>>>>> >>>>>>> actual >>>>>>> >>>>>>> logical table schema, we would need a possibility of defining >>>>>>> >>>>>>> time >>>>>>> >>>>>>> attributes and interpreting/converting a changelog. This should >>>>>>> >>>>>>> be >>>>>>> >>>>>>> done >>>>>>> >>>>>>> by the framework with information from the DDL/connect() and not >>>>>>> >>>>>>> be >>>>>>> >>>>>>> defined in every table source. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 09.09.19 14:16, JingsongLee wrote: >>>>>>> >>>>>>> Hi dawid: >>>>>>> >>>>>>> It is difficult to describe specific examples. >>>>>>> Sometimes users will generate some java converters through some >>>>>>> Java code, or generate some Java classes through third-party >>>>>>> libraries. Of course, these can be best done through >>>>>>> >>>>>>> properties. >>>>>>> >>>>>>> But this requires additional work from users.My suggestion is to >>>>>>> keep this Java instance class way that is user-friendly. >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>> >>>>>>> From:Dawid Wysakowicz <[hidden email]> < >> [hidden email] >>>>>>> Send Time:2019年9月6日(星期五) 16:21 >>>>>>> To:dev <[hidden email]> <[hidden email]> >>>>>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in >>>>>>> >>>>>>> Table >>>>>>> >>>>>>> module >>>>>>> >>>>>>> Hi all, >>>>>>> @Jingsong Could you elaborate a bit more what do you mean by >>>>>>> "some Connectors are difficult to convert all states to >>>>>>> >>>>>>> properties" >>>>>>> >>>>>>> All the Flink provided connectors will definitely be expressible >>>>>>> >>>>>>> with >>>>>>> >>>>>>> properties (In the end you should be able to use them from DDL). >>>>>>> >>>>>>> I >>>>>>> >>>>>>> think >>>>>>> >>>>>>> if >>>>>>> >>>>>>> a TableSource is complex enough that it handles filter push down, >>>>>>> >>>>>>> partition >>>>>>> >>>>>>> support etc. should rather be made available both from DDL & >>>>>>> >>>>>>> java/scala >>>>>>> >>>>>>> code. I'm happy to reconsider adding >>>>>>> >>>>>>> registerTemporaryTable(String >>>>>>> >>>>>>> path, >>>>>>> >>>>>>> TableSource source) if you have some concrete examples in mind. >>>>>>> >>>>>>> @Xuefu: We also considered the ObjectIdentifier (or actually >>>>>>> >>>>>>> introducing >>>>>>> >>>>>>> a new identifier representation to differentiate between resolved >>>>>>> >>>>>>> and >>>>>>> >>>>>>> unresolved identifiers) with the same concerns. We decided to >>>>>>> >>>>>>> suggest >>>>>>> >>>>>>> the >>>>>>> >>>>>>> string & parsing logic because of usability. >>>>>>> >>>>>>> tEnv.from("cat.db.table") >>>>>>> is shorter and easier to write than >>>>>>> tEnv.from(Identifier.for("cat", "db", "name") >>>>>>> And also implicitly solves the problem what happens if a user >>>>>>> >>>>>>> (e.g. >>>>>>> >>>>>>> used >>>>>>> >>>>>>> to other systems) uses that API in a following manner: >>>>>>> >>>>>>> tEnv.from(Identifier.for("db.name") >>>>>>> I'm happy to revisit it if the general consensus is that it's >>>>>>> >>>>>>> better >>>>>>> >>>>>>> to >>>>>>> >>>>>>> use the OO aproach. >>>>>>> >>>>>>> Best, >>>>>>> Dawid >>>>>>> >>>>>>> On 06/09/2019 10:00, Xuefu Z wrote: >>>>>>> >>>>>>> Thanks to Dawid for starting the discussion and writeup. It >>>>>>> >>>>>>> looks >>>>>>> >>>>>>> pretty >>>>>>> >>>>>>> good to me except that I'm a little concerned about the object >>>>>>> >>>>>>> reference >>>>>>> >>>>>>> and string parsing in the code, which seems to an anti-pattern >>>>>>> >>>>>>> to >>>>>>> >>>>>>> OOP. >>>>>>> >>>>>>> Have >>>>>>> >>>>>>> we considered using ObjectIdenitifier with optional catalog and >>>>>>> >>>>>>> db >>>>>>> >>>>>>> parts, >>>>>>> >>>>>>> esp. if we are worried about arguments of variable length or >>>>>>> >>>>>>> method >>>>>>> >>>>>>> overloading? It's quite likely that the result of string parsing >>>>>>> >>>>>>> is >>>>>>> >>>>>>> an >>>>>>> >>>>>>> ObjectIdentifier instance any way. >>>>>>> >>>>>>> Having string parsing logic in the code is a little dangerous as >>>>>>> >>>>>>> it >>>>>>> >>>>>>> duplicates part of the DDL/DML parsing, and they can easily get >>>>>>> >>>>>>> out >>>>>>> >>>>>>> of >>>>>>> >>>>>>> sync. >>>>>>> >>>>>>> Thanks, >>>>>>> Xuefu >>>>>>> >>>>>>> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee < >>>>>>> >>>>>>> [hidden email] >>>>>>> >>>>>>> .invalid> >>>>>>> >>>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> Thanks dawid, +1 for this approach. >>>>>>> >>>>>>> One concern is the removal of registerTableSink & >>>>>>> >>>>>>> registerTableSource >>>>>>> >>>>>>> in TableEnvironment. It has two alternatives: >>>>>>> 1.the properties approach (DDL, descriptor). >>>>>>> 2.from/toDataStream. >>>>>>> >>>>>>> #1 can only be properties, not java states, and some Connectors >>>>>>> are difficult to convert all states to properties. >>>>>>> #2 can contain java state. But can't use TableSource-related >>>>>>> >>>>>>> features, >>>>>>> >>>>>>> like project & filter push down, partition support, etc.. >>>>>>> >>>>>>> Any idea about this? >>>>>>> >>>>>>> Best, >>>>>>> Jingsong Lee >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------ >>>>>>> >>>>>>> From:Dawid Wysakowicz <[hidden email]> < >> [hidden email] >>>>>>> Send Time:2019年9月4日(星期三) 22:20 >>>>>>> To:dev <[hidden email]> <[hidden email]> >>>>>>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in >>>>>>> >>>>>>> Table >>>>>>> >>>>>>> module >>>>>>> >>>>>>> Hi all, >>>>>>> As part of FLIP-30 a Catalog API was introduced that enables >>>>>>> >>>>>>> storing >>>>>>> >>>>>>> table >>>>>>> >>>>>>> meta objects permanently. At the same time the majority of >>>>>>> >>>>>>> current >>>>>>> >>>>>>> APIs >>>>>>> >>>>>>> create temporary objects that cannot be serialized. We should >>>>>>> >>>>>>> clarify >>>>>>> >>>>>>> the >>>>>>> >>>>>>> creation of meta objects (tables, views, functions) in a unified >>>>>>> >>>>>>> way. >>>>>>> >>>>>>> Another current problem in the API is that all the temporary >>>>>>> >>>>>>> objects >>>>>>> >>>>>>> are >>>>>>> >>>>>>> stored in a special built-in catalog, which is not very >>>>>>> >>>>>>> intuitive >>>>>>> >>>>>>> for >>>>>>> >>>>>>> many >>>>>>> >>>>>>> users, as they must be aware of that catalog to reference >>>>>>> >>>>>>> temporary >>>>>>> >>>>>>> objects. >>>>>>> >>>>>>> Lastly, different APIs have different ways of providing object >>>>>>> >>>>>>> paths: >>>>>>> >>>>>>> String path…, >>>>>>> String path, String pathContinued… >>>>>>> String name >>>>>>> We should choose one approach and unify it across all APIs. >>>>>>> I suggest a FLIP to address the above issues. >>>>>>> Looking forward to your opinions. >>>>>>> FLIP link: >>>>>>> >>>>>>> >>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module >> signature.asc (849 bytes) Download Attachment |
Free forum by Nabble | Edit this page |