Hi all,
I'd like to start a voting thread for FLIP-57 [1], which we've reached consensus in [2]. This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. Thanks, Bowen [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog [2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 |
+1
Best, Kurt On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote: > Hi all, > > I'd like to start a voting thread for FLIP-57 [1], which we've reached > consensus in [2]. > > This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. > > Thanks, > Bowen > > [1] > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > [2] > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > |
+1
Best, Terry Wang > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > +1 > > Best, > Kurt > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote: > >> Hi all, >> >> I'd like to start a voting thread for FLIP-57 [1], which we've reached >> consensus in [2]. >> >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. >> >> Thanks, >> Bowen >> >> [1] >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >> [2] >> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >> |
+1. LGTM
On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote: > +1 > > Best, > Terry Wang > > > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > > +1 > > > > Best, > > Kurt > > > > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote: > > > >> Hi all, > >> > >> I'd like to start a voting thread for FLIP-57 [1], which we've reached > >> consensus in [2]. > >> > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. > >> > >> Thanks, > >> Bowen > >> > >> [1] > >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >> [2] > >> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >> > > -- Xuefu Zhang "In Honey We Trust!" |
Hi,
I really like the flip and think it clarifies important aspects of the system. I have two, I hope small suggestions, which will not take much time to agree on. 1. Could we follow the MySQL approach in regards to the existence of cat/db for temporary functions? That means not to check it, so e.g. it's possible to create a temporary function in a database that does not exist. I think it's really useful e.g in cases when user wants to perform experiments but does not have access to the db yet or temporarily does not have connection to a catalog. 2. Could we not change the ObjectIdentifier? Could we not loosen the requirements for all catalog objects such as tables, views, types just for the functions? It's really important later on from e.g the serializability perspective. The important aspect of the ObjectIdentifier is that we know that it has been resolved. The suggested changes break that assumption. What do you think about adding an interface FunctionIdentifier { String getName(); /** Return 3-part identifier. Empty in case of a built-in function. */ Optional<ObjectIdentifier> getObjectIdentifier() } class ObjectIdentifier implements FunctionIdentifier { Optional<ObjectIdentifier> getObjectIdentifier() { return Optional.of(this); } } class SystemFunctionIdentifier implements FunctionIdentifier {...} WDYT? On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > +1. LGTM > > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote: > > > +1 > > > > Best, > > Terry Wang > > > > > > > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > > > > +1 > > > > > > Best, > > > Kurt > > > > > > > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote: > > > > > >> Hi all, > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've reached > > >> consensus in [2]. > > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. > > >> > > >> Thanks, > > >> Bowen > > >> > > >> [1] > > >> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > >> [2] > > >> > > >> > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > >> > > > > > > -- > Xuefu Zhang > > "In Honey We Trust!" > |
Hi Dawid,
Re 1): I agree making it easy for users to run experiments is important. However, I'm not sure allowing users to register temp functions in nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks the current contract between Flink and users that catalog/db must be valid in order to operate on. How about we instead focus on making it convenient to create catalogs? Users actually can already do it with ease via program or SQL CLI yaml file for an in-memory catalog which has neither extra dependency nor external connections. What we can further improve is DDL for catalogs, and I raised it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry now. In that case, if users would like to experiment via SQL, they can easily create an in memory catalog/database using DDL, then play with temp functions. Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been resolved when stack call reaches FunctionCatalog#lookupFunction(), but only been parsed? I agree keeping ObjectIdentifier as-is would be good. I'm ok with the suggested classes, though making ObjectIdentifier a subclass of FunctionIdentifier seem a bit counter intuitive. Another potentially simpler way is: ``` // in class FunctionLookup class Result { Optional<ObjectIdentifier> getObjectIdentifier() { ... } String getName() { ... } ... } ``` WDYT? On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi, > I really like the flip and think it clarifies important aspects of the > system. > > I have two, I hope small suggestions, which will not take much time to > agree on. > > 1. Could we follow the MySQL approach in regards to the existence of cat/db > for temporary functions? That means not to check it, so e.g. it's possible > to create a temporary function in a database that does not exist. I think > it's really useful e.g in cases when user wants to perform experiments but > does not have access to the db yet or temporarily does not have connection > to a catalog. > 2. Could we not change the ObjectIdentifier? Could we not loosen the > requirements for all catalog objects such as tables, views, types just for > the functions? It's really important later on from e.g the serializability > perspective. The important aspect of the ObjectIdentifier is that we know > that it has been resolved. The suggested changes break that assumption. > > What do you think about adding an interface FunctionIdentifier { > > String getName(); > > /** > Return 3-part identifier. Empty in case of a built-in function. > */ > Optional<ObjectIdentifier> getObjectIdentifier() > } > > class ObjectIdentifier implements FunctionIdentifier { > Optional<ObjectIdentifier> getObjectIdentifier() { > return Optional.of(this); > } > } > > class SystemFunctionIdentifier implements FunctionIdentifier {...} > > WDYT? > > On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > > > +1. LGTM > > > > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote: > > > > > +1 > > > > > > Best, > > > Terry Wang > > > > > > > > > > > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > > > > > > +1 > > > > > > > > Best, > > > > Kurt > > > > > > > > > > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> > wrote: > > > > > > > >> Hi all, > > > >> > > > >> I'd like to start a voting thread for FLIP-57 [1], which we've > reached > > > >> consensus in [2]. > > > >> > > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26. > > > >> > > > >> Thanks, > > > >> Bowen > > > >> > > > >> [1] > > > >> > > > >> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > >> [2] > > > >> > > > >> > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > >> > > > > > > > > > > -- > > Xuefu Zhang > > > > "In Honey We Trust!" > > > |
Sorry, I missed some parts of the solution. The complete alternative is the
following, basically having separate APIs in FunctionLookup for ambiguous and precise function lookup since planner is able to tell which API to call with parsed queries, and have a unified result: ``` class FunctionLookup { Optional<Result> lookupAmbiguousFunction(String name); Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); class Result { Optional<ObjectIdentifier> getObjectIdentifier() { ... } String getName() { ... } // ... } } ``` Thanks, Bowen On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote: > Hi Dawid, > > Re 1): I agree making it easy for users to run experiments is important. > However, I'm not sure allowing users to register temp functions in > nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks > the current contract between Flink and users that catalog/db must be valid > in order to operate on. > > How about we instead focus on making it convenient to create catalogs? > Users actually can already do it with ease via program or SQL CLI yaml file > for an in-memory catalog which has neither extra dependency nor external > connections. What we can further improve is DDL for catalogs, and I raised > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry > now. > > In that case, if users would like to experiment via SQL, they can easily > create an in memory catalog/database using DDL, then play with temp > functions. > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been > resolved when stack call reaches FunctionCatalog#lookupFunction(), but only > been parsed? > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the > suggested classes, though making ObjectIdentifier a subclass of > FunctionIdentifier seem a bit counter intuitive. > > Another potentially simpler way is: > > ``` > // in class FunctionLookup > class Result { > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > String getName() { ... } > ... > } > ``` > > WDYT? > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > [hidden email]> wrote: > >> Hi, >> I really like the flip and think it clarifies important aspects of the >> system. >> >> I have two, I hope small suggestions, which will not take much time to >> agree on. >> >> 1. Could we follow the MySQL approach in regards to the existence of >> cat/db >> for temporary functions? That means not to check it, so e.g. it's possible >> to create a temporary function in a database that does not exist. I think >> it's really useful e.g in cases when user wants to perform experiments but >> does not have access to the db yet or temporarily does not have connection >> to a catalog. >> 2. Could we not change the ObjectIdentifier? Could we not loosen the >> requirements for all catalog objects such as tables, views, types just for >> the functions? It's really important later on from e.g the serializability >> perspective. The important aspect of the ObjectIdentifier is that we know >> that it has been resolved. The suggested changes break that assumption. >> >> What do you think about adding an interface FunctionIdentifier { >> >> String getName(); >> >> /** >> Return 3-part identifier. Empty in case of a built-in function. >> */ >> Optional<ObjectIdentifier> getObjectIdentifier() >> } >> >> class ObjectIdentifier implements FunctionIdentifier { >> Optional<ObjectIdentifier> getObjectIdentifier() { >> return Optional.of(this); >> } >> } >> >> class SystemFunctionIdentifier implements FunctionIdentifier {...} >> >> WDYT? >> >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: >> >> > +1. LGTM >> > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote: >> > >> > > +1 >> > > >> > > Best, >> > > Terry Wang >> > > >> > > >> > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >> > > > >> > > > +1 >> > > > >> > > > Best, >> > > > Kurt >> > > > >> > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> >> wrote: >> > > > >> > > >> Hi all, >> > > >> >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've >> reached >> > > >> consensus in [2]. >> > > >> >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep >> 26. >> > > >> >> > > >> Thanks, >> > > >> Bowen >> > > >> >> > > >> [1] >> > > >> >> > > >> >> > > >> > >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >> > > >> [2] >> > > >> >> > > >> >> > > >> > >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >> > > >> >> > > >> > > >> > >> > -- >> > Xuefu Zhang >> > >> > "In Honey We Trust!" >> > >> > |
Ad. 1
I wouldn't say it is hacky. Moreover, how do you want ensure that the dB always exists when a temporary object is used?( in this particular case function). Do you want to query for the database existence whenever e.g a temporary function is used? I think important aspect here is that the database can be dropped from external system, not just flink or a different flink session. E.g in case of hive, you cannot create a temporary table in a database that does not exist, that's true. But if you create a temporary table in a database and drop that database from a different session, you can still query the previously created temporary table from the original session. It does not sound like a consistent behaviour to me. Why don't we make this behaviour of not binding a temporary objects to the lifetime of a database explicit as part of the temporary objects contract? In the end they exist in different layers. Permanent objects & databases in a catalog (in case of hive megastore) whereas temporary objects in flink sessions. That's also true for the original hive client. The temporary objects live in the hive client whereas databases are created in the metastore. Ad.2 I'm open for suggestions here. The one thing I wanted to achieve here is so that we do not change the contract of ObjectIdentifier. One important thing to remember here is that we need the function identifier to be part of the FunctionDefinition object and not only as the result of the function lookup. At some point we want to be able to store QueryOperations in the catalogs. They can contain function calls within which we need to have the identifier. I agree my initial suggestion is over complicated. How about we have just the FunctionIdentifier as top level class without making the ObjectIdentifier extend from it? I think it's pretty much the same what you suggested. The only difference is that it would be a top level class with a more descriptive name. On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > Sorry, I missed some parts of the solution. The complete alternative is the > following, basically having separate APIs in FunctionLookup for ambiguous > and precise function lookup since planner is able to tell which API to call > with parsed queries, and have a unified result: > > ``` > class FunctionLookup { > > Optional<Result> lookupAmbiguousFunction(String name); > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > class Result { > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > String getName() { ... } > // ... > } > > } > ``` > > Thanks, > Bowen > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote: > > > Hi Dawid, > > > > Re 1): I agree making it easy for users to run experiments is important. > > However, I'm not sure allowing users to register temp functions in > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and > breaks > > the current contract between Flink and users that catalog/db must be > valid > > in order to operate on. > > > > How about we instead focus on making it convenient to create catalogs? > > Users actually can already do it with ease via program or SQL CLI yaml > file > > for an in-memory catalog which has neither extra dependency nor external > > connections. What we can further improve is DDL for catalogs, and I > raised > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry > > now. > > > > In that case, if users would like to experiment via SQL, they can easily > > create an in memory catalog/database using DDL, then play with temp > > functions. > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been > > resolved when stack call reaches FunctionCatalog#lookupFunction(), but > only > > been parsed? > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the > > suggested classes, though making ObjectIdentifier a subclass of > > FunctionIdentifier seem a bit counter intuitive. > > > > Another potentially simpler way is: > > > > ``` > > // in class FunctionLookup > > class Result { > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > String getName() { ... } > > ... > > } > > ``` > > > > WDYT? > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > [hidden email]> wrote: > > > >> Hi, > >> I really like the flip and think it clarifies important aspects of the > >> system. > >> > >> I have two, I hope small suggestions, which will not take much time to > >> agree on. > >> > >> 1. Could we follow the MySQL approach in regards to the existence of > >> cat/db > >> for temporary functions? That means not to check it, so e.g. it's > possible > >> to create a temporary function in a database that does not exist. I > think > >> it's really useful e.g in cases when user wants to perform experiments > but > >> does not have access to the db yet or temporarily does not have > connection > >> to a catalog. > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the > >> requirements for all catalog objects such as tables, views, types just > for > >> the functions? It's really important later on from e.g the > serializability > >> perspective. The important aspect of the ObjectIdentifier is that we > know > >> that it has been resolved. The suggested changes break that assumption. > >> > >> What do you think about adding an interface FunctionIdentifier { > >> > >> String getName(); > >> > >> /** > >> Return 3-part identifier. Empty in case of a built-in function. > >> */ > >> Optional<ObjectIdentifier> getObjectIdentifier() > >> } > >> > >> class ObjectIdentifier implements FunctionIdentifier { > >> Optional<ObjectIdentifier> getObjectIdentifier() { > >> return Optional.of(this); > >> } > >> } > >> > >> class SystemFunctionIdentifier implements FunctionIdentifier {...} > >> > >> WDYT? > >> > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > >> > >> > +1. LGTM > >> > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> > wrote: > >> > > >> > > +1 > >> > > > >> > > Best, > >> > > Terry Wang > >> > > > >> > > > >> > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > >> > > > > >> > > > +1 > >> > > > > >> > > > Best, > >> > > > Kurt > >> > > > > >> > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> > >> wrote: > >> > > > > >> > > >> Hi all, > >> > > >> > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've > >> reached > >> > > >> consensus in [2]. > >> > > >> > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep > >> 26. > >> > > >> > >> > > >> Thanks, > >> > > >> Bowen > >> > > >> > >> > > >> [1] > >> > > >> > >> > > >> > >> > > > >> > > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >> > > >> [2] > >> > > >> > >> > > >> > >> > > > >> > > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >> > > >> > >> > > > >> > > > >> > > >> > -- > >> > Xuefu Zhang > >> > > >> > "In Honey We Trust!" > >> > > >> > > > |
Re 1) As described in the FLIP, a temp function lookup will first make sure
the db exists. If the db doesn't exist, a lazy drop is triggered to remove that temp function. I agree Hive doesn't handle it consistently, and we are not copying Hive. IMHO, allowing registering temp functions in nonexistent catalog/db is hacky and problematic. For instance, "SHOW FUNCTIONS" would list system functions and functions in the current catalog/db, since users cannot designate a nonexistent catalog/db as current ones, how can they list functions in nonexistent catalog/db? They may end up never knowing what temp functions they've created unless trying out with queries or we introducing some more nonstandard SQL statements. The same applies to other temp objects like temp tables. Re 2) A standalone FunctionIdentifier sounds good to me On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <[hidden email]> wrote: > Ad. 1 > I wouldn't say it is hacky. > Moreover, how do you want ensure that the dB always exists when a temporary > object is used?( in this particular case function). Do you want to query > for the database existence whenever e.g a temporary function is used? I > think important aspect here is that the database can be dropped from > external system, not just flink or a different flink session. > > E.g in case of hive, you cannot create a temporary table in a database that > does not exist, that's true. But if you create a temporary table in a > database and drop that database from a different session, you can still > query the previously created temporary table from the original session. It > does not sound like a consistent behaviour to me. Why don't we make this > behaviour of not binding a temporary objects to the lifetime of a database > explicit as part of the temporary objects contract? In the end they exist > in different layers. Permanent objects & databases in a catalog (in case of > hive megastore) whereas temporary objects in flink sessions. That's also > true for the original hive client. The temporary objects live in the hive > client whereas databases are created in the metastore. > > Ad.2 > I'm open for suggestions here. The one thing I wanted to achieve here is so > that we do not change the contract of ObjectIdentifier. One important thing > to remember here is that we need the function identifier to be part of the > FunctionDefinition object and not only as the result of the function > lookup. At some point we want to be able to store QueryOperations in the > catalogs. They can contain function calls within which we need to have the > identifier. > > I agree my initial suggestion is over complicated. How about we have just > the FunctionIdentifier as top level class without making the > ObjectIdentifier extend from it? I think it's pretty much the same what you > suggested. The only difference is that it would be a top level class with a > more descriptive name. > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > > > Sorry, I missed some parts of the solution. The complete alternative is > the > > following, basically having separate APIs in FunctionLookup for ambiguous > > and precise function lookup since planner is able to tell which API to > call > > with parsed queries, and have a unified result: > > > > ``` > > class FunctionLookup { > > > > Optional<Result> lookupAmbiguousFunction(String name); > > > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > > > > class Result { > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > String getName() { ... } > > // ... > > } > > > > } > > ``` > > > > Thanks, > > Bowen > > > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote: > > > > > Hi Dawid, > > > > > > Re 1): I agree making it easy for users to run experiments is > important. > > > However, I'm not sure allowing users to register temp functions in > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and > > breaks > > > the current contract between Flink and users that catalog/db must be > > valid > > > in order to operate on. > > > > > > How about we instead focus on making it convenient to create catalogs? > > > Users actually can already do it with ease via program or SQL CLI yaml > > file > > > for an in-memory catalog which has neither extra dependency nor > external > > > connections. What we can further improve is DDL for catalogs, and I > > raised > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by > Terry > > > now. > > > > > > In that case, if users would like to experiment via SQL, they can > easily > > > create an in memory catalog/database using DDL, then play with temp > > > functions. > > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not > been > > > resolved when stack call reaches FunctionCatalog#lookupFunction(), but > > only > > > been parsed? > > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the > > > suggested classes, though making ObjectIdentifier a subclass of > > > FunctionIdentifier seem a bit counter intuitive. > > > > > > Another potentially simpler way is: > > > > > > ``` > > > // in class FunctionLookup > > > class Result { > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > String getName() { ... } > > > ... > > > } > > > ``` > > > > > > WDYT? > > > > > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > [hidden email]> wrote: > > > > > >> Hi, > > >> I really like the flip and think it clarifies important aspects of the > > >> system. > > >> > > >> I have two, I hope small suggestions, which will not take much time to > > >> agree on. > > >> > > >> 1. Could we follow the MySQL approach in regards to the existence of > > >> cat/db > > >> for temporary functions? That means not to check it, so e.g. it's > > possible > > >> to create a temporary function in a database that does not exist. I > > think > > >> it's really useful e.g in cases when user wants to perform experiments > > but > > >> does not have access to the db yet or temporarily does not have > > connection > > >> to a catalog. > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the > > >> requirements for all catalog objects such as tables, views, types just > > for > > >> the functions? It's really important later on from e.g the > > serializability > > >> perspective. The important aspect of the ObjectIdentifier is that we > > know > > >> that it has been resolved. The suggested changes break that > assumption. > > >> > > >> What do you think about adding an interface FunctionIdentifier { > > >> > > >> String getName(); > > >> > > >> /** > > >> Return 3-part identifier. Empty in case of a built-in function. > > >> */ > > >> Optional<ObjectIdentifier> getObjectIdentifier() > > >> } > > >> > > >> class ObjectIdentifier implements FunctionIdentifier { > > >> Optional<ObjectIdentifier> getObjectIdentifier() { > > >> return Optional.of(this); > > >> } > > >> } > > >> > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...} > > >> > > >> WDYT? > > >> > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > > >> > > >> > +1. LGTM > > >> > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> > > wrote: > > >> > > > >> > > +1 > > >> > > > > >> > > Best, > > >> > > Terry Wang > > >> > > > > >> > > > > >> > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > >> > > > > > >> > > > +1 > > >> > > > > > >> > > > Best, > > >> > > > Kurt > > >> > > > > > >> > > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> > > >> wrote: > > >> > > > > > >> > > >> Hi all, > > >> > > >> > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've > > >> reached > > >> > > >> consensus in [2]. > > >> > > >> > > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, > Sep > > >> 26. > > >> > > >> > > >> > > >> Thanks, > > >> > > >> Bowen > > >> > > >> > > >> > > >> [1] > > >> > > >> > > >> > > >> > > >> > > > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > >> > > >> [2] > > >> > > >> > > >> > > >> > > >> > > > > >> > > > >> > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > >> > > >> > > >> > > > > >> > > > > >> > > > >> > -- > > >> > Xuefu Zhang > > >> > > > >> > "In Honey We Trust!" > > >> > > > >> > > > > > > |
I'm not sure how much benefit #1 can bring us. If users just want to try
out temporary functions, they can create temporary system functions which don't require a catalog/DB. IIUC, the main reason why we allow temporary catalog function is to let users override permanent catalog functions. Therefore a temporary function in a non-existing catalog won't serve that purpose. Besides, each session is provided with a default catalog and DB. So even if users simply want to create some catalog functions they can forget about after the session, wouldn't the default catalog/DB be enough for such experiments? On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: > Re 1) As described in the FLIP, a temp function lookup will first make sure > the db exists. If the db doesn't exist, a lazy drop is triggered to remove > that temp function. > > I agree Hive doesn't handle it consistently, and we are not copying Hive. > > IMHO, allowing registering temp functions in nonexistent catalog/db is > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system > functions and functions in the current catalog/db, since users cannot > designate a nonexistent catalog/db as current ones, how can they list > functions in nonexistent catalog/db? They may end up never knowing what > temp functions they've created unless trying out with queries or we > introducing some more nonstandard SQL statements. The same applies to other > temp objects like temp tables. > > Re 2) A standalone FunctionIdentifier sounds good to me > > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > [hidden email]> > wrote: > > > Ad. 1 > > I wouldn't say it is hacky. > > Moreover, how do you want ensure that the dB always exists when a > temporary > > object is used?( in this particular case function). Do you want to query > > for the database existence whenever e.g a temporary function is used? I > > think important aspect here is that the database can be dropped from > > external system, not just flink or a different flink session. > > > > E.g in case of hive, you cannot create a temporary table in a database > that > > does not exist, that's true. But if you create a temporary table in a > > database and drop that database from a different session, you can still > > query the previously created temporary table from the original session. > It > > does not sound like a consistent behaviour to me. Why don't we make this > > behaviour of not binding a temporary objects to the lifetime of a > database > > explicit as part of the temporary objects contract? In the end they exist > > in different layers. Permanent objects & databases in a catalog (in case > of > > hive megastore) whereas temporary objects in flink sessions. That's also > > true for the original hive client. The temporary objects live in the hive > > client whereas databases are created in the metastore. > > > > Ad.2 > > I'm open for suggestions here. The one thing I wanted to achieve here is > so > > that we do not change the contract of ObjectIdentifier. One important > thing > > to remember here is that we need the function identifier to be part of > the > > FunctionDefinition object and not only as the result of the function > > lookup. At some point we want to be able to store QueryOperations in the > > catalogs. They can contain function calls within which we need to have > the > > identifier. > > > > I agree my initial suggestion is over complicated. How about we have just > > the FunctionIdentifier as top level class without making the > > ObjectIdentifier extend from it? I think it's pretty much the same what > you > > suggested. The only difference is that it would be a top level class > with a > > more descriptive name. > > > > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > > > > > Sorry, I missed some parts of the solution. The complete alternative is > > the > > > following, basically having separate APIs in FunctionLookup for > ambiguous > > > and precise function lookup since planner is able to tell which API to > > call > > > with parsed queries, and have a unified result: > > > > > > ``` > > > class FunctionLookup { > > > > > > Optional<Result> lookupAmbiguousFunction(String name); > > > > > > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > > > > > > > class Result { > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > String getName() { ... } > > > // ... > > > } > > > > > > } > > > ``` > > > > > > Thanks, > > > Bowen > > > > > > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote: > > > > > > > Hi Dawid, > > > > > > > > Re 1): I agree making it easy for users to run experiments is > > important. > > > > However, I'm not sure allowing users to register temp functions in > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and > > > breaks > > > > the current contract between Flink and users that catalog/db must be > > > valid > > > > in order to operate on. > > > > > > > > How about we instead focus on making it convenient to create > catalogs? > > > > Users actually can already do it with ease via program or SQL CLI > yaml > > > file > > > > for an in-memory catalog which has neither extra dependency nor > > external > > > > connections. What we can further improve is DDL for catalogs, and I > > > raised > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by > > Terry > > > > now. > > > > > > > > In that case, if users would like to experiment via SQL, they can > > easily > > > > create an in memory catalog/database using DDL, then play with temp > > > > functions. > > > > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not > > been > > > > resolved when stack call reaches FunctionCatalog#lookupFunction(), > but > > > only > > > > been parsed? > > > > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the > > > > suggested classes, though making ObjectIdentifier a subclass of > > > > FunctionIdentifier seem a bit counter intuitive. > > > > > > > > Another potentially simpler way is: > > > > > > > > ``` > > > > // in class FunctionLookup > > > > class Result { > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > String getName() { ... } > > > > ... > > > > } > > > > ``` > > > > > > > > WDYT? > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > > [hidden email]> wrote: > > > > > > > >> Hi, > > > >> I really like the flip and think it clarifies important aspects of > the > > > >> system. > > > >> > > > >> I have two, I hope small suggestions, which will not take much time > to > > > >> agree on. > > > >> > > > >> 1. Could we follow the MySQL approach in regards to the existence of > > > >> cat/db > > > >> for temporary functions? That means not to check it, so e.g. it's > > > possible > > > >> to create a temporary function in a database that does not exist. I > > > think > > > >> it's really useful e.g in cases when user wants to perform > experiments > > > but > > > >> does not have access to the db yet or temporarily does not have > > > connection > > > >> to a catalog. > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the > > > >> requirements for all catalog objects such as tables, views, types > just > > > for > > > >> the functions? It's really important later on from e.g the > > > serializability > > > >> perspective. The important aspect of the ObjectIdentifier is that we > > > know > > > >> that it has been resolved. The suggested changes break that > > assumption. > > > >> > > > >> What do you think about adding an interface FunctionIdentifier { > > > >> > > > >> String getName(); > > > >> > > > >> /** > > > >> Return 3-part identifier. Empty in case of a built-in function. > > > >> */ > > > >> Optional<ObjectIdentifier> getObjectIdentifier() > > > >> } > > > >> > > > >> class ObjectIdentifier implements FunctionIdentifier { > > > >> Optional<ObjectIdentifier> getObjectIdentifier() { > > > >> return Optional.of(this); > > > >> } > > > >> } > > > >> > > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...} > > > >> > > > >> WDYT? > > > >> > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > > > >> > > > >> > +1. LGTM > > > >> > > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> > > > wrote: > > > >> > > > > >> > > +1 > > > >> > > > > > >> > > Best, > > > >> > > Terry Wang > > > >> > > > > > >> > > > > > >> > > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > >> > > > > > > >> > > > +1 > > > >> > > > > > > >> > > > Best, > > > >> > > > Kurt > > > >> > > > > > > >> > > > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email] > > > > > >> wrote: > > > >> > > > > > > >> > > >> Hi all, > > > >> > > >> > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which > we've > > > >> reached > > > >> > > >> consensus in [2]. > > > >> > > >> > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, > > Sep > > > >> 26. > > > >> > > >> > > > >> > > >> Thanks, > > > >> > > >> Bowen > > > >> > > >> > > > >> > > >> [1] > > > >> > > >> > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > >> > > >> [2] > > > >> > > >> > > > >> > > >> > > > >> > > > > > >> > > > > >> > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > >> > > >> > > > >> > > > > > >> > > > > > >> > > > > >> > -- > > > >> > Xuefu Zhang > > > >> > > > > >> > "In Honey We Trust!" > > > >> > > > > >> > > > > > > > > > > -- Best regards! Rui Li |
@Dawid, do you have any other concerns? If not, I hope we can close the
voting. On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote: > I'm not sure how much benefit #1 can bring us. If users just want to try > out temporary functions, they can create temporary system functions which > don't require a catalog/DB. IIUC, the main reason why we allow temporary > catalog function is to let users override permanent catalog functions. > Therefore a temporary function in a non-existing catalog won't serve that > purpose. Besides, each session is provided with a default catalog and DB. > So even if users simply want to create some catalog functions they can > forget about after the session, wouldn't the default catalog/DB be enough > for such experiments? > > On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: > > > Re 1) As described in the FLIP, a temp function lookup will first make > sure > > the db exists. If the db doesn't exist, a lazy drop is triggered to > remove > > that temp function. > > > > I agree Hive doesn't handle it consistently, and we are not copying Hive. > > > > IMHO, allowing registering temp functions in nonexistent catalog/db is > > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system > > functions and functions in the current catalog/db, since users cannot > > designate a nonexistent catalog/db as current ones, how can they list > > functions in nonexistent catalog/db? They may end up never knowing what > > temp functions they've created unless trying out with queries or we > > introducing some more nonstandard SQL statements. The same applies to > other > > temp objects like temp tables. > > > > Re 2) A standalone FunctionIdentifier sounds good to me > > > > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > > [hidden email]> > > wrote: > > > > > Ad. 1 > > > I wouldn't say it is hacky. > > > Moreover, how do you want ensure that the dB always exists when a > > temporary > > > object is used?( in this particular case function). Do you want to > query > > > for the database existence whenever e.g a temporary function is used? I > > > think important aspect here is that the database can be dropped from > > > external system, not just flink or a different flink session. > > > > > > E.g in case of hive, you cannot create a temporary table in a database > > that > > > does not exist, that's true. But if you create a temporary table in a > > > database and drop that database from a different session, you can still > > > query the previously created temporary table from the original session. > > It > > > does not sound like a consistent behaviour to me. Why don't we make > this > > > behaviour of not binding a temporary objects to the lifetime of a > > database > > > explicit as part of the temporary objects contract? In the end they > exist > > > in different layers. Permanent objects & databases in a catalog (in > case > > of > > > hive megastore) whereas temporary objects in flink sessions. That's > also > > > true for the original hive client. The temporary objects live in the > hive > > > client whereas databases are created in the metastore. > > > > > > Ad.2 > > > I'm open for suggestions here. The one thing I wanted to achieve here > is > > so > > > that we do not change the contract of ObjectIdentifier. One important > > thing > > > to remember here is that we need the function identifier to be part of > > the > > > FunctionDefinition object and not only as the result of the function > > > lookup. At some point we want to be able to store QueryOperations in > the > > > catalogs. They can contain function calls within which we need to have > > the > > > identifier. > > > > > > I agree my initial suggestion is over complicated. How about we have > just > > > the FunctionIdentifier as top level class without making the > > > ObjectIdentifier extend from it? I think it's pretty much the same what > > you > > > suggested. The only difference is that it would be a top level class > > with a > > > more descriptive name. > > > > > > > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > > > > > > > Sorry, I missed some parts of the solution. The complete alternative > is > > > the > > > > following, basically having separate APIs in FunctionLookup for > > ambiguous > > > > and precise function lookup since planner is able to tell which API > to > > > call > > > > with parsed queries, and have a unified result: > > > > > > > > ``` > > > > class FunctionLookup { > > > > > > > > Optional<Result> lookupAmbiguousFunction(String name); > > > > > > > > > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > > > > > > > > > > class Result { > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > String getName() { ... } > > > > // ... > > > > } > > > > > > > > } > > > > ``` > > > > > > > > Thanks, > > > > Bowen > > > > > > > > > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> > wrote: > > > > > > > > > Hi Dawid, > > > > > > > > > > Re 1): I agree making it easy for users to run experiments is > > > important. > > > > > However, I'm not sure allowing users to register temp functions in > > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, > and > > > > breaks > > > > > the current contract between Flink and users that catalog/db must > be > > > > valid > > > > > in order to operate on. > > > > > > > > > > How about we instead focus on making it convenient to create > > catalogs? > > > > > Users actually can already do it with ease via program or SQL CLI > > yaml > > > > file > > > > > for an in-memory catalog which has neither extra dependency nor > > > external > > > > > connections. What we can further improve is DDL for catalogs, and I > > > > raised > > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by > > > Terry > > > > > now. > > > > > > > > > > In that case, if users would like to experiment via SQL, they can > > > easily > > > > > create an in memory catalog/database using DDL, then play with temp > > > > > functions. > > > > > > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not > > > been > > > > > resolved when stack call reaches FunctionCatalog#lookupFunction(), > > but > > > > only > > > > > been parsed? > > > > > > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with > the > > > > > suggested classes, though making ObjectIdentifier a subclass of > > > > > FunctionIdentifier seem a bit counter intuitive. > > > > > > > > > > Another potentially simpler way is: > > > > > > > > > > ``` > > > > > // in class FunctionLookup > > > > > class Result { > > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > > String getName() { ... } > > > > > ... > > > > > } > > > > > ``` > > > > > > > > > > WDYT? > > > > > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > > > [hidden email]> wrote: > > > > > > > > > >> Hi, > > > > >> I really like the flip and think it clarifies important aspects of > > the > > > > >> system. > > > > >> > > > > >> I have two, I hope small suggestions, which will not take much > time > > to > > > > >> agree on. > > > > >> > > > > >> 1. Could we follow the MySQL approach in regards to the existence > of > > > > >> cat/db > > > > >> for temporary functions? That means not to check it, so e.g. it's > > > > possible > > > > >> to create a temporary function in a database that does not exist. > I > > > > think > > > > >> it's really useful e.g in cases when user wants to perform > > experiments > > > > but > > > > >> does not have access to the db yet or temporarily does not have > > > > connection > > > > >> to a catalog. > > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen > the > > > > >> requirements for all catalog objects such as tables, views, types > > just > > > > for > > > > >> the functions? It's really important later on from e.g the > > > > serializability > > > > >> perspective. The important aspect of the ObjectIdentifier is that > we > > > > know > > > > >> that it has been resolved. The suggested changes break that > > > assumption. > > > > >> > > > > >> What do you think about adding an interface FunctionIdentifier { > > > > >> > > > > >> String getName(); > > > > >> > > > > >> /** > > > > >> Return 3-part identifier. Empty in case of a built-in function. > > > > >> */ > > > > >> Optional<ObjectIdentifier> getObjectIdentifier() > > > > >> } > > > > >> > > > > >> class ObjectIdentifier implements FunctionIdentifier { > > > > >> Optional<ObjectIdentifier> getObjectIdentifier() { > > > > >> return Optional.of(this); > > > > >> } > > > > >> } > > > > >> > > > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...} > > > > >> > > > > >> WDYT? > > > > >> > > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > > > > >> > > > > >> > +1. LGTM > > > > >> > > > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> > > > > wrote: > > > > >> > > > > > >> > > +1 > > > > >> > > > > > > >> > > Best, > > > > >> > > Terry Wang > > > > >> > > > > > > >> > > > > > > >> > > > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > > >> > > > > > > > >> > > > +1 > > > > >> > > > > > > > >> > > > Best, > > > > >> > > > Kurt > > > > >> > > > > > > > >> > > > > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > [hidden email] > > > > > > > >> wrote: > > > > >> > > > > > > > >> > > >> Hi all, > > > > >> > > >> > > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which > > we've > > > > >> reached > > > > >> > > >> consensus in [2]. > > > > >> > > >> > > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm > UTC, > > > Sep > > > > >> 26. > > > > >> > > >> > > > > >> > > >> Thanks, > > > > >> > > >> Bowen > > > > >> > > >> > > > > >> > > >> [1] > > > > >> > > >> > > > > >> > > >> > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > > >> > > >> [2] > > > > >> > > >> > > > > >> > > >> > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > > >> > > >> > > > > >> > > > > > > >> > > > > > > >> > > > > > >> > -- > > > > >> > Xuefu Zhang > > > > >> > > > > > >> > "In Honey We Trust!" > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > -- > Best regards! > Rui Li > |
Hi all,
Sorry for the late reply. I think it might lead to confusing situations if temporary functions (or any temporary db objects for that matter) are bound to the life cycle of an (external) db/catalog. Imaging a situation where you create a temp function in a db in an external catalog and use it but at some point it does not work anymore because some other dropped the database from the external catalog. Shouldn't temporary objects be only controlled by the owner of a session? I agree that creating temp objects in non-existing db/catalogs sounds a bit strange, but IMO the opposite (the db/catalog must exist for a temp function to be created/exist) can have significant implications like the one I described. I think it would be quite easy for users to understand that temporary objects are solely owned by them (and their session). The problem of listing temporary objects could be solved by adding a ALL [TEMPORARY] clause: SHOW ALL FUNCTIONS; could show all functions regardless of the catalog/database including temporary functions. SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions regardless of the catalog/database. Best, Fabian Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[hidden email]>: > @Dawid, do you have any other concerns? If not, I hope we can close the > voting. > > > On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote: > > > I'm not sure how much benefit #1 can bring us. If users just want to try > > out temporary functions, they can create temporary system functions which > > don't require a catalog/DB. IIUC, the main reason why we allow temporary > > catalog function is to let users override permanent catalog functions. > > Therefore a temporary function in a non-existing catalog won't serve that > > purpose. Besides, each session is provided with a default catalog and DB. > > So even if users simply want to create some catalog functions they can > > forget about after the session, wouldn't the default catalog/DB be enough > > for such experiments? > > > > On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: > > > > > Re 1) As described in the FLIP, a temp function lookup will first make > > sure > > > the db exists. If the db doesn't exist, a lazy drop is triggered to > > remove > > > that temp function. > > > > > > I agree Hive doesn't handle it consistently, and we are not copying > Hive. > > > > > > IMHO, allowing registering temp functions in nonexistent catalog/db is > > > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system > > > functions and functions in the current catalog/db, since users cannot > > > designate a nonexistent catalog/db as current ones, how can they list > > > functions in nonexistent catalog/db? They may end up never knowing what > > > temp functions they've created unless trying out with queries or we > > > introducing some more nonstandard SQL statements. The same applies to > > other > > > temp objects like temp tables. > > > > > > Re 2) A standalone FunctionIdentifier sounds good to me > > > > > > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > > > [hidden email]> > > > wrote: > > > > > > > Ad. 1 > > > > I wouldn't say it is hacky. > > > > Moreover, how do you want ensure that the dB always exists when a > > > temporary > > > > object is used?( in this particular case function). Do you want to > > query > > > > for the database existence whenever e.g a temporary function is > used? I > > > > think important aspect here is that the database can be dropped from > > > > external system, not just flink or a different flink session. > > > > > > > > E.g in case of hive, you cannot create a temporary table in a > database > > > that > > > > does not exist, that's true. But if you create a temporary table in a > > > > database and drop that database from a different session, you can > still > > > > query the previously created temporary table from the original > session. > > > It > > > > does not sound like a consistent behaviour to me. Why don't we make > > this > > > > behaviour of not binding a temporary objects to the lifetime of a > > > database > > > > explicit as part of the temporary objects contract? In the end they > > exist > > > > in different layers. Permanent objects & databases in a catalog (in > > case > > > of > > > > hive megastore) whereas temporary objects in flink sessions. That's > > also > > > > true for the original hive client. The temporary objects live in the > > hive > > > > client whereas databases are created in the metastore. > > > > > > > > Ad.2 > > > > I'm open for suggestions here. The one thing I wanted to achieve here > > is > > > so > > > > that we do not change the contract of ObjectIdentifier. One important > > > thing > > > > to remember here is that we need the function identifier to be part > of > > > the > > > > FunctionDefinition object and not only as the result of the function > > > > lookup. At some point we want to be able to store QueryOperations in > > the > > > > catalogs. They can contain function calls within which we need to > have > > > the > > > > identifier. > > > > > > > > I agree my initial suggestion is over complicated. How about we have > > just > > > > the FunctionIdentifier as top level class without making the > > > > ObjectIdentifier extend from it? I think it's pretty much the same > what > > > you > > > > suggested. The only difference is that it would be a top level class > > > with a > > > > more descriptive name. > > > > > > > > > > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > > > > > > > > > Sorry, I missed some parts of the solution. The complete > alternative > > is > > > > the > > > > > following, basically having separate APIs in FunctionLookup for > > > ambiguous > > > > > and precise function lookup since planner is able to tell which API > > to > > > > call > > > > > with parsed queries, and have a unified result: > > > > > > > > > > ``` > > > > > class FunctionLookup { > > > > > > > > > > Optional<Result> lookupAmbiguousFunction(String name); > > > > > > > > > > > > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > > > > > > > > > > > > > class Result { > > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > > String getName() { ... } > > > > > // ... > > > > > } > > > > > > > > > > } > > > > > ``` > > > > > > > > > > Thanks, > > > > > Bowen > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> > > wrote: > > > > > > > > > > > Hi Dawid, > > > > > > > > > > > > Re 1): I agree making it easy for users to run experiments is > > > > important. > > > > > > However, I'm not sure allowing users to register temp functions > in > > > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, > > and > > > > > breaks > > > > > > the current contract between Flink and users that catalog/db must > > be > > > > > valid > > > > > > in order to operate on. > > > > > > > > > > > > How about we instead focus on making it convenient to create > > > catalogs? > > > > > > Users actually can already do it with ease via program or SQL CLI > > > yaml > > > > > file > > > > > > for an in-memory catalog which has neither extra dependency nor > > > > external > > > > > > connections. What we can further improve is DDL for catalogs, > and I > > > > > raised > > > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven > by > > > > Terry > > > > > > now. > > > > > > > > > > > > In that case, if users would like to experiment via SQL, they can > > > > easily > > > > > > create an in memory catalog/database using DDL, then play with > temp > > > > > > functions. > > > > > > > > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has > not > > > > been > > > > > > resolved when stack call reaches > FunctionCatalog#lookupFunction(), > > > but > > > > > only > > > > > > been parsed? > > > > > > > > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with > > the > > > > > > suggested classes, though making ObjectIdentifier a subclass of > > > > > > FunctionIdentifier seem a bit counter intuitive. > > > > > > > > > > > > Another potentially simpler way is: > > > > > > > > > > > > ``` > > > > > > // in class FunctionLookup > > > > > > class Result { > > > > > > Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > > > > > String getName() { ... } > > > > > > ... > > > > > > } > > > > > > ``` > > > > > > > > > > > > WDYT? > > > > > > > > > > > > > > > > > > > > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > > > > [hidden email]> wrote: > > > > > > > > > > > >> Hi, > > > > > >> I really like the flip and think it clarifies important aspects > of > > > the > > > > > >> system. > > > > > >> > > > > > >> I have two, I hope small suggestions, which will not take much > > time > > > to > > > > > >> agree on. > > > > > >> > > > > > >> 1. Could we follow the MySQL approach in regards to the > existence > > of > > > > > >> cat/db > > > > > >> for temporary functions? That means not to check it, so e.g. > it's > > > > > possible > > > > > >> to create a temporary function in a database that does not > exist. > > I > > > > > think > > > > > >> it's really useful e.g in cases when user wants to perform > > > experiments > > > > > but > > > > > >> does not have access to the db yet or temporarily does not have > > > > > connection > > > > > >> to a catalog. > > > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen > > the > > > > > >> requirements for all catalog objects such as tables, views, > types > > > just > > > > > for > > > > > >> the functions? It's really important later on from e.g the > > > > > serializability > > > > > >> perspective. The important aspect of the ObjectIdentifier is > that > > we > > > > > know > > > > > >> that it has been resolved. The suggested changes break that > > > > assumption. > > > > > >> > > > > > >> What do you think about adding an interface FunctionIdentifier { > > > > > >> > > > > > >> String getName(); > > > > > >> > > > > > >> /** > > > > > >> Return 3-part identifier. Empty in case of a built-in > function. > > > > > >> */ > > > > > >> Optional<ObjectIdentifier> getObjectIdentifier() > > > > > >> } > > > > > >> > > > > > >> class ObjectIdentifier implements FunctionIdentifier { > > > > > >> Optional<ObjectIdentifier> getObjectIdentifier() { > > > > > >> return Optional.of(this); > > > > > >> } > > > > > >> } > > > > > >> > > > > > >> class SystemFunctionIdentifier implements FunctionIdentifier > {...} > > > > > >> > > > > > >> WDYT? > > > > > >> > > > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > > > > > >> > > > > > >> > +1. LGTM > > > > > >> > > > > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > [hidden email]> > > > > > wrote: > > > > > >> > > > > > > >> > > +1 > > > > > >> > > > > > > > >> > > Best, > > > > > >> > > Terry Wang > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > > > >> > > > > > > > > >> > > > +1 > > > > > >> > > > > > > > > >> > > > Best, > > > > > >> > > > Kurt > > > > > >> > > > > > > > > >> > > > > > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > > [hidden email] > > > > > > > > > >> wrote: > > > > > >> > > > > > > > > >> > > >> Hi all, > > > > > >> > > >> > > > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which > > > we've > > > > > >> reached > > > > > >> > > >> consensus in [2]. > > > > > >> > > >> > > > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm > > UTC, > > > > Sep > > > > > >> 26. > > > > > >> > > >> > > > > > >> > > >> Thanks, > > > > > >> > > >> Bowen > > > > > >> > > >> > > > > > >> > > >> [1] > > > > > >> > > >> > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > > > >> > > >> [2] > > > > > >> > > >> > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > > > >> > > >> > > > > > >> > > > > > > > >> > > > > > > > >> > > > > > > >> > -- > > > > > >> > Xuefu Zhang > > > > > >> > > > > > > >> > "In Honey We Trust!" > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > -- > > Best regards! > > Rui Li > > > |
Hi all,
I support Fabian's arguments. In my opinion, temporary objects should just be an additional layer on top of the regular catalog/database lookup logic. Thus, a temporary table or function has always highest precedence and should be stable within the local session. Otherwise it could magically disappear while someone else is performing modifications in the catalog. Furthermore, this feature is very useful for prototyping as users can simply express that a catalog/database is present even through they might not have access to it currently. Regards, Timo On 30.09.19 14:57, Fabian Hueske wrote: > Hi all, > > Sorry for the late reply. > > I think it might lead to confusing situations if temporary functions (or > any temporary db objects for that matter) are bound to the life cycle of an > (external) db/catalog. > Imaging a situation where you create a temp function in a db in an external > catalog and use it but at some point it does not work anymore because some > other dropped the database from the external catalog. > Shouldn't temporary objects be only controlled by the owner of a session? > > I agree that creating temp objects in non-existing db/catalogs sounds a bit > strange, but IMO the opposite (the db/catalog must exist for a temp > function to be created/exist) can have significant implications like the > one I described. > I think it would be quite easy for users to understand that temporary > objects are solely owned by them (and their session). > The problem of listing temporary objects could be solved by adding a ALL > [TEMPORARY] clause: > > SHOW ALL FUNCTIONS; could show all functions regardless of the > catalog/database including temporary functions. > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions regardless > of the catalog/database. > > Best, > Fabian > > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[hidden email]>: > >> @Dawid, do you have any other concerns? If not, I hope we can close the >> voting. >> >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote: >> >>> I'm not sure how much benefit #1 can bring us. If users just want to try >>> out temporary functions, they can create temporary system functions which >>> don't require a catalog/DB. IIUC, the main reason why we allow temporary >>> catalog function is to let users override permanent catalog functions. >>> Therefore a temporary function in a non-existing catalog won't serve that >>> purpose. Besides, each session is provided with a default catalog and DB. >>> So even if users simply want to create some catalog functions they can >>> forget about after the session, wouldn't the default catalog/DB be enough >>> for such experiments? >>> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: >>> >>>> Re 1) As described in the FLIP, a temp function lookup will first make >>> sure >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to >>> remove >>>> that temp function. >>>> >>>> I agree Hive doesn't handle it consistently, and we are not copying >> Hive. >>>> IMHO, allowing registering temp functions in nonexistent catalog/db is >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list system >>>> functions and functions in the current catalog/db, since users cannot >>>> designate a nonexistent catalog/db as current ones, how can they list >>>> functions in nonexistent catalog/db? They may end up never knowing what >>>> temp functions they've created unless trying out with queries or we >>>> introducing some more nonstandard SQL statements. The same applies to >>> other >>>> temp objects like temp tables. >>>> >>>> Re 2) A standalone FunctionIdentifier sounds good to me >>>> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >>>> [hidden email]> >>>> wrote: >>>> >>>>> Ad. 1 >>>>> I wouldn't say it is hacky. >>>>> Moreover, how do you want ensure that the dB always exists when a >>>> temporary >>>>> object is used?( in this particular case function). Do you want to >>> query >>>>> for the database existence whenever e.g a temporary function is >> used? I >>>>> think important aspect here is that the database can be dropped from >>>>> external system, not just flink or a different flink session. >>>>> >>>>> E.g in case of hive, you cannot create a temporary table in a >> database >>>> that >>>>> does not exist, that's true. But if you create a temporary table in a >>>>> database and drop that database from a different session, you can >> still >>>>> query the previously created temporary table from the original >> session. >>>> It >>>>> does not sound like a consistent behaviour to me. Why don't we make >>> this >>>>> behaviour of not binding a temporary objects to the lifetime of a >>>> database >>>>> explicit as part of the temporary objects contract? In the end they >>> exist >>>>> in different layers. Permanent objects & databases in a catalog (in >>> case >>>> of >>>>> hive megastore) whereas temporary objects in flink sessions. That's >>> also >>>>> true for the original hive client. The temporary objects live in the >>> hive >>>>> client whereas databases are created in the metastore. >>>>> >>>>> Ad.2 >>>>> I'm open for suggestions here. The one thing I wanted to achieve here >>> is >>>> so >>>>> that we do not change the contract of ObjectIdentifier. One important >>>> thing >>>>> to remember here is that we need the function identifier to be part >> of >>>> the >>>>> FunctionDefinition object and not only as the result of the function >>>>> lookup. At some point we want to be able to store QueryOperations in >>> the >>>>> catalogs. They can contain function calls within which we need to >> have >>>> the >>>>> identifier. >>>>> >>>>> I agree my initial suggestion is over complicated. How about we have >>> just >>>>> the FunctionIdentifier as top level class without making the >>>>> ObjectIdentifier extend from it? I think it's pretty much the same >> what >>>> you >>>>> suggested. The only difference is that it would be a top level class >>>> with a >>>>> more descriptive name. >>>>> >>>>> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: >>>>> >>>>>> Sorry, I missed some parts of the solution. The complete >> alternative >>> is >>>>> the >>>>>> following, basically having separate APIs in FunctionLookup for >>>> ambiguous >>>>>> and precise function lookup since planner is able to tell which API >>> to >>>>> call >>>>>> with parsed queries, and have a unified result: >>>>>> >>>>>> ``` >>>>>> class FunctionLookup { >>>>>> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); >>>>>> >>>>>> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >>>>>> >>>>>> >>>>>> class Result { >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } >>>>>> String getName() { ... } >>>>>> // ... >>>>>> } >>>>>> >>>>>> } >>>>>> ``` >>>>>> >>>>>> Thanks, >>>>>> Bowen >>>>>> >>>>>> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> >>> wrote: >>>>>>> Hi Dawid, >>>>>>> >>>>>>> Re 1): I agree making it easy for users to run experiments is >>>>> important. >>>>>>> However, I'm not sure allowing users to register temp functions >> in >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, >>> and >>>>>> breaks >>>>>>> the current contract between Flink and users that catalog/db must >>> be >>>>>> valid >>>>>>> in order to operate on. >>>>>>> >>>>>>> How about we instead focus on making it convenient to create >>>> catalogs? >>>>>>> Users actually can already do it with ease via program or SQL CLI >>>> yaml >>>>>> file >>>>>>> for an in-memory catalog which has neither extra dependency nor >>>>> external >>>>>>> connections. What we can further improve is DDL for catalogs, >> and I >>>>>> raised >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven >> by >>>>> Terry >>>>>>> now. >>>>>>> >>>>>>> In that case, if users would like to experiment via SQL, they can >>>>> easily >>>>>>> create an in memory catalog/database using DDL, then play with >> temp >>>>>>> functions. >>>>>>> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has >> not >>>>> been >>>>>>> resolved when stack call reaches >> FunctionCatalog#lookupFunction(), >>>> but >>>>>> only >>>>>>> been parsed? >>>>>>> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with >>> the >>>>>>> suggested classes, though making ObjectIdentifier a subclass of >>>>>>> FunctionIdentifier seem a bit counter intuitive. >>>>>>> >>>>>>> Another potentially simpler way is: >>>>>>> >>>>>>> ``` >>>>>>> // in class FunctionLookup >>>>>>> class Result { >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } >>>>>>> String getName() { ... } >>>>>>> ... >>>>>>> } >>>>>>> ``` >>>>>>> >>>>>>> WDYT? >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >>>>>>> [hidden email]> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> I really like the flip and think it clarifies important aspects >> of >>>> the >>>>>>>> system. >>>>>>>> >>>>>>>> I have two, I hope small suggestions, which will not take much >>> time >>>> to >>>>>>>> agree on. >>>>>>>> >>>>>>>> 1. Could we follow the MySQL approach in regards to the >> existence >>> of >>>>>>>> cat/db >>>>>>>> for temporary functions? That means not to check it, so e.g. >> it's >>>>>> possible >>>>>>>> to create a temporary function in a database that does not >> exist. >>> I >>>>>> think >>>>>>>> it's really useful e.g in cases when user wants to perform >>>> experiments >>>>>> but >>>>>>>> does not have access to the db yet or temporarily does not have >>>>>> connection >>>>>>>> to a catalog. >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen >>> the >>>>>>>> requirements for all catalog objects such as tables, views, >> types >>>> just >>>>>> for >>>>>>>> the functions? It's really important later on from e.g the >>>>>> serializability >>>>>>>> perspective. The important aspect of the ObjectIdentifier is >> that >>> we >>>>>> know >>>>>>>> that it has been resolved. The suggested changes break that >>>>> assumption. >>>>>>>> What do you think about adding an interface FunctionIdentifier { >>>>>>>> >>>>>>>> String getName(); >>>>>>>> >>>>>>>> /** >>>>>>>> Return 3-part identifier. Empty in case of a built-in >> function. >>>>>>>> */ >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >>>>>>>> } >>>>>>>> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>>>>> return Optional.of(this); >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier >> {...} >>>>>>>> WDYT? >>>>>>>> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: >>>>>>>> >>>>>>>>> +1. LGTM >>>>>>>>> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >> [hidden email]> >>>>>> wrote: >>>>>>>>>> +1 >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Terry Wang >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >>>>>>>>>>> >>>>>>>>>>> +1 >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Kurt >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >>> [hidden email] >>>>>>>> wrote: >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which >>>> we've >>>>>>>> reached >>>>>>>>>>>> consensus in [2]. >>>>>>>>>>>> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >>> UTC, >>>>> Sep >>>>>>>> 26. >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Bowen >>>>>>>>>>>> >>>>>>>>>>>> [1] >>>>>>>>>>>> >>>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >>>>>>>>>>>> [2] >>>>>>>>>>>> >>>>>>>>>>>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Xuefu Zhang >>>>>>>>> >>>>>>>>> "In Honey We Trust!" >>>>>>>>> >>> >>> -- >>> Best regards! >>> Rui Li >>> |
Hi,
I think above are some valid points, and we can adopt the suggestions. To elaborate a bit on the new SQL syntax, it would imply that, unlike "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] FUNCTIONS" would return functions' fully qualified names with catalog and db names. On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote: > Hi all, > > I support Fabian's arguments. In my opinion, temporary objects should > just be an additional layer on top of the regular catalog/database > lookup logic. Thus, a temporary table or function has always highest > precedence and should be stable within the local session. Otherwise it > could magically disappear while someone else is performing modifications > in the catalog. > > Furthermore, this feature is very useful for prototyping as users can > simply express that a catalog/database is present even through they > might not have access to it currently. > > Regards, > Timo > > > On 30.09.19 14:57, Fabian Hueske wrote: > > Hi all, > > > > Sorry for the late reply. > > > > I think it might lead to confusing situations if temporary functions (or > > any temporary db objects for that matter) are bound to the life cycle of > an > > (external) db/catalog. > > Imaging a situation where you create a temp function in a db in an > external > > catalog and use it but at some point it does not work anymore because > some > > other dropped the database from the external catalog. > > Shouldn't temporary objects be only controlled by the owner of a session? > > > > I agree that creating temp objects in non-existing db/catalogs sounds a > bit > > strange, but IMO the opposite (the db/catalog must exist for a temp > > function to be created/exist) can have significant implications like the > > one I described. > > I think it would be quite easy for users to understand that temporary > > objects are solely owned by them (and their session). > > The problem of listing temporary objects could be solved by adding a ALL > > [TEMPORARY] clause: > > > > SHOW ALL FUNCTIONS; could show all functions regardless of the > > catalog/database including temporary functions. > > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > regardless > > of the catalog/database. > > > > Best, > > Fabian > > > > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > [hidden email]>: > > > >> @Dawid, do you have any other concerns? If not, I hope we can close the > >> voting. > >> > >> > >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote: > >> > >>> I'm not sure how much benefit #1 can bring us. If users just want to > try > >>> out temporary functions, they can create temporary system functions > which > >>> don't require a catalog/DB. IIUC, the main reason why we allow > temporary > >>> catalog function is to let users override permanent catalog functions. > >>> Therefore a temporary function in a non-existing catalog won't serve > that > >>> purpose. Besides, each session is provided with a default catalog and > DB. > >>> So even if users simply want to create some catalog functions they can > >>> forget about after the session, wouldn't the default catalog/DB be > enough > >>> for such experiments? > >>> > >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: > >>> > >>>> Re 1) As described in the FLIP, a temp function lookup will first make > >>> sure > >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to > >>> remove > >>>> that temp function. > >>>> > >>>> I agree Hive doesn't handle it consistently, and we are not copying > >> Hive. > >>>> IMHO, allowing registering temp functions in nonexistent catalog/db is > >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list > system > >>>> functions and functions in the current catalog/db, since users cannot > >>>> designate a nonexistent catalog/db as current ones, how can they list > >>>> functions in nonexistent catalog/db? They may end up never knowing > what > >>>> temp functions they've created unless trying out with queries or we > >>>> introducing some more nonstandard SQL statements. The same applies to > >>> other > >>>> temp objects like temp tables. > >>>> > >>>> Re 2) A standalone FunctionIdentifier sounds good to me > >>>> > >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >>>> [hidden email]> > >>>> wrote: > >>>> > >>>>> Ad. 1 > >>>>> I wouldn't say it is hacky. > >>>>> Moreover, how do you want ensure that the dB always exists when a > >>>> temporary > >>>>> object is used?( in this particular case function). Do you want to > >>> query > >>>>> for the database existence whenever e.g a temporary function is > >> used? I > >>>>> think important aspect here is that the database can be dropped from > >>>>> external system, not just flink or a different flink session. > >>>>> > >>>>> E.g in case of hive, you cannot create a temporary table in a > >> database > >>>> that > >>>>> does not exist, that's true. But if you create a temporary table in a > >>>>> database and drop that database from a different session, you can > >> still > >>>>> query the previously created temporary table from the original > >> session. > >>>> It > >>>>> does not sound like a consistent behaviour to me. Why don't we make > >>> this > >>>>> behaviour of not binding a temporary objects to the lifetime of a > >>>> database > >>>>> explicit as part of the temporary objects contract? In the end they > >>> exist > >>>>> in different layers. Permanent objects & databases in a catalog (in > >>> case > >>>> of > >>>>> hive megastore) whereas temporary objects in flink sessions. That's > >>> also > >>>>> true for the original hive client. The temporary objects live in the > >>> hive > >>>>> client whereas databases are created in the metastore. > >>>>> > >>>>> Ad.2 > >>>>> I'm open for suggestions here. The one thing I wanted to achieve here > >>> is > >>>> so > >>>>> that we do not change the contract of ObjectIdentifier. One important > >>>> thing > >>>>> to remember here is that we need the function identifier to be part > >> of > >>>> the > >>>>> FunctionDefinition object and not only as the result of the function > >>>>> lookup. At some point we want to be able to store QueryOperations in > >>> the > >>>>> catalogs. They can contain function calls within which we need to > >> have > >>>> the > >>>>> identifier. > >>>>> > >>>>> I agree my initial suggestion is over complicated. How about we have > >>> just > >>>>> the FunctionIdentifier as top level class without making the > >>>>> ObjectIdentifier extend from it? I think it's pretty much the same > >> what > >>>> you > >>>>> suggested. The only difference is that it would be a top level class > >>>> with a > >>>>> more descriptive name. > >>>>> > >>>>> > >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > >>>>> > >>>>>> Sorry, I missed some parts of the solution. The complete > >> alternative > >>> is > >>>>> the > >>>>>> following, basically having separate APIs in FunctionLookup for > >>>> ambiguous > >>>>>> and precise function lookup since planner is able to tell which API > >>> to > >>>>> call > >>>>>> with parsed queries, and have a unified result: > >>>>>> > >>>>>> ``` > >>>>>> class FunctionLookup { > >>>>>> > >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >>>>>> > >>>>>> > >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > >>>>>> > >>>>>> > >>>>>> class Result { > >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >>>>>> String getName() { ... } > >>>>>> // ... > >>>>>> } > >>>>>> > >>>>>> } > >>>>>> ``` > >>>>>> > >>>>>> Thanks, > >>>>>> Bowen > >>>>>> > >>>>>> > >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> > >>> wrote: > >>>>>>> Hi Dawid, > >>>>>>> > >>>>>>> Re 1): I agree making it easy for users to run experiments is > >>>>> important. > >>>>>>> However, I'm not sure allowing users to register temp functions > >> in > >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, > >>> and > >>>>>> breaks > >>>>>>> the current contract between Flink and users that catalog/db must > >>> be > >>>>>> valid > >>>>>>> in order to operate on. > >>>>>>> > >>>>>>> How about we instead focus on making it convenient to create > >>>> catalogs? > >>>>>>> Users actually can already do it with ease via program or SQL CLI > >>>> yaml > >>>>>> file > >>>>>>> for an in-memory catalog which has neither extra dependency nor > >>>>> external > >>>>>>> connections. What we can further improve is DDL for catalogs, > >> and I > >>>>>> raised > >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven > >> by > >>>>> Terry > >>>>>>> now. > >>>>>>> > >>>>>>> In that case, if users would like to experiment via SQL, they can > >>>>> easily > >>>>>>> create an in memory catalog/database using DDL, then play with > >> temp > >>>>>>> functions. > >>>>>>> > >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has > >> not > >>>>> been > >>>>>>> resolved when stack call reaches > >> FunctionCatalog#lookupFunction(), > >>>> but > >>>>>> only > >>>>>>> been parsed? > >>>>>>> > >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with > >>> the > >>>>>>> suggested classes, though making ObjectIdentifier a subclass of > >>>>>>> FunctionIdentifier seem a bit counter intuitive. > >>>>>>> > >>>>>>> Another potentially simpler way is: > >>>>>>> > >>>>>>> ``` > >>>>>>> // in class FunctionLookup > >>>>>>> class Result { > >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >>>>>>> String getName() { ... } > >>>>>>> ... > >>>>>>> } > >>>>>>> ``` > >>>>>>> > >>>>>>> WDYT? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >>>>>>> [hidden email]> wrote: > >>>>>>> > >>>>>>>> Hi, > >>>>>>>> I really like the flip and think it clarifies important aspects > >> of > >>>> the > >>>>>>>> system. > >>>>>>>> > >>>>>>>> I have two, I hope small suggestions, which will not take much > >>> time > >>>> to > >>>>>>>> agree on. > >>>>>>>> > >>>>>>>> 1. Could we follow the MySQL approach in regards to the > >> existence > >>> of > >>>>>>>> cat/db > >>>>>>>> for temporary functions? That means not to check it, so e.g. > >> it's > >>>>>> possible > >>>>>>>> to create a temporary function in a database that does not > >> exist. > >>> I > >>>>>> think > >>>>>>>> it's really useful e.g in cases when user wants to perform > >>>> experiments > >>>>>> but > >>>>>>>> does not have access to the db yet or temporarily does not have > >>>>>> connection > >>>>>>>> to a catalog. > >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen > >>> the > >>>>>>>> requirements for all catalog objects such as tables, views, > >> types > >>>> just > >>>>>> for > >>>>>>>> the functions? It's really important later on from e.g the > >>>>>> serializability > >>>>>>>> perspective. The important aspect of the ObjectIdentifier is > >> that > >>> we > >>>>>> know > >>>>>>>> that it has been resolved. The suggested changes break that > >>>>> assumption. > >>>>>>>> What do you think about adding an interface FunctionIdentifier { > >>>>>>>> > >>>>>>>> String getName(); > >>>>>>>> > >>>>>>>> /** > >>>>>>>> Return 3-part identifier. Empty in case of a built-in > >> function. > >>>>>>>> */ > >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >>>>>>>> } > >>>>>>>> > >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >>>>>>>> return Optional.of(this); > >>>>>>>> } > >>>>>>>> } > >>>>>>>> > >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier > >> {...} > >>>>>>>> WDYT? > >>>>>>>> > >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > >>>>>>>> > >>>>>>>>> +1. LGTM > >>>>>>>>> > >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >> [hidden email]> > >>>>>> wrote: > >>>>>>>>>> +1 > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Terry Wang > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > >>>>>>>>>>> > >>>>>>>>>>> +1 > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Kurt > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >>> [hidden email] > >>>>>>>> wrote: > >>>>>>>>>>>> Hi all, > >>>>>>>>>>>> > >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which > >>>> we've > >>>>>>>> reached > >>>>>>>>>>>> consensus in [2]. > >>>>>>>>>>>> > >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > >>> UTC, > >>>>> Sep > >>>>>>>> 26. > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Bowen > >>>>>>>>>>>> > >>>>>>>>>>>> [1] > >>>>>>>>>>>> > >>>>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >>>>>>>>>>>> [2] > >>>>>>>>>>>> > >>>>>>>>>>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >>>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Xuefu Zhang > >>>>>>>>> > >>>>>>>>> "In Honey We Trust!" > >>>>>>>>> > >>> > >>> -- > >>> Best regards! > >>> Rui Li > >>> > > |
Hi all,
I've updated the FLIP wiki with the following changes: - Lifespan of temp functions are not tied to those of catalogs and databases. Users can create temp functions even though catalogs/dbs in their fully qualified names don't even exist. - some new SQL commands - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in functions, and names of temp and catalog functions in the current catalog and db - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built functions, and fully qualified names of temp and catalog functions in all catalogs and dbs - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp functions in all catalog and db - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system functions Let me know if you have any questions. Seems we have resolved all concerns. If there's no more ones, I'd like to close the vote by this time tomorrow. Cheers, Bowen On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote: > Hi, > > I think above are some valid points, and we can adopt the suggestions. > > To elaborate a bit on the new SQL syntax, it would imply that, unlike > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] > FUNCTIONS" would return functions' fully qualified names with catalog and > db names. > > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote: > >> Hi all, >> >> I support Fabian's arguments. In my opinion, temporary objects should >> just be an additional layer on top of the regular catalog/database >> lookup logic. Thus, a temporary table or function has always highest >> precedence and should be stable within the local session. Otherwise it >> could magically disappear while someone else is performing modifications >> in the catalog. >> >> Furthermore, this feature is very useful for prototyping as users can >> simply express that a catalog/database is present even through they >> might not have access to it currently. >> >> Regards, >> Timo >> >> >> On 30.09.19 14:57, Fabian Hueske wrote: >> > Hi all, >> > >> > Sorry for the late reply. >> > >> > I think it might lead to confusing situations if temporary functions (or >> > any temporary db objects for that matter) are bound to the life cycle >> of an >> > (external) db/catalog. >> > Imaging a situation where you create a temp function in a db in an >> external >> > catalog and use it but at some point it does not work anymore because >> some >> > other dropped the database from the external catalog. >> > Shouldn't temporary objects be only controlled by the owner of a >> session? >> > >> > I agree that creating temp objects in non-existing db/catalogs sounds a >> bit >> > strange, but IMO the opposite (the db/catalog must exist for a temp >> > function to be created/exist) can have significant implications like the >> > one I described. >> > I think it would be quite easy for users to understand that temporary >> > objects are solely owned by them (and their session). >> > The problem of listing temporary objects could be solved by adding a ALL >> > [TEMPORARY] clause: >> > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the >> > catalog/database including temporary functions. >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions >> regardless >> > of the catalog/database. >> > >> > Best, >> > Fabian >> > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < >> [hidden email]>: >> > >> >> @Dawid, do you have any other concerns? If not, I hope we can close the >> >> voting. >> >> >> >> >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote: >> >> >> >>> I'm not sure how much benefit #1 can bring us. If users just want to >> try >> >>> out temporary functions, they can create temporary system functions >> which >> >>> don't require a catalog/DB. IIUC, the main reason why we allow >> temporary >> >>> catalog function is to let users override permanent catalog functions. >> >>> Therefore a temporary function in a non-existing catalog won't serve >> that >> >>> purpose. Besides, each session is provided with a default catalog and >> DB. >> >>> So even if users simply want to create some catalog functions they can >> >>> forget about after the session, wouldn't the default catalog/DB be >> enough >> >>> for such experiments? >> >>> >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote: >> >>> >> >>>> Re 1) As described in the FLIP, a temp function lookup will first >> make >> >>> sure >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to >> >>> remove >> >>>> that temp function. >> >>>> >> >>>> I agree Hive doesn't handle it consistently, and we are not copying >> >> Hive. >> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db >> is >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list >> system >> >>>> functions and functions in the current catalog/db, since users cannot >> >>>> designate a nonexistent catalog/db as current ones, how can they list >> >>>> functions in nonexistent catalog/db? They may end up never knowing >> what >> >>>> temp functions they've created unless trying out with queries or we >> >>>> introducing some more nonstandard SQL statements. The same applies to >> >>> other >> >>>> temp objects like temp tables. >> >>>> >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me >> >>>> >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >> >>>> [hidden email]> >> >>>> wrote: >> >>>> >> >>>>> Ad. 1 >> >>>>> I wouldn't say it is hacky. >> >>>>> Moreover, how do you want ensure that the dB always exists when a >> >>>> temporary >> >>>>> object is used?( in this particular case function). Do you want to >> >>> query >> >>>>> for the database existence whenever e.g a temporary function is >> >> used? I >> >>>>> think important aspect here is that the database can be dropped from >> >>>>> external system, not just flink or a different flink session. >> >>>>> >> >>>>> E.g in case of hive, you cannot create a temporary table in a >> >> database >> >>>> that >> >>>>> does not exist, that's true. But if you create a temporary table in >> a >> >>>>> database and drop that database from a different session, you can >> >> still >> >>>>> query the previously created temporary table from the original >> >> session. >> >>>> It >> >>>>> does not sound like a consistent behaviour to me. Why don't we make >> >>> this >> >>>>> behaviour of not binding a temporary objects to the lifetime of a >> >>>> database >> >>>>> explicit as part of the temporary objects contract? In the end they >> >>> exist >> >>>>> in different layers. Permanent objects & databases in a catalog (in >> >>> case >> >>>> of >> >>>>> hive megastore) whereas temporary objects in flink sessions. That's >> >>> also >> >>>>> true for the original hive client. The temporary objects live in the >> >>> hive >> >>>>> client whereas databases are created in the metastore. >> >>>>> >> >>>>> Ad.2 >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve >> here >> >>> is >> >>>> so >> >>>>> that we do not change the contract of ObjectIdentifier. One >> important >> >>>> thing >> >>>>> to remember here is that we need the function identifier to be part >> >> of >> >>>> the >> >>>>> FunctionDefinition object and not only as the result of the function >> >>>>> lookup. At some point we want to be able to store QueryOperations in >> >>> the >> >>>>> catalogs. They can contain function calls within which we need to >> >> have >> >>>> the >> >>>>> identifier. >> >>>>> >> >>>>> I agree my initial suggestion is over complicated. How about we have >> >>> just >> >>>>> the FunctionIdentifier as top level class without making the >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same >> >> what >> >>>> you >> >>>>> suggested. The only difference is that it would be a top level class >> >>>> with a >> >>>>> more descriptive name. >> >>>>> >> >>>>> >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: >> >>>>> >> >>>>>> Sorry, I missed some parts of the solution. The complete >> >> alternative >> >>> is >> >>>>> the >> >>>>>> following, basically having separate APIs in FunctionLookup for >> >>>> ambiguous >> >>>>>> and precise function lookup since planner is able to tell which API >> >>> to >> >>>>> call >> >>>>>> with parsed queries, and have a unified result: >> >>>>>> >> >>>>>> ``` >> >>>>>> class FunctionLookup { >> >>>>>> >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); >> >>>>>> >> >>>>>> >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >> >>>>>> >> >>>>>> >> >>>>>> class Result { >> >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } >> >>>>>> String getName() { ... } >> >>>>>> // ... >> >>>>>> } >> >>>>>> >> >>>>>> } >> >>>>>> ``` >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Bowen >> >>>>>> >> >>>>>> >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> >> >>> wrote: >> >>>>>>> Hi Dawid, >> >>>>>>> >> >>>>>>> Re 1): I agree making it easy for users to run experiments is >> >>>>> important. >> >>>>>>> However, I'm not sure allowing users to register temp functions >> >> in >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, >> >>> and >> >>>>>> breaks >> >>>>>>> the current contract between Flink and users that catalog/db must >> >>> be >> >>>>>> valid >> >>>>>>> in order to operate on. >> >>>>>>> >> >>>>>>> How about we instead focus on making it convenient to create >> >>>> catalogs? >> >>>>>>> Users actually can already do it with ease via program or SQL CLI >> >>>> yaml >> >>>>>> file >> >>>>>>> for an in-memory catalog which has neither extra dependency nor >> >>>>> external >> >>>>>>> connections. What we can further improve is DDL for catalogs, >> >> and I >> >>>>>> raised >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven >> >> by >> >>>>> Terry >> >>>>>>> now. >> >>>>>>> >> >>>>>>> In that case, if users would like to experiment via SQL, they can >> >>>>> easily >> >>>>>>> create an in memory catalog/database using DDL, then play with >> >> temp >> >>>>>>> functions. >> >>>>>>> >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has >> >> not >> >>>>> been >> >>>>>>> resolved when stack call reaches >> >> FunctionCatalog#lookupFunction(), >> >>>> but >> >>>>>> only >> >>>>>>> been parsed? >> >>>>>>> >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with >> >>> the >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of >> >>>>>>> FunctionIdentifier seem a bit counter intuitive. >> >>>>>>> >> >>>>>>> Another potentially simpler way is: >> >>>>>>> >> >>>>>>> ``` >> >>>>>>> // in class FunctionLookup >> >>>>>>> class Result { >> >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } >> >>>>>>> String getName() { ... } >> >>>>>>> ... >> >>>>>>> } >> >>>>>>> ``` >> >>>>>>> >> >>>>>>> WDYT? >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >> >>>>>>> [hidden email]> wrote: >> >>>>>>> >> >>>>>>>> Hi, >> >>>>>>>> I really like the flip and think it clarifies important aspects >> >> of >> >>>> the >> >>>>>>>> system. >> >>>>>>>> >> >>>>>>>> I have two, I hope small suggestions, which will not take much >> >>> time >> >>>> to >> >>>>>>>> agree on. >> >>>>>>>> >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the >> >> existence >> >>> of >> >>>>>>>> cat/db >> >>>>>>>> for temporary functions? That means not to check it, so e.g. >> >> it's >> >>>>>> possible >> >>>>>>>> to create a temporary function in a database that does not >> >> exist. >> >>> I >> >>>>>> think >> >>>>>>>> it's really useful e.g in cases when user wants to perform >> >>>> experiments >> >>>>>> but >> >>>>>>>> does not have access to the db yet or temporarily does not have >> >>>>>> connection >> >>>>>>>> to a catalog. >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen >> >>> the >> >>>>>>>> requirements for all catalog objects such as tables, views, >> >> types >> >>>> just >> >>>>>> for >> >>>>>>>> the functions? It's really important later on from e.g the >> >>>>>> serializability >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is >> >> that >> >>> we >> >>>>>> know >> >>>>>>>> that it has been resolved. The suggested changes break that >> >>>>> assumption. >> >>>>>>>> What do you think about adding an interface FunctionIdentifier { >> >>>>>>>> >> >>>>>>>> String getName(); >> >>>>>>>> >> >>>>>>>> /** >> >>>>>>>> Return 3-part identifier. Empty in case of a built-in >> >> function. >> >>>>>>>> */ >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >> >>>>>>>> } >> >>>>>>>> >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >> >>>>>>>> return Optional.of(this); >> >>>>>>>> } >> >>>>>>>> } >> >>>>>>>> >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier >> >> {...} >> >>>>>>>> WDYT? >> >>>>>>>> >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: >> >>>>>>>> >> >>>>>>>>> +1. LGTM >> >>>>>>>>> >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >> >> [hidden email]> >> >>>>>> wrote: >> >>>>>>>>>> +1 >> >>>>>>>>>> >> >>>>>>>>>> Best, >> >>>>>>>>>> Terry Wang >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >> >>>>>>>>>>> >> >>>>>>>>>>> +1 >> >>>>>>>>>>> >> >>>>>>>>>>> Best, >> >>>>>>>>>>> Kurt >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >> >>> [hidden email] >> >>>>>>>> wrote: >> >>>>>>>>>>>> Hi all, >> >>>>>>>>>>>> >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which >> >>>> we've >> >>>>>>>> reached >> >>>>>>>>>>>> consensus in [2]. >> >>>>>>>>>>>> >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >> >>> UTC, >> >>>>> Sep >> >>>>>>>> 26. >> >>>>>>>>>>>> Thanks, >> >>>>>>>>>>>> Bowen >> >>>>>>>>>>>> >> >>>>>>>>>>>> [1] >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >> >>>>>>>>>>>> [2] >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >> >>>>>>>>>> >> >>>>>>>>> -- >> >>>>>>>>> Xuefu Zhang >> >>>>>>>>> >> >>>>>>>>> "In Honey We Trust!" >> >>>>>>>>> >> >>> >> >>> -- >> >>> Best regards! >> >>> Rui Li >> >>> >> >> |
Hi, I think we're really close to a very good design for the
functions. Thank you to all involved in the discussion and
especially Bowen for driving the FLIP. The last part I'm missing in the FLIP is a bit more elaborate description of the FunctionIdentifier. I gave it a bit more thought and discussed it with Timo and I think the purpose of the FunctionIdentifier is to unambigiously reference a function. That said it should be effectively either identifier of a built-in function or a catalog function. In order to do that I would suggest a slight change to the class: class FunctionIdentifier { public FunctionIdentifier ofBuiltinFunction(String name) { return new FunctionIdentifier(name, null); } public FunctionIdentifier ofCatalogFunction(ObjectIdentifier identifier){ return new FunctionIdentifier(null, identifier); } public Optional<String> getBuiltinFunctionIdentifier() { return Optional.ofNullable(name); } public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() { return Optional.ofNullable(identifier); } private FunctionIdentifier(String name, ObjectIdentifier identifier) { this.name = name; this.identifier = identifier } } Moreover as we are changing the way functions are referenced we need to update classes like UnresolvedCallExpression and CallExpression. As those are PublicEvolving classes we should cover those changes in the FLIP. Lastly, as those classes need a serializable representation, we need a serializable representation for FunctionIdentifier as well. Therefore we need to add a method to the FunctionIdentifier class: public String asSerializableString() {...} similar to the method in the ObjectIdentifier. Let me know what do you think. Best, Dawid On 30/09/2019 23:23, Bowen Li wrote:
Hi all, I've updated the FLIP wiki with the following changes: - Lifespan of temp functions are not tied to those of catalogs and databases. Users can create temp functions even though catalogs/dbs in their fully qualified names don't even exist. - some new SQL commands - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in functions, and names of temp and catalog functions in the current catalog and db - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built functions, and fully qualified names of temp and catalog functions in all catalogs and dbs - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp functions in all catalog and db - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system functions Let me know if you have any questions. Seems we have resolved all concerns. If there's no more ones, I'd like to close the vote by this time tomorrow. Cheers, Bowen On Mon, Sep 30, 2019 at 11:59 AM Bowen Li [hidden email] wrote:Hi, I think above are some valid points, and we can adopt the suggestions. To elaborate a bit on the new SQL syntax, it would imply that, unlike "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] FUNCTIONS" would return functions' fully qualified names with catalog and db names. On Mon, Sep 30, 2019 at 6:38 AM Timo Walther [hidden email] wrote:Hi all, I support Fabian's arguments. In my opinion, temporary objects should just be an additional layer on top of the regular catalog/database lookup logic. Thus, a temporary table or function has always highest precedence and should be stable within the local session. Otherwise it could magically disappear while someone else is performing modifications in the catalog. Furthermore, this feature is very useful for prototyping as users can simply express that a catalog/database is present even through they might not have access to it currently. Regards, Timo On 30.09.19 14:57, Fabian Hueske wrote:Hi all, Sorry for the late reply. I think it might lead to confusing situations if temporary functions (or any temporary db objects for that matter) are bound to the life cycleof an(external) db/catalog. Imaging a situation where you create a temp function in a db in anexternalcatalog and use it but at some point it does not work anymore becausesomeother dropped the database from the external catalog. Shouldn't temporary objects be only controlled by the owner of asession?I agree that creating temp objects in non-existing db/catalogs sounds abitstrange, but IMO the opposite (the db/catalog must exist for a temp function to be created/exist) can have significant implications like the one I described. I think it would be quite easy for users to understand that temporary objects are solely owned by them (and their session). The problem of listing temporary objects could be solved by adding a ALL [TEMPORARY] clause: SHOW ALL FUNCTIONS; could show all functions regardless of the catalog/database including temporary functions. SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functionsregardlessof the catalog/database. Best, Fabian Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[hidden email]>:@Dawid, do you have any other concerns? If not, I hope we can close the voting. On Thu, Sep 26, 2019 at 8:14 PM Rui Li [hidden email] wrote:I'm not sure how much benefit #1 can bring us. If users just want totryout temporary functions, they can create temporary system functionswhichdon't require a catalog/DB. IIUC, the main reason why we allowtemporarycatalog function is to let users override permanent catalog functions. Therefore a temporary function in a non-existing catalog won't servethatpurpose. Besides, each session is provided with a default catalog andDB.So even if users simply want to create some catalog functions they can forget about after the session, wouldn't the default catalog/DB beenoughfor such experiments? On Thu, Sep 26, 2019 at 4:38 AM Bowen Li [hidden email] wrote:Re 1) As described in the FLIP, a temp function lookup will firstmakesurethe db exists. If the db doesn't exist, a lazy drop is triggered toremovethat temp function. I agree Hive doesn't handle it consistently, and we are not copyingHive.IMHO, allowing registering temp functions in nonexistent catalog/dbishacky and problematic. For instance, "SHOW FUNCTIONS" would listsystemfunctions and functions in the current catalog/db, since users cannot designate a nonexistent catalog/db as current ones, how can they list functions in nonexistent catalog/db? They may end up never knowingwhattemp functions they've created unless trying out with queries or we introducing some more nonstandard SQL statements. The same applies toothertemp objects like temp tables. Re 2) A standalone FunctionIdentifier sounds good to me On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < [hidden email]> wrote:Ad. 1 I wouldn't say it is hacky. Moreover, how do you want ensure that the dB always exists when atemporaryobject is used?( in this particular case function). Do you want toqueryfor the database existence whenever e.g a temporary function isused? Ithink important aspect here is that the database can be dropped from external system, not just flink or a different flink session. E.g in case of hive, you cannot create a temporary table in adatabasethatdoes not exist, that's true. But if you create a temporary table inadatabase and drop that database from a different session, you canstillquery the previously created temporary table from the originalsession.Itdoes not sound like a consistent behaviour to me. Why don't we makethisbehaviour of not binding a temporary objects to the lifetime of adatabaseexplicit as part of the temporary objects contract? In the end theyexistin different layers. Permanent objects & databases in a catalog (incaseofhive megastore) whereas temporary objects in flink sessions. That'salsotrue for the original hive client. The temporary objects live in thehiveclient whereas databases are created in the metastore. Ad.2 I'm open for suggestions here. The one thing I wanted to achievehereissothat we do not change the contract of ObjectIdentifier. Oneimportantthingto remember here is that we need the function identifier to be partoftheFunctionDefinition object and not only as the result of the function lookup. At some point we want to be able to store QueryOperations inthecatalogs. They can contain function calls within which we need tohavetheidentifier. I agree my initial suggestion is over complicated. How about we havejustthe FunctionIdentifier as top level class without making the ObjectIdentifier extend from it? I think it's pretty much the samewhatyousuggested. The only difference is that it would be a top level classwith amore descriptive name. On Wed, 25 Sep 2019, 13:57 Bowen Li, [hidden email] wrote:Sorry, I missed some parts of the solution. The completealternativeisthefollowing, basically having separate APIs in FunctionLookup forambiguousand precise function lookup since planner is able to tell which APItocallwith parsed queries, and have a unified result: ``` class FunctionLookup { Optional<Result> lookupAmbiguousFunction(String name); Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); class Result { Optional<ObjectIdentifier> getObjectIdentifier() { ... } String getName() { ... } // ... } } ``` Thanks, Bowen On Tue, Sep 24, 2019 at 9:42 PM Bowen Li [hidden email]wrote:Hi Dawid, Re 1): I agree making it easy for users to run experiments isimportant.However, I'm not sure allowing users to register temp functionsinnonexistent catalog/db is the optimal way. It seems a bit hacky,andbreaksthe current contract between Flink and users that catalog/db mustbevalidin order to operate on. How about we instead focus on making it convenient to createcatalogs?Users actually can already do it with ease via program or SQL CLIyamlfilefor an in-memory catalog which has neither extra dependency norexternalconnections. What we can further improve is DDL for catalogs,and Iraisedit in discussion of [FLIP 69 - Flink SQL DDL Enhancement] drivenbyTerrynow. In that case, if users would like to experiment via SQL, they caneasilycreate an in memory catalog/database using DDL, then play withtempfunctions. Re 2): For the assumption, IIUIC, function ObjectIdentifier hasnotbeenresolved when stack call reachesFunctionCatalog#lookupFunction(),butonlybeen parsed? I agree keeping ObjectIdentifier as-is would be good. I'm ok withthesuggested classes, though making ObjectIdentifier a subclass of FunctionIdentifier seem a bit counter intuitive. Another potentially simpler way is: ``` // in class FunctionLookup class Result { Optional<ObjectIdentifier> getObjectIdentifier() { ... } String getName() { ... } ... } ``` WDYT? On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < [hidden email]> wrote:Hi, I really like the flip and think it clarifies important aspectsofthesystem. I have two, I hope small suggestions, which will not take muchtimetoagree on. 1. Could we follow the MySQL approach in regards to theexistenceofcat/db for temporary functions? That means not to check it, so e.g.it'spossibleto create a temporary function in a database that does notexist.Ithinkit's really useful e.g in cases when user wants to performexperimentsbutdoes not have access to the db yet or temporarily does not haveconnectionto a catalog. 2. Could we not change the ObjectIdentifier? Could we not loosentherequirements for all catalog objects such as tables, views,typesjustforthe functions? It's really important later on from e.g theserializabilityperspective. The important aspect of the ObjectIdentifier isthatweknowthat it has been resolved. The suggested changes break thatassumption.What do you think about adding an interface FunctionIdentifier { String getName(); /** Return 3-part identifier. Empty in case of a built-infunction.*/ Optional<ObjectIdentifier> getObjectIdentifier() } class ObjectIdentifier implements FunctionIdentifier { Optional<ObjectIdentifier> getObjectIdentifier() { return Optional.of(this); } } class SystemFunctionIdentifier implements FunctionIdentifier{...}WDYT? On Wed, 25 Sep 2019, 04:50 Xuefu Z, [hidden email] wrote:+1. LGTM On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]>wrote:+1 Best, Terry Wang在 2019年9月24日,上午10:42,Kurt Young [hidden email] 写道: +1 Best, Kurt On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]wrote:Hi all, I'd like to start a voting thread for FLIP-57 [1], whichwe'vereachedconsensus in [2]. This voting will be open for minimum 3 days till 6:30pmUTC,Sep26.Thanks, Bowen [1]https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog[2]http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613-- Xuefu Zhang "In Honey We Trust!"-- Best regards! Rui Li signature.asc (849 bytes) Download Attachment |
In reply to this post by bowen.li
Thanks for the summary Bowen!
Looks good to me. Cheers, Fabian Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <[hidden email]>: > Hi all, > > I've updated the FLIP wiki with the following changes: > > - Lifespan of temp functions are not tied to those of catalogs and > databases. Users can create temp functions even though catalogs/dbs in > their fully qualified names don't even exist. > - some new SQL commands > - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in > functions, and names of temp and catalog functions in the current catalog > and db > - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built > functions, and fully qualified names of temp and catalog functions in all > catalogs and dbs > - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp > functions in all catalog and db > - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system > functions > > Let me know if you have any questions. > > Seems we have resolved all concerns. If there's no more ones, I'd like to > close the vote by this time tomorrow. > > Cheers, > Bowen > > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote: > > > Hi, > > > > I think above are some valid points, and we can adopt the suggestions. > > > > To elaborate a bit on the new SQL syntax, it would imply that, unlike > > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] > > FUNCTIONS" would return functions' fully qualified names with catalog and > > db names. > > > > > > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote: > > > >> Hi all, > >> > >> I support Fabian's arguments. In my opinion, temporary objects should > >> just be an additional layer on top of the regular catalog/database > >> lookup logic. Thus, a temporary table or function has always highest > >> precedence and should be stable within the local session. Otherwise it > >> could magically disappear while someone else is performing modifications > >> in the catalog. > >> > >> Furthermore, this feature is very useful for prototyping as users can > >> simply express that a catalog/database is present even through they > >> might not have access to it currently. > >> > >> Regards, > >> Timo > >> > >> > >> On 30.09.19 14:57, Fabian Hueske wrote: > >> > Hi all, > >> > > >> > Sorry for the late reply. > >> > > >> > I think it might lead to confusing situations if temporary functions > (or > >> > any temporary db objects for that matter) are bound to the life cycle > >> of an > >> > (external) db/catalog. > >> > Imaging a situation where you create a temp function in a db in an > >> external > >> > catalog and use it but at some point it does not work anymore because > >> some > >> > other dropped the database from the external catalog. > >> > Shouldn't temporary objects be only controlled by the owner of a > >> session? > >> > > >> > I agree that creating temp objects in non-existing db/catalogs sounds > a > >> bit > >> > strange, but IMO the opposite (the db/catalog must exist for a temp > >> > function to be created/exist) can have significant implications like > the > >> > one I described. > >> > I think it would be quite easy for users to understand that temporary > >> > objects are solely owned by them (and their session). > >> > The problem of listing temporary objects could be solved by adding a > ALL > >> > [TEMPORARY] clause: > >> > > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the > >> > catalog/database including temporary functions. > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > >> regardless > >> > of the catalog/database. > >> > > >> > Best, > >> > Fabian > >> > > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > >> [hidden email]>: > >> > > >> >> @Dawid, do you have any other concerns? If not, I hope we can close > the > >> >> voting. > >> >> > >> >> > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> > wrote: > >> >> > >> >>> I'm not sure how much benefit #1 can bring us. If users just want to > >> try > >> >>> out temporary functions, they can create temporary system functions > >> which > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow > >> temporary > >> >>> catalog function is to let users override permanent catalog > functions. > >> >>> Therefore a temporary function in a non-existing catalog won't serve > >> that > >> >>> purpose. Besides, each session is provided with a default catalog > and > >> DB. > >> >>> So even if users simply want to create some catalog functions they > can > >> >>> forget about after the session, wouldn't the default catalog/DB be > >> enough > >> >>> for such experiments? > >> >>> > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> > wrote: > >> >>> > >> >>>> Re 1) As described in the FLIP, a temp function lookup will first > >> make > >> >>> sure > >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to > >> >>> remove > >> >>>> that temp function. > >> >>>> > >> >>>> I agree Hive doesn't handle it consistently, and we are not copying > >> >> Hive. > >> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db > >> is > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list > >> system > >> >>>> functions and functions in the current catalog/db, since users > cannot > >> >>>> designate a nonexistent catalog/db as current ones, how can they > list > >> >>>> functions in nonexistent catalog/db? They may end up never knowing > >> what > >> >>>> temp functions they've created unless trying out with queries or we > >> >>>> introducing some more nonstandard SQL statements. The same applies > to > >> >>> other > >> >>>> temp objects like temp tables. > >> >>>> > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me > >> >>>> > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >> >>>> [hidden email]> > >> >>>> wrote: > >> >>>> > >> >>>>> Ad. 1 > >> >>>>> I wouldn't say it is hacky. > >> >>>>> Moreover, how do you want ensure that the dB always exists when a > >> >>>> temporary > >> >>>>> object is used?( in this particular case function). Do you want to > >> >>> query > >> >>>>> for the database existence whenever e.g a temporary function is > >> >> used? I > >> >>>>> think important aspect here is that the database can be dropped > from > >> >>>>> external system, not just flink or a different flink session. > >> >>>>> > >> >>>>> E.g in case of hive, you cannot create a temporary table in a > >> >> database > >> >>>> that > >> >>>>> does not exist, that's true. But if you create a temporary table > in > >> a > >> >>>>> database and drop that database from a different session, you can > >> >> still > >> >>>>> query the previously created temporary table from the original > >> >> session. > >> >>>> It > >> >>>>> does not sound like a consistent behaviour to me. Why don't we > make > >> >>> this > >> >>>>> behaviour of not binding a temporary objects to the lifetime of a > >> >>>> database > >> >>>>> explicit as part of the temporary objects contract? In the end > they > >> >>> exist > >> >>>>> in different layers. Permanent objects & databases in a catalog > (in > >> >>> case > >> >>>> of > >> >>>>> hive megastore) whereas temporary objects in flink sessions. > That's > >> >>> also > >> >>>>> true for the original hive client. The temporary objects live in > the > >> >>> hive > >> >>>>> client whereas databases are created in the metastore. > >> >>>>> > >> >>>>> Ad.2 > >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve > >> here > >> >>> is > >> >>>> so > >> >>>>> that we do not change the contract of ObjectIdentifier. One > >> important > >> >>>> thing > >> >>>>> to remember here is that we need the function identifier to be > part > >> >> of > >> >>>> the > >> >>>>> FunctionDefinition object and not only as the result of the > function > >> >>>>> lookup. At some point we want to be able to store QueryOperations > in > >> >>> the > >> >>>>> catalogs. They can contain function calls within which we need to > >> >> have > >> >>>> the > >> >>>>> identifier. > >> >>>>> > >> >>>>> I agree my initial suggestion is over complicated. How about we > have > >> >>> just > >> >>>>> the FunctionIdentifier as top level class without making the > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same > >> >> what > >> >>>> you > >> >>>>> suggested. The only difference is that it would be a top level > class > >> >>>> with a > >> >>>>> more descriptive name. > >> >>>>> > >> >>>>> > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote: > >> >>>>> > >> >>>>>> Sorry, I missed some parts of the solution. The complete > >> >> alternative > >> >>> is > >> >>>>> the > >> >>>>>> following, basically having separate APIs in FunctionLookup for > >> >>>> ambiguous > >> >>>>>> and precise function lookup since planner is able to tell which > API > >> >>> to > >> >>>>> call > >> >>>>>> with parsed queries, and have a unified result: > >> >>>>>> > >> >>>>>> ``` > >> >>>>>> class FunctionLookup { > >> >>>>>> > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >> >>>>>> > >> >>>>>> > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > >> >>>>>> > >> >>>>>> > >> >>>>>> class Result { > >> >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >> >>>>>> String getName() { ... } > >> >>>>>> // ... > >> >>>>>> } > >> >>>>>> > >> >>>>>> } > >> >>>>>> ``` > >> >>>>>> > >> >>>>>> Thanks, > >> >>>>>> Bowen > >> >>>>>> > >> >>>>>> > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> > >> >>> wrote: > >> >>>>>>> Hi Dawid, > >> >>>>>>> > >> >>>>>>> Re 1): I agree making it easy for users to run experiments is > >> >>>>> important. > >> >>>>>>> However, I'm not sure allowing users to register temp functions > >> >> in > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky, > >> >>> and > >> >>>>>> breaks > >> >>>>>>> the current contract between Flink and users that catalog/db > must > >> >>> be > >> >>>>>> valid > >> >>>>>>> in order to operate on. > >> >>>>>>> > >> >>>>>>> How about we instead focus on making it convenient to create > >> >>>> catalogs? > >> >>>>>>> Users actually can already do it with ease via program or SQL > CLI > >> >>>> yaml > >> >>>>>> file > >> >>>>>>> for an in-memory catalog which has neither extra dependency nor > >> >>>>> external > >> >>>>>>> connections. What we can further improve is DDL for catalogs, > >> >> and I > >> >>>>>> raised > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven > >> >> by > >> >>>>> Terry > >> >>>>>>> now. > >> >>>>>>> > >> >>>>>>> In that case, if users would like to experiment via SQL, they > can > >> >>>>> easily > >> >>>>>>> create an in memory catalog/database using DDL, then play with > >> >> temp > >> >>>>>>> functions. > >> >>>>>>> > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has > >> >> not > >> >>>>> been > >> >>>>>>> resolved when stack call reaches > >> >> FunctionCatalog#lookupFunction(), > >> >>>> but > >> >>>>>> only > >> >>>>>>> been parsed? > >> >>>>>>> > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok > with > >> >>> the > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive. > >> >>>>>>> > >> >>>>>>> Another potentially simpler way is: > >> >>>>>>> > >> >>>>>>> ``` > >> >>>>>>> // in class FunctionLookup > >> >>>>>>> class Result { > >> >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > >> >>>>>>> String getName() { ... } > >> >>>>>>> ... > >> >>>>>>> } > >> >>>>>>> ``` > >> >>>>>>> > >> >>>>>>> WDYT? > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >> >>>>>>> [hidden email]> wrote: > >> >>>>>>> > >> >>>>>>>> Hi, > >> >>>>>>>> I really like the flip and think it clarifies important aspects > >> >> of > >> >>>> the > >> >>>>>>>> system. > >> >>>>>>>> > >> >>>>>>>> I have two, I hope small suggestions, which will not take much > >> >>> time > >> >>>> to > >> >>>>>>>> agree on. > >> >>>>>>>> > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the > >> >> existence > >> >>> of > >> >>>>>>>> cat/db > >> >>>>>>>> for temporary functions? That means not to check it, so e.g. > >> >> it's > >> >>>>>> possible > >> >>>>>>>> to create a temporary function in a database that does not > >> >> exist. > >> >>> I > >> >>>>>> think > >> >>>>>>>> it's really useful e.g in cases when user wants to perform > >> >>>> experiments > >> >>>>>> but > >> >>>>>>>> does not have access to the db yet or temporarily does not have > >> >>>>>> connection > >> >>>>>>>> to a catalog. > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not > loosen > >> >>> the > >> >>>>>>>> requirements for all catalog objects such as tables, views, > >> >> types > >> >>>> just > >> >>>>>> for > >> >>>>>>>> the functions? It's really important later on from e.g the > >> >>>>>> serializability > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is > >> >> that > >> >>> we > >> >>>>>> know > >> >>>>>>>> that it has been resolved. The suggested changes break that > >> >>>>> assumption. > >> >>>>>>>> What do you think about adding an interface FunctionIdentifier > { > >> >>>>>>>> > >> >>>>>>>> String getName(); > >> >>>>>>>> > >> >>>>>>>> /** > >> >>>>>>>> Return 3-part identifier. Empty in case of a built-in > >> >> function. > >> >>>>>>>> */ > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >> >>>>>>>> } > >> >>>>>>>> > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >> >>>>>>>> return Optional.of(this); > >> >>>>>>>> } > >> >>>>>>>> } > >> >>>>>>>> > >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier > >> >> {...} > >> >>>>>>>> WDYT? > >> >>>>>>>> > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote: > >> >>>>>>>> > >> >>>>>>>>> +1. LGTM > >> >>>>>>>>> > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >> >> [hidden email]> > >> >>>>>> wrote: > >> >>>>>>>>>> +1 > >> >>>>>>>>>> > >> >>>>>>>>>> Best, > >> >>>>>>>>>> Terry Wang > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > >> >>>>>>>>>>> > >> >>>>>>>>>>> +1 > >> >>>>>>>>>>> > >> >>>>>>>>>>> Best, > >> >>>>>>>>>>> Kurt > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >> >>> [hidden email] > >> >>>>>>>> wrote: > >> >>>>>>>>>>>> Hi all, > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which > >> >>>> we've > >> >>>>>>>> reached > >> >>>>>>>>>>>> consensus in [2]. > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > >> >>> UTC, > >> >>>>> Sep > >> >>>>>>>> 26. > >> >>>>>>>>>>>> Thanks, > >> >>>>>>>>>>>> Bowen > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> [1] > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >> >>>>>>>>>>>> [2] > >> >>>>>>>>>>>> > >> >>>>>>>>>>>> > >> >> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >> >>>>>>>>>> > >> >>>>>>>>> -- > >> >>>>>>>>> Xuefu Zhang > >> >>>>>>>>> > >> >>>>>>>>> "In Honey We Trust!" > >> >>>>>>>>> > >> >>> > >> >>> -- > >> >>> Best regards! > >> >>> Rui Li > >> >>> > >> > >> > |
Hi Dawid,
Thanks for bringing the suggestions up. I was prototyping yesterday and found out those places exactly as what you suggested. For CallExpression and UnresolvedCallExpression, I've added them to FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark that as a breaking change For FunctionIdentifier, the suggested changes LGTM. Just want to bring up an issue on naming. It seems to me how we now name functions categories is a bit unclear and confusing, which is reflected on the suggested APIs - in FunctionIdentifier you lay out, "builtin function" would include builtin functions and temporary system functions as we are kind of using "system" and "built-in" interchangeably, and "catalog function" would include catalog functions and temporary functions. I currently can see two approaches to make it clearer to users. 1) Simplify FunctionIdentifier to be the following. As it's internal, we add comments and explanation to devs on which case the APIs support. However, I feel this approach would be somehow a bit conflicting to what you want to achieve for the clarity of APIs @Internal class FunctionIdentifier { // for built-in function and temporary system function public FunctionIdentifier of(String name) { } // for temporary function and catalog function public FunctionIdentifier of(ObjectIdentifier identifier){ } public Optional<String> getFunctionName() { } public Optional<ObjectIdentifier> getObjectIdentifier() { } } 2) We can rename our function categories as following so there'll be mainly just two categories of functions, "system functions" and "catalog functions", either of which can have temporary ones - built-in functions -> officially rename to "system functions" and note to users that "system" and "built-in" can be used interchangeably. We prefer "system" because that's the keyword we decided to use in DDL that creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") - temporary system functions - catalog functions - temporary functions -> rename to "temporary catalog function" @Internal class FunctionIdentifier { // for temporary/non-temporary system function public FunctionIdentifier ofSystemFunction(String name) { } // for temporary/non-temporary catalog function public FunctionIdentifier ofCatalogFunction(ObjectIdentifier identifier){ } public Optional<String> getSystemFunctionName() { } public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() { } } WDYT? On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote: > Thanks for the summary Bowen! > > Looks good to me. > > Cheers, > Fabian > > Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <[hidden email] > >: > > > Hi all, > > > > I've updated the FLIP wiki with the following changes: > > > > - Lifespan of temp functions are not tied to those of catalogs and > > databases. Users can create temp functions even though catalogs/dbs in > > their fully qualified names don't even exist. > > - some new SQL commands > > - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in > > functions, and names of temp and catalog functions in the current catalog > > and db > > - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built > > functions, and fully qualified names of temp and catalog functions in all > > catalogs and dbs > > - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp > > functions in all catalog and db > > - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp > system > > functions > > > > Let me know if you have any questions. > > > > Seems we have resolved all concerns. If there's no more ones, I'd like to > > close the vote by this time tomorrow. > > > > Cheers, > > Bowen > > > > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote: > > > > > Hi, > > > > > > I think above are some valid points, and we can adopt the suggestions. > > > > > > To elaborate a bit on the new SQL syntax, it would imply that, unlike > > > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY] > > > FUNCTIONS" would return functions' fully qualified names with catalog > and > > > db names. > > > > > > > > > > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> > wrote: > > > > > >> Hi all, > > >> > > >> I support Fabian's arguments. In my opinion, temporary objects should > > >> just be an additional layer on top of the regular catalog/database > > >> lookup logic. Thus, a temporary table or function has always highest > > >> precedence and should be stable within the local session. Otherwise it > > >> could magically disappear while someone else is performing > modifications > > >> in the catalog. > > >> > > >> Furthermore, this feature is very useful for prototyping as users can > > >> simply express that a catalog/database is present even through they > > >> might not have access to it currently. > > >> > > >> Regards, > > >> Timo > > >> > > >> > > >> On 30.09.19 14:57, Fabian Hueske wrote: > > >> > Hi all, > > >> > > > >> > Sorry for the late reply. > > >> > > > >> > I think it might lead to confusing situations if temporary functions > > (or > > >> > any temporary db objects for that matter) are bound to the life > cycle > > >> of an > > >> > (external) db/catalog. > > >> > Imaging a situation where you create a temp function in a db in an > > >> external > > >> > catalog and use it but at some point it does not work anymore > because > > >> some > > >> > other dropped the database from the external catalog. > > >> > Shouldn't temporary objects be only controlled by the owner of a > > >> session? > > >> > > > >> > I agree that creating temp objects in non-existing db/catalogs > sounds > > a > > >> bit > > >> > strange, but IMO the opposite (the db/catalog must exist for a temp > > >> > function to be created/exist) can have significant implications like > > the > > >> > one I described. > > >> > I think it would be quite easy for users to understand that > temporary > > >> > objects are solely owned by them (and their session). > > >> > The problem of listing temporary objects could be solved by adding a > > ALL > > >> > [TEMPORARY] clause: > > >> > > > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the > > >> > catalog/database including temporary functions. > > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > > >> regardless > > >> > of the catalog/database. > > >> > > > >> > Best, > > >> > Fabian > > >> > > > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > > >> [hidden email]>: > > >> > > > >> >> @Dawid, do you have any other concerns? If not, I hope we can close > > the > > >> >> voting. > > >> >> > > >> >> > > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> > > wrote: > > >> >> > > >> >>> I'm not sure how much benefit #1 can bring us. If users just want > to > > >> try > > >> >>> out temporary functions, they can create temporary system > functions > > >> which > > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow > > >> temporary > > >> >>> catalog function is to let users override permanent catalog > > functions. > > >> >>> Therefore a temporary function in a non-existing catalog won't > serve > > >> that > > >> >>> purpose. Besides, each session is provided with a default catalog > > and > > >> DB. > > >> >>> So even if users simply want to create some catalog functions they > > can > > >> >>> forget about after the session, wouldn't the default catalog/DB be > > >> enough > > >> >>> for such experiments? > > >> >>> > > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> > > wrote: > > >> >>> > > >> >>>> Re 1) As described in the FLIP, a temp function lookup will first > > >> make > > >> >>> sure > > >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered > to > > >> >>> remove > > >> >>>> that temp function. > > >> >>>> > > >> >>>> I agree Hive doesn't handle it consistently, and we are not > copying > > >> >> Hive. > > >> >>>> IMHO, allowing registering temp functions in nonexistent > catalog/db > > >> is > > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list > > >> system > > >> >>>> functions and functions in the current catalog/db, since users > > cannot > > >> >>>> designate a nonexistent catalog/db as current ones, how can they > > list > > >> >>>> functions in nonexistent catalog/db? They may end up never > knowing > > >> what > > >> >>>> temp functions they've created unless trying out with queries or > we > > >> >>>> introducing some more nonstandard SQL statements. The same > applies > > to > > >> >>> other > > >> >>>> temp objects like temp tables. > > >> >>>> > > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me > > >> >>>> > > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > > >> >>>> [hidden email]> > > >> >>>> wrote: > > >> >>>> > > >> >>>>> Ad. 1 > > >> >>>>> I wouldn't say it is hacky. > > >> >>>>> Moreover, how do you want ensure that the dB always exists when > a > > >> >>>> temporary > > >> >>>>> object is used?( in this particular case function). Do you want > to > > >> >>> query > > >> >>>>> for the database existence whenever e.g a temporary function is > > >> >> used? I > > >> >>>>> think important aspect here is that the database can be dropped > > from > > >> >>>>> external system, not just flink or a different flink session. > > >> >>>>> > > >> >>>>> E.g in case of hive, you cannot create a temporary table in a > > >> >> database > > >> >>>> that > > >> >>>>> does not exist, that's true. But if you create a temporary table > > in > > >> a > > >> >>>>> database and drop that database from a different session, you > can > > >> >> still > > >> >>>>> query the previously created temporary table from the original > > >> >> session. > > >> >>>> It > > >> >>>>> does not sound like a consistent behaviour to me. Why don't we > > make > > >> >>> this > > >> >>>>> behaviour of not binding a temporary objects to the lifetime of > a > > >> >>>> database > > >> >>>>> explicit as part of the temporary objects contract? In the end > > they > > >> >>> exist > > >> >>>>> in different layers. Permanent objects & databases in a catalog > > (in > > >> >>> case > > >> >>>> of > > >> >>>>> hive megastore) whereas temporary objects in flink sessions. > > That's > > >> >>> also > > >> >>>>> true for the original hive client. The temporary objects live in > > the > > >> >>> hive > > >> >>>>> client whereas databases are created in the metastore. > > >> >>>>> > > >> >>>>> Ad.2 > > >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve > > >> here > > >> >>> is > > >> >>>> so > > >> >>>>> that we do not change the contract of ObjectIdentifier. One > > >> important > > >> >>>> thing > > >> >>>>> to remember here is that we need the function identifier to be > > part > > >> >> of > > >> >>>> the > > >> >>>>> FunctionDefinition object and not only as the result of the > > function > > >> >>>>> lookup. At some point we want to be able to store > QueryOperations > > in > > >> >>> the > > >> >>>>> catalogs. They can contain function calls within which we need > to > > >> >> have > > >> >>>> the > > >> >>>>> identifier. > > >> >>>>> > > >> >>>>> I agree my initial suggestion is over complicated. How about we > > have > > >> >>> just > > >> >>>>> the FunctionIdentifier as top level class without making the > > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the > same > > >> >> what > > >> >>>> you > > >> >>>>> suggested. The only difference is that it would be a top level > > class > > >> >>>> with a > > >> >>>>> more descriptive name. > > >> >>>>> > > >> >>>>> > > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> > wrote: > > >> >>>>> > > >> >>>>>> Sorry, I missed some parts of the solution. The complete > > >> >> alternative > > >> >>> is > > >> >>>>> the > > >> >>>>>> following, basically having separate APIs in FunctionLookup for > > >> >>>> ambiguous > > >> >>>>>> and precise function lookup since planner is able to tell which > > API > > >> >>> to > > >> >>>>> call > > >> >>>>>> with parsed queries, and have a unified result: > > >> >>>>>> > > >> >>>>>> ``` > > >> >>>>>> class FunctionLookup { > > >> >>>>>> > > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > > >> >>>>>> > > >> >>>>>> > > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > >> >>>>>> > > >> >>>>>> > > >> >>>>>> class Result { > > >> >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > >> >>>>>> String getName() { ... } > > >> >>>>>> // ... > > >> >>>>>> } > > >> >>>>>> > > >> >>>>>> } > > >> >>>>>> ``` > > >> >>>>>> > > >> >>>>>> Thanks, > > >> >>>>>> Bowen > > >> >>>>>> > > >> >>>>>> > > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> > > >> >>> wrote: > > >> >>>>>>> Hi Dawid, > > >> >>>>>>> > > >> >>>>>>> Re 1): I agree making it easy for users to run experiments is > > >> >>>>> important. > > >> >>>>>>> However, I'm not sure allowing users to register temp > functions > > >> >> in > > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit > hacky, > > >> >>> and > > >> >>>>>> breaks > > >> >>>>>>> the current contract between Flink and users that catalog/db > > must > > >> >>> be > > >> >>>>>> valid > > >> >>>>>>> in order to operate on. > > >> >>>>>>> > > >> >>>>>>> How about we instead focus on making it convenient to create > > >> >>>> catalogs? > > >> >>>>>>> Users actually can already do it with ease via program or SQL > > CLI > > >> >>>> yaml > > >> >>>>>> file > > >> >>>>>>> for an in-memory catalog which has neither extra dependency > nor > > >> >>>>> external > > >> >>>>>>> connections. What we can further improve is DDL for catalogs, > > >> >> and I > > >> >>>>>> raised > > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] > driven > > >> >> by > > >> >>>>> Terry > > >> >>>>>>> now. > > >> >>>>>>> > > >> >>>>>>> In that case, if users would like to experiment via SQL, they > > can > > >> >>>>> easily > > >> >>>>>>> create an in memory catalog/database using DDL, then play with > > >> >> temp > > >> >>>>>>> functions. > > >> >>>>>>> > > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier > has > > >> >> not > > >> >>>>> been > > >> >>>>>>> resolved when stack call reaches > > >> >> FunctionCatalog#lookupFunction(), > > >> >>>> but > > >> >>>>>> only > > >> >>>>>>> been parsed? > > >> >>>>>>> > > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok > > with > > >> >>> the > > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass > of > > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive. > > >> >>>>>>> > > >> >>>>>>> Another potentially simpler way is: > > >> >>>>>>> > > >> >>>>>>> ``` > > >> >>>>>>> // in class FunctionLookup > > >> >>>>>>> class Result { > > >> >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... } > > >> >>>>>>> String getName() { ... } > > >> >>>>>>> ... > > >> >>>>>>> } > > >> >>>>>>> ``` > > >> >>>>>>> > > >> >>>>>>> WDYT? > > >> >>>>>>> > > >> >>>>>>> > > >> >>>>>>> > > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > >> >>>>>>> [hidden email]> wrote: > > >> >>>>>>> > > >> >>>>>>>> Hi, > > >> >>>>>>>> I really like the flip and think it clarifies important > aspects > > >> >> of > > >> >>>> the > > >> >>>>>>>> system. > > >> >>>>>>>> > > >> >>>>>>>> I have two, I hope small suggestions, which will not take > much > > >> >>> time > > >> >>>> to > > >> >>>>>>>> agree on. > > >> >>>>>>>> > > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the > > >> >> existence > > >> >>> of > > >> >>>>>>>> cat/db > > >> >>>>>>>> for temporary functions? That means not to check it, so e.g. > > >> >> it's > > >> >>>>>> possible > > >> >>>>>>>> to create a temporary function in a database that does not > > >> >> exist. > > >> >>> I > > >> >>>>>> think > > >> >>>>>>>> it's really useful e.g in cases when user wants to perform > > >> >>>> experiments > > >> >>>>>> but > > >> >>>>>>>> does not have access to the db yet or temporarily does not > have > > >> >>>>>> connection > > >> >>>>>>>> to a catalog. > > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not > > loosen > > >> >>> the > > >> >>>>>>>> requirements for all catalog objects such as tables, views, > > >> >> types > > >> >>>> just > > >> >>>>>> for > > >> >>>>>>>> the functions? It's really important later on from e.g the > > >> >>>>>> serializability > > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is > > >> >> that > > >> >>> we > > >> >>>>>> know > > >> >>>>>>>> that it has been resolved. The suggested changes break that > > >> >>>>> assumption. > > >> >>>>>>>> What do you think about adding an interface > FunctionIdentifier > > { > > >> >>>>>>>> > > >> >>>>>>>> String getName(); > > >> >>>>>>>> > > >> >>>>>>>> /** > > >> >>>>>>>> Return 3-part identifier. Empty in case of a built-in > > >> >> function. > > >> >>>>>>>> */ > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > > >> >>>>>>>> } > > >> >>>>>>>> > > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > > >> >>>>>>>> return Optional.of(this); > > >> >>>>>>>> } > > >> >>>>>>>> } > > >> >>>>>>>> > > >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier > > >> >> {...} > > >> >>>>>>>> WDYT? > > >> >>>>>>>> > > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> > wrote: > > >> >>>>>>>> > > >> >>>>>>>>> +1. LGTM > > >> >>>>>>>>> > > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > > >> >> [hidden email]> > > >> >>>>>> wrote: > > >> >>>>>>>>>> +1 > > >> >>>>>>>>>> > > >> >>>>>>>>>> Best, > > >> >>>>>>>>>> Terry Wang > > >> >>>>>>>>>> > > >> >>>>>>>>>> > > >> >>>>>>>>>> > > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > >> >>>>>>>>>>> > > >> >>>>>>>>>>> +1 > > >> >>>>>>>>>>> > > >> >>>>>>>>>>> Best, > > >> >>>>>>>>>>> Kurt > > >> >>>>>>>>>>> > > >> >>>>>>>>>>> > > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > > >> >>> [hidden email] > > >> >>>>>>>> wrote: > > >> >>>>>>>>>>>> Hi all, > > >> >>>>>>>>>>>> > > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which > > >> >>>> we've > > >> >>>>>>>> reached > > >> >>>>>>>>>>>> consensus in [2]. > > >> >>>>>>>>>>>> > > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > > >> >>> UTC, > > >> >>>>> Sep > > >> >>>>>>>> 26. > > >> >>>>>>>>>>>> Thanks, > > >> >>>>>>>>>>>> Bowen > > >> >>>>>>>>>>>> > > >> >>>>>>>>>>>> [1] > > >> >>>>>>>>>>>> > > >> >>>>>>>>>>>> > > >> >> > > >> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > >> >>>>>>>>>>>> [2] > > >> >>>>>>>>>>>> > > >> >>>>>>>>>>>> > > >> >> > > >> > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > >> >>>>>>>>>> > > >> >>>>>>>>> -- > > >> >>>>>>>>> Xuefu Zhang > > >> >>>>>>>>> > > >> >>>>>>>>> "In Honey We Trust!" > > >> >>>>>>>>> > > >> >>> > > >> >>> -- > > >> >>> Best regards! > > >> >>> Rui Li > > >> >>> > > >> > > >> > > > |
Here are some of my thoughts on the minor debates above:
1. +1 for 4 categories of functions. They are categorized along two dimensions of binary values: X: *temporary* vs non-temporary (persistent); Y: *system* vs non-system (so said catalog). 2. In my opinion, class functionIdentifier doesn't really need to reflect the categories of the functions. Instead, we should decouple them to make the API more stable. Thus, my suggestion is: @Internal class FunctionIdentifier { // for temporary/non-temporary system function public FunctionIdentifier ofSimpleName(String name) { } // for temporary/non-temporary catalog function public FunctionIdentifier ofIdentifier(ObjectIdentifier identifier){ } public Optional<String> getSimpleName() { } public Optional<ObjectIdentifier> getIdentifier() { } } 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. When either keyword is missing, "ALL" is implied along that dimension. We should always limit the search to the system function catalog and the current catalog/DB. I don't see a need of listing functions across different catalogs and databases. (It can be added later if that arises.) Thanks, Xuefu On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: > Hi Dawid, > > Thanks for bringing the suggestions up. I was prototyping yesterday and > found out those places exactly as what you suggested. > > For CallExpression and UnresolvedCallExpression, I've added them to > FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark > that as a breaking change > > For FunctionIdentifier, the suggested changes LGTM. Just want to bring up > an issue on naming. It seems to me how we now name functions categories is > a bit unclear and confusing, which is reflected on the suggested APIs - in > FunctionIdentifier you lay out, "builtin function" would include builtin > functions and temporary system functions as we are kind of using "system" > and "built-in" interchangeably, and "catalog function" would include > catalog functions and temporary functions. I currently can see two > approaches to make it clearer to users. > > 1) Simplify FunctionIdentifier to be the following. As it's internal, we > add comments and explanation to devs on which case the APIs support. > However, I feel this approach would be somehow a bit conflicting to what > you want to achieve for the clarity of APIs > > @Internal > class FunctionIdentifier { > // for built-in function and temporary system function > public FunctionIdentifier of(String name) { } > // for temporary function and catalog function > public FunctionIdentifier of(ObjectIdentifier identifier){ } > public Optional<String> getFunctionName() { } > public Optional<ObjectIdentifier> getObjectIdentifier() { } > } > > 2) We can rename our function categories as following so there'll be mainly > just two categories of functions, "system functions" and "catalog > functions", either of which can have temporary ones > > - built-in functions -> officially rename to "system functions" and note > to users that "system" and "built-in" can be used interchangeably. We > prefer "system" because that's the keyword we decided to use in DDL that > creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") > - temporary system functions > - catalog functions > - temporary functions -> rename to "temporary catalog function" > > @Internal > class FunctionIdentifier { > // for temporary/non-temporary system function > public FunctionIdentifier ofSystemFunction(String name) { } > // for temporary/non-temporary catalog function > public FunctionIdentifier ofCatalogFunction(ObjectIdentifier > identifier){ } > public Optional<String> getSystemFunctionName() { } > public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() { } > } > > WDYT? > > > On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote: > > > Thanks for the summary Bowen! > > > > Looks good to me. > > > > Cheers, > > Fabian > > > > Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < > [hidden email] > > >: > > > > > Hi all, > > > > > > I've updated the FLIP wiki with the following changes: > > > > > > - Lifespan of temp functions are not tied to those of catalogs and > > > databases. Users can create temp functions even though catalogs/dbs in > > > their fully qualified names don't even exist. > > > - some new SQL commands > > > - "SHOW FUNCTIONS" - list names of temp and non-temp > system/built-in > > > functions, and names of temp and catalog functions in the current > catalog > > > and db > > > - "SHOW ALL FUNCTIONS" - list names of temp and non-temp > system/built > > > functions, and fully qualified names of temp and catalog functions in > all > > > catalogs and dbs > > > - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of > temp > > > functions in all catalog and db > > > - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp > > system > > > functions > > > > > > Let me know if you have any questions. > > > > > > Seems we have resolved all concerns. If there's no more ones, I'd like > to > > > close the vote by this time tomorrow. > > > > > > Cheers, > > > Bowen > > > > > > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote: > > > > > > > Hi, > > > > > > > > I think above are some valid points, and we can adopt the > suggestions. > > > > > > > > To elaborate a bit on the new SQL syntax, it would imply that, unlike > > > > "SHOW FUNCTION" which only return function names, "SHOW ALL > [TEMPORARY] > > > > FUNCTIONS" would return functions' fully qualified names with catalog > > and > > > > db names. > > > > > > > > > > > > > > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> > > wrote: > > > > > > > >> Hi all, > > > >> > > > >> I support Fabian's arguments. In my opinion, temporary objects > should > > > >> just be an additional layer on top of the regular catalog/database > > > >> lookup logic. Thus, a temporary table or function has always highest > > > >> precedence and should be stable within the local session. Otherwise > it > > > >> could magically disappear while someone else is performing > > modifications > > > >> in the catalog. > > > >> > > > >> Furthermore, this feature is very useful for prototyping as users > can > > > >> simply express that a catalog/database is present even through they > > > >> might not have access to it currently. > > > >> > > > >> Regards, > > > >> Timo > > > >> > > > >> > > > >> On 30.09.19 14:57, Fabian Hueske wrote: > > > >> > Hi all, > > > >> > > > > >> > Sorry for the late reply. > > > >> > > > > >> > I think it might lead to confusing situations if temporary > functions > > > (or > > > >> > any temporary db objects for that matter) are bound to the life > > cycle > > > >> of an > > > >> > (external) db/catalog. > > > >> > Imaging a situation where you create a temp function in a db in an > > > >> external > > > >> > catalog and use it but at some point it does not work anymore > > because > > > >> some > > > >> > other dropped the database from the external catalog. > > > >> > Shouldn't temporary objects be only controlled by the owner of a > > > >> session? > > > >> > > > > >> > I agree that creating temp objects in non-existing db/catalogs > > sounds > > > a > > > >> bit > > > >> > strange, but IMO the opposite (the db/catalog must exist for a > temp > > > >> > function to be created/exist) can have significant implications > like > > > the > > > >> > one I described. > > > >> > I think it would be quite easy for users to understand that > > temporary > > > >> > objects are solely owned by them (and their session). > > > >> > The problem of listing temporary objects could be solved by > adding a > > > ALL > > > >> > [TEMPORARY] clause: > > > >> > > > > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the > > > >> > catalog/database including temporary functions. > > > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > > > >> regardless > > > >> > of the catalog/database. > > > >> > > > > >> > Best, > > > >> > Fabian > > > >> > > > > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > > > >> [hidden email]>: > > > >> > > > > >> >> @Dawid, do you have any other concerns? If not, I hope we can > close > > > the > > > >> >> voting. > > > >> >> > > > >> >> > > > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> > > > wrote: > > > >> >> > > > >> >>> I'm not sure how much benefit #1 can bring us. If users just > want > > to > > > >> try > > > >> >>> out temporary functions, they can create temporary system > > functions > > > >> which > > > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow > > > >> temporary > > > >> >>> catalog function is to let users override permanent catalog > > > functions. > > > >> >>> Therefore a temporary function in a non-existing catalog won't > > serve > > > >> that > > > >> >>> purpose. Besides, each session is provided with a default > catalog > > > and > > > >> DB. > > > >> >>> So even if users simply want to create some catalog functions > they > > > can > > > >> >>> forget about after the session, wouldn't the default catalog/DB > be > > > >> enough > > > >> >>> for such experiments? > > > >> >>> > > > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> > > > wrote: > > > >> >>> > > > >> >>>> Re 1) As described in the FLIP, a temp function lookup will > first > > > >> make > > > >> >>> sure > > > >> >>>> the db exists. If the db doesn't exist, a lazy drop is > triggered > > to > > > >> >>> remove > > > >> >>>> that temp function. > > > >> >>>> > > > >> >>>> I agree Hive doesn't handle it consistently, and we are not > > copying > > > >> >> Hive. > > > >> >>>> IMHO, allowing registering temp functions in nonexistent > > catalog/db > > > >> is > > > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would > list > > > >> system > > > >> >>>> functions and functions in the current catalog/db, since users > > > cannot > > > >> >>>> designate a nonexistent catalog/db as current ones, how can > they > > > list > > > >> >>>> functions in nonexistent catalog/db? They may end up never > > knowing > > > >> what > > > >> >>>> temp functions they've created unless trying out with queries > or > > we > > > >> >>>> introducing some more nonstandard SQL statements. The same > > applies > > > to > > > >> >>> other > > > >> >>>> temp objects like temp tables. > > > >> >>>> > > > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me > > > >> >>>> > > > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > > > >> >>>> [hidden email]> > > > >> >>>> wrote: > > > >> >>>> > > > >> >>>>> Ad. 1 > > > >> >>>>> I wouldn't say it is hacky. > > > >> >>>>> Moreover, how do you want ensure that the dB always exists > when > > a > > > >> >>>> temporary > > > >> >>>>> object is used?( in this particular case function). Do you > want > > to > > > >> >>> query > > > >> >>>>> for the database existence whenever e.g a temporary function > is > > > >> >> used? I > > > >> >>>>> think important aspect here is that the database can be > dropped > > > from > > > >> >>>>> external system, not just flink or a different flink session. > > > >> >>>>> > > > >> >>>>> E.g in case of hive, you cannot create a temporary table in a > > > >> >> database > > > >> >>>> that > > > >> >>>>> does not exist, that's true. But if you create a temporary > table > > > in > > > >> a > > > >> >>>>> database and drop that database from a different session, you > > can > > > >> >> still > > > >> >>>>> query the previously created temporary table from the original > > > >> >> session. > > > >> >>>> It > > > >> >>>>> does not sound like a consistent behaviour to me. Why don't we > > > make > > > >> >>> this > > > >> >>>>> behaviour of not binding a temporary objects to the lifetime > of > > a > > > >> >>>> database > > > >> >>>>> explicit as part of the temporary objects contract? In the end > > > they > > > >> >>> exist > > > >> >>>>> in different layers. Permanent objects & databases in a > catalog > > > (in > > > >> >>> case > > > >> >>>> of > > > >> >>>>> hive megastore) whereas temporary objects in flink sessions. > > > That's > > > >> >>> also > > > >> >>>>> true for the original hive client. The temporary objects live > in > > > the > > > >> >>> hive > > > >> >>>>> client whereas databases are created in the metastore. > > > >> >>>>> > > > >> >>>>> Ad.2 > > > >> >>>>> I'm open for suggestions here. The one thing I wanted to > achieve > > > >> here > > > >> >>> is > > > >> >>>> so > > > >> >>>>> that we do not change the contract of ObjectIdentifier. One > > > >> important > > > >> >>>> thing > > > >> >>>>> to remember here is that we need the function identifier to be > > > part > > > >> >> of > > > >> >>>> the > > > >> >>>>> FunctionDefinition object and not only as the result of the > > > function > > > >> >>>>> lookup. At some point we want to be able to store > > QueryOperations > > > in > > > >> >>> the > > > >> >>>>> catalogs. They can contain function calls within which we need > > to > > > >> >> have > > > >> >>>> the > > > >> >>>>> identifier. > > > >> >>>>> > > > >> >>>>> I agree my initial suggestion is over complicated. How about > we > > > have > > > >> >>> just > > > >> >>>>> the FunctionIdentifier as top level class without making the > > > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the > > same > > > >> >> what > > > >> >>>> you > > > >> >>>>> suggested. The only difference is that it would be a top level > > > class > > > >> >>>> with a > > > >> >>>>> more descriptive name. > > > >> >>>>> > > > >> >>>>> > > > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> > > wrote: > > > >> >>>>> > > > >> >>>>>> Sorry, I missed some parts of the solution. The complete > > > >> >> alternative > > > >> >>> is > > > >> >>>>> the > > > >> >>>>>> following, basically having separate APIs in FunctionLookup > for > > > >> >>>> ambiguous > > > >> >>>>>> and precise function lookup since planner is able to tell > which > > > API > > > >> >>> to > > > >> >>>>> call > > > >> >>>>>> with parsed queries, and have a unified result: > > > >> >>>>>> > > > >> >>>>>> ``` > > > >> >>>>>> class FunctionLookup { > > > >> >>>>>> > > > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name); > > > >> >>>>>> > > > >> >>>>>> > > > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > > > >> >>>>>> > > > >> >>>>>> > > > >> >>>>>> class Result { > > > >> >>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... > } > > > >> >>>>>> String getName() { ... } > > > >> >>>>>> // ... > > > >> >>>>>> } > > > >> >>>>>> > > > >> >>>>>> } > > > >> >>>>>> ``` > > > >> >>>>>> > > > >> >>>>>> Thanks, > > > >> >>>>>> Bowen > > > >> >>>>>> > > > >> >>>>>> > > > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < > [hidden email]> > > > >> >>> wrote: > > > >> >>>>>>> Hi Dawid, > > > >> >>>>>>> > > > >> >>>>>>> Re 1): I agree making it easy for users to run experiments > is > > > >> >>>>> important. > > > >> >>>>>>> However, I'm not sure allowing users to register temp > > functions > > > >> >> in > > > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit > > hacky, > > > >> >>> and > > > >> >>>>>> breaks > > > >> >>>>>>> the current contract between Flink and users that catalog/db > > > must > > > >> >>> be > > > >> >>>>>> valid > > > >> >>>>>>> in order to operate on. > > > >> >>>>>>> > > > >> >>>>>>> How about we instead focus on making it convenient to create > > > >> >>>> catalogs? > > > >> >>>>>>> Users actually can already do it with ease via program or > SQL > > > CLI > > > >> >>>> yaml > > > >> >>>>>> file > > > >> >>>>>>> for an in-memory catalog which has neither extra dependency > > nor > > > >> >>>>> external > > > >> >>>>>>> connections. What we can further improve is DDL for > catalogs, > > > >> >> and I > > > >> >>>>>> raised > > > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] > > driven > > > >> >> by > > > >> >>>>> Terry > > > >> >>>>>>> now. > > > >> >>>>>>> > > > >> >>>>>>> In that case, if users would like to experiment via SQL, > they > > > can > > > >> >>>>> easily > > > >> >>>>>>> create an in memory catalog/database using DDL, then play > with > > > >> >> temp > > > >> >>>>>>> functions. > > > >> >>>>>>> > > > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier > > has > > > >> >> not > > > >> >>>>> been > > > >> >>>>>>> resolved when stack call reaches > > > >> >> FunctionCatalog#lookupFunction(), > > > >> >>>> but > > > >> >>>>>> only > > > >> >>>>>>> been parsed? > > > >> >>>>>>> > > > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok > > > with > > > >> >>> the > > > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass > > of > > > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive. > > > >> >>>>>>> > > > >> >>>>>>> Another potentially simpler way is: > > > >> >>>>>>> > > > >> >>>>>>> ``` > > > >> >>>>>>> // in class FunctionLookup > > > >> >>>>>>> class Result { > > > >> >>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > ... } > > > >> >>>>>>> String getName() { ... } > > > >> >>>>>>> ... > > > >> >>>>>>> } > > > >> >>>>>>> ``` > > > >> >>>>>>> > > > >> >>>>>>> WDYT? > > > >> >>>>>>> > > > >> >>>>>>> > > > >> >>>>>>> > > > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > > > >> >>>>>>> [hidden email]> wrote: > > > >> >>>>>>> > > > >> >>>>>>>> Hi, > > > >> >>>>>>>> I really like the flip and think it clarifies important > > aspects > > > >> >> of > > > >> >>>> the > > > >> >>>>>>>> system. > > > >> >>>>>>>> > > > >> >>>>>>>> I have two, I hope small suggestions, which will not take > > much > > > >> >>> time > > > >> >>>> to > > > >> >>>>>>>> agree on. > > > >> >>>>>>>> > > > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the > > > >> >> existence > > > >> >>> of > > > >> >>>>>>>> cat/db > > > >> >>>>>>>> for temporary functions? That means not to check it, so > e.g. > > > >> >> it's > > > >> >>>>>> possible > > > >> >>>>>>>> to create a temporary function in a database that does not > > > >> >> exist. > > > >> >>> I > > > >> >>>>>> think > > > >> >>>>>>>> it's really useful e.g in cases when user wants to perform > > > >> >>>> experiments > > > >> >>>>>> but > > > >> >>>>>>>> does not have access to the db yet or temporarily does not > > have > > > >> >>>>>> connection > > > >> >>>>>>>> to a catalog. > > > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not > > > loosen > > > >> >>> the > > > >> >>>>>>>> requirements for all catalog objects such as tables, views, > > > >> >> types > > > >> >>>> just > > > >> >>>>>> for > > > >> >>>>>>>> the functions? It's really important later on from e.g the > > > >> >>>>>> serializability > > > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier > is > > > >> >> that > > > >> >>> we > > > >> >>>>>> know > > > >> >>>>>>>> that it has been resolved. The suggested changes break that > > > >> >>>>> assumption. > > > >> >>>>>>>> What do you think about adding an interface > > FunctionIdentifier > > > { > > > >> >>>>>>>> > > > >> >>>>>>>> String getName(); > > > >> >>>>>>>> > > > >> >>>>>>>> /** > > > >> >>>>>>>> Return 3-part identifier. Empty in case of a built-in > > > >> >> function. > > > >> >>>>>>>> */ > > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > > > >> >>>>>>>> } > > > >> >>>>>>>> > > > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > > > >> >>>>>>>> return Optional.of(this); > > > >> >>>>>>>> } > > > >> >>>>>>>> } > > > >> >>>>>>>> > > > >> >>>>>>>> class SystemFunctionIdentifier implements > FunctionIdentifier > > > >> >> {...} > > > >> >>>>>>>> WDYT? > > > >> >>>>>>>> > > > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> > > wrote: > > > >> >>>>>>>> > > > >> >>>>>>>>> +1. LGTM > > > >> >>>>>>>>> > > > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > > > >> >> [hidden email]> > > > >> >>>>>> wrote: > > > >> >>>>>>>>>> +1 > > > >> >>>>>>>>>> > > > >> >>>>>>>>>> Best, > > > >> >>>>>>>>>> Terry Wang > > > >> >>>>>>>>>> > > > >> >>>>>>>>>> > > > >> >>>>>>>>>> > > > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > > > >> >>>>>>>>>>> > > > >> >>>>>>>>>>> +1 > > > >> >>>>>>>>>>> > > > >> >>>>>>>>>>> Best, > > > >> >>>>>>>>>>> Kurt > > > >> >>>>>>>>>>> > > > >> >>>>>>>>>>> > > > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > > > >> >>> [hidden email] > > > >> >>>>>>>> wrote: > > > >> >>>>>>>>>>>> Hi all, > > > >> >>>>>>>>>>>> > > > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], > which > > > >> >>>> we've > > > >> >>>>>>>> reached > > > >> >>>>>>>>>>>> consensus in [2]. > > > >> >>>>>>>>>>>> > > > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > > > >> >>> UTC, > > > >> >>>>> Sep > > > >> >>>>>>>> 26. > > > >> >>>>>>>>>>>> Thanks, > > > >> >>>>>>>>>>>> Bowen > > > >> >>>>>>>>>>>> > > > >> >>>>>>>>>>>> [1] > > > >> >>>>>>>>>>>> > > > >> >>>>>>>>>>>> > > > >> >> > > > >> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > > > >> >>>>>>>>>>>> [2] > > > >> >>>>>>>>>>>> > > > >> >>>>>>>>>>>> > > > >> >> > > > >> > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > > > >> >>>>>>>>>> > > > >> >>>>>>>>> -- > > > >> >>>>>>>>> Xuefu Zhang > > > >> >>>>>>>>> > > > >> >>>>>>>>> "In Honey We Trust!" > > > >> >>>>>>>>> > > > >> >>> > > > >> >>> -- > > > >> >>> Best regards! > > > >> >>> Rui Li > > > >> >>> > > > >> > > > >> > > > > > > -- Xuefu Zhang "In Honey We Trust!" |
Hi,
I very much agree with Xuefu's summary of the two points, especially on the "functionIdentifier doesn't need to reflect the categories". For the factory methods I think methods of should be enough: // for temporary/non-temporary system function public FunctionIdentifier of(String name) { } // for temporary/non-temporary catalog function public FunctionIdentifier of(ObjectIdentifier identifier){ } In case of the getters I did not like the method name `getName` in the original proposal, as in my opinion it could imply that it can return also just the name part of an ObjectIdentifier, which should not be the case. I'm fine with getSimpleName/getIdentifier, but want to throw in few other suggestions: * getShortPath(Identifier)/getLongPath(Identifier), * getSystemPath(Identifier)/getCatalogPath(Identifier) +1 to any of the 3 options. One additional thing the FunctionIdentifier should be a PublicEvolving class, as it is part of a PublicEvolving APIs e.g. CallExpression, which user might need to access e.g. in a filter pushdown. I also support the Xuefu's suggestion not to support the "ALL" keyword in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design of it is not part of the FLIP-57, we do not need to agree on that in this thread. Overall I think after updating the FLIP with the outcome of the discussion I vote +1 for it. Best, Dawid On 02/10/2019 00:21, Xuefu Z wrote: > Here are some of my thoughts on the minor debates above: > > 1. +1 for 4 categories of functions. They are categorized along two > dimensions of binary values: X: *temporary* vs non-temporary (persistent); > Y: *system* vs non-system (so said catalog). > 2. In my opinion, class functionIdentifier doesn't really need to reflect > the categories of the functions. Instead, we should decouple them to make > the API more stable. Thus, my suggestion is: > > @Internal > class FunctionIdentifier { > // for temporary/non-temporary system function > public FunctionIdentifier ofSimpleName(String name) { } > // for temporary/non-temporary catalog function > public FunctionIdentifier ofIdentifier(ObjectIdentifier > identifier){ } > public Optional<String> getSimpleName() { } > public Optional<ObjectIdentifier> getIdentifier() { } > } > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: > > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. > > When either keyword is missing, "ALL" is implied along that dimension. We > should always limit the search to the system function catalog and the > current catalog/DB. I don't see a need of listing functions across > different catalogs and databases. (It can be added later if that arises.) > > Thanks, > Xuefu > > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: > >> Hi Dawid, >> >> Thanks for bringing the suggestions up. I was prototyping yesterday and >> found out those places exactly as what you suggested. >> >> For CallExpression and UnresolvedCallExpression, I've added them to >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark >> that as a breaking change >> >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring up >> an issue on naming. It seems to me how we now name functions categories is >> a bit unclear and confusing, which is reflected on the suggested APIs - in >> FunctionIdentifier you lay out, "builtin function" would include builtin >> functions and temporary system functions as we are kind of using "system" >> and "built-in" interchangeably, and "catalog function" would include >> catalog functions and temporary functions. I currently can see two >> approaches to make it clearer to users. >> >> 1) Simplify FunctionIdentifier to be the following. As it's internal, we >> add comments and explanation to devs on which case the APIs support. >> However, I feel this approach would be somehow a bit conflicting to what >> you want to achieve for the clarity of APIs >> >> @Internal >> class FunctionIdentifier { >> // for built-in function and temporary system function >> public FunctionIdentifier of(String name) { } >> // for temporary function and catalog function >> public FunctionIdentifier of(ObjectIdentifier identifier){ } >> public Optional<String> getFunctionName() { } >> public Optional<ObjectIdentifier> getObjectIdentifier() { } >> } >> >> 2) We can rename our function categories as following so there'll be mainly >> just two categories of functions, "system functions" and "catalog >> functions", either of which can have temporary ones >> >> - built-in functions -> officially rename to "system functions" and note >> to users that "system" and "built-in" can be used interchangeably. We >> prefer "system" because that's the keyword we decided to use in DDL that >> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") >> - temporary system functions >> - catalog functions >> - temporary functions -> rename to "temporary catalog function" >> >> @Internal >> class FunctionIdentifier { >> // for temporary/non-temporary system function >> public FunctionIdentifier ofSystemFunction(String name) { } >> // for temporary/non-temporary catalog function >> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier >> identifier){ } >> public Optional<String> getSystemFunctionName() { } >> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() { } >> } >> >> WDYT? >> >> >> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote: >> >>> Thanks for the summary Bowen! >>> >>> Looks good to me. >>> >>> Cheers, >>> Fabian >>> >>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < >> [hidden email] >>>> : >>>> Hi all, >>>> >>>> I've updated the FLIP wiki with the following changes: >>>> >>>> - Lifespan of temp functions are not tied to those of catalogs and >>>> databases. Users can create temp functions even though catalogs/dbs in >>>> their fully qualified names don't even exist. >>>> - some new SQL commands >>>> - "SHOW FUNCTIONS" - list names of temp and non-temp >> system/built-in >>>> functions, and names of temp and catalog functions in the current >> catalog >>>> and db >>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp >> system/built >>>> functions, and fully qualified names of temp and catalog functions in >> all >>>> catalogs and dbs >>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of >> temp >>>> functions in all catalog and db >>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp >>> system >>>> functions >>>> >>>> Let me know if you have any questions. >>>> >>>> Seems we have resolved all concerns. If there's no more ones, I'd like >> to >>>> close the vote by this time tomorrow. >>>> >>>> Cheers, >>>> Bowen >>>> >>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote: >>>> >>>>> Hi, >>>>> >>>>> I think above are some valid points, and we can adopt the >> suggestions. >>>>> To elaborate a bit on the new SQL syntax, it would imply that, unlike >>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL >> [TEMPORARY] >>>>> FUNCTIONS" would return functions' fully qualified names with catalog >>> and >>>>> db names. >>>>> >>>>> >>>>> >>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> >>> wrote: >>>>>> Hi all, >>>>>> >>>>>> I support Fabian's arguments. In my opinion, temporary objects >> should >>>>>> just be an additional layer on top of the regular catalog/database >>>>>> lookup logic. Thus, a temporary table or function has always highest >>>>>> precedence and should be stable within the local session. Otherwise >> it >>>>>> could magically disappear while someone else is performing >>> modifications >>>>>> in the catalog. >>>>>> >>>>>> Furthermore, this feature is very useful for prototyping as users >> can >>>>>> simply express that a catalog/database is present even through they >>>>>> might not have access to it currently. >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> >>>>>> On 30.09.19 14:57, Fabian Hueske wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> Sorry for the late reply. >>>>>>> >>>>>>> I think it might lead to confusing situations if temporary >> functions >>>> (or >>>>>>> any temporary db objects for that matter) are bound to the life >>> cycle >>>>>> of an >>>>>>> (external) db/catalog. >>>>>>> Imaging a situation where you create a temp function in a db in an >>>>>> external >>>>>>> catalog and use it but at some point it does not work anymore >>> because >>>>>> some >>>>>>> other dropped the database from the external catalog. >>>>>>> Shouldn't temporary objects be only controlled by the owner of a >>>>>> session? >>>>>>> I agree that creating temp objects in non-existing db/catalogs >>> sounds >>>> a >>>>>> bit >>>>>>> strange, but IMO the opposite (the db/catalog must exist for a >> temp >>>>>>> function to be created/exist) can have significant implications >> like >>>> the >>>>>>> one I described. >>>>>>> I think it would be quite easy for users to understand that >>> temporary >>>>>>> objects are solely owned by them (and their session). >>>>>>> The problem of listing temporary objects could be solved by >> adding a >>>> ALL >>>>>>> [TEMPORARY] clause: >>>>>>> >>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the >>>>>>> catalog/database including temporary functions. >>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions >>>>>> regardless >>>>>>> of the catalog/database. >>>>>>> >>>>>>> Best, >>>>>>> Fabian >>>>>>> >>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < >>>>>> [hidden email]>: >>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can >> close >>>> the >>>>>>>> voting. >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> >>>> wrote: >>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just >> want >>> to >>>>>> try >>>>>>>>> out temporary functions, they can create temporary system >>> functions >>>>>> which >>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow >>>>>> temporary >>>>>>>>> catalog function is to let users override permanent catalog >>>> functions. >>>>>>>>> Therefore a temporary function in a non-existing catalog won't >>> serve >>>>>> that >>>>>>>>> purpose. Besides, each session is provided with a default >> catalog >>>> and >>>>>> DB. >>>>>>>>> So even if users simply want to create some catalog functions >> they >>>> can >>>>>>>>> forget about after the session, wouldn't the default catalog/DB >> be >>>>>> enough >>>>>>>>> for such experiments? >>>>>>>>> >>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> >>>> wrote: >>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will >> first >>>>>> make >>>>>>>>> sure >>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is >> triggered >>> to >>>>>>>>> remove >>>>>>>>>> that temp function. >>>>>>>>>> >>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not >>> copying >>>>>>>> Hive. >>>>>>>>>> IMHO, allowing registering temp functions in nonexistent >>> catalog/db >>>>>> is >>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would >> list >>>>>> system >>>>>>>>>> functions and functions in the current catalog/db, since users >>>> cannot >>>>>>>>>> designate a nonexistent catalog/db as current ones, how can >> they >>>> list >>>>>>>>>> functions in nonexistent catalog/db? They may end up never >>> knowing >>>>>> what >>>>>>>>>> temp functions they've created unless trying out with queries >> or >>> we >>>>>>>>>> introducing some more nonstandard SQL statements. The same >>> applies >>>> to >>>>>>>>> other >>>>>>>>>> temp objects like temp tables. >>>>>>>>>> >>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me >>>>>>>>>> >>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >>>>>>>>>> [hidden email]> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Ad. 1 >>>>>>>>>>> I wouldn't say it is hacky. >>>>>>>>>>> Moreover, how do you want ensure that the dB always exists >> when >>> a >>>>>>>>>> temporary >>>>>>>>>>> object is used?( in this particular case function). Do you >> want >>> to >>>>>>>>> query >>>>>>>>>>> for the database existence whenever e.g a temporary function >> is >>>>>>>> used? I >>>>>>>>>>> think important aspect here is that the database can be >> dropped >>>> from >>>>>>>>>>> external system, not just flink or a different flink session. >>>>>>>>>>> >>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a >>>>>>>> database >>>>>>>>>> that >>>>>>>>>>> does not exist, that's true. But if you create a temporary >> table >>>> in >>>>>> a >>>>>>>>>>> database and drop that database from a different session, you >>> can >>>>>>>> still >>>>>>>>>>> query the previously created temporary table from the original >>>>>>>> session. >>>>>>>>>> It >>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we >>>> make >>>>>>>>> this >>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime >> of >>> a >>>>>>>>>> database >>>>>>>>>>> explicit as part of the temporary objects contract? In the end >>>> they >>>>>>>>> exist >>>>>>>>>>> in different layers. Permanent objects & databases in a >> catalog >>>> (in >>>>>>>>> case >>>>>>>>>> of >>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions. >>>> That's >>>>>>>>> also >>>>>>>>>>> true for the original hive client. The temporary objects live >> in >>>> the >>>>>>>>> hive >>>>>>>>>>> client whereas databases are created in the metastore. >>>>>>>>>>> >>>>>>>>>>> Ad.2 >>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to >> achieve >>>>>> here >>>>>>>>> is >>>>>>>>>> so >>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One >>>>>> important >>>>>>>>>> thing >>>>>>>>>>> to remember here is that we need the function identifier to be >>>> part >>>>>>>> of >>>>>>>>>> the >>>>>>>>>>> FunctionDefinition object and not only as the result of the >>>> function >>>>>>>>>>> lookup. At some point we want to be able to store >>> QueryOperations >>>> in >>>>>>>>> the >>>>>>>>>>> catalogs. They can contain function calls within which we need >>> to >>>>>>>> have >>>>>>>>>> the >>>>>>>>>>> identifier. >>>>>>>>>>> >>>>>>>>>>> I agree my initial suggestion is over complicated. How about >> we >>>> have >>>>>>>>> just >>>>>>>>>>> the FunctionIdentifier as top level class without making the >>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the >>> same >>>>>>>> what >>>>>>>>>> you >>>>>>>>>>> suggested. The only difference is that it would be a top level >>>> class >>>>>>>>>> with a >>>>>>>>>>> more descriptive name. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> >>> wrote: >>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete >>>>>>>> alternative >>>>>>>>> is >>>>>>>>>>> the >>>>>>>>>>>> following, basically having separate APIs in FunctionLookup >> for >>>>>>>>>> ambiguous >>>>>>>>>>>> and precise function lookup since planner is able to tell >> which >>>> API >>>>>>>>> to >>>>>>>>>>> call >>>>>>>>>>>> with parsed queries, and have a unified result: >>>>>>>>>>>> >>>>>>>>>>>> ``` >>>>>>>>>>>> class FunctionLookup { >>>>>>>>>>>> >>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> class Result { >>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... >> } >>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>> // ... >>>>>>>>>>>> } >>>>>>>>>>>> >>>>>>>>>>>> } >>>>>>>>>>>> ``` >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Bowen >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < >> [hidden email]> >>>>>>>>> wrote: >>>>>>>>>>>>> Hi Dawid, >>>>>>>>>>>>> >>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments >> is >>>>>>>>>>> important. >>>>>>>>>>>>> However, I'm not sure allowing users to register temp >>> functions >>>>>>>> in >>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit >>> hacky, >>>>>>>>> and >>>>>>>>>>>> breaks >>>>>>>>>>>>> the current contract between Flink and users that catalog/db >>>> must >>>>>>>>> be >>>>>>>>>>>> valid >>>>>>>>>>>>> in order to operate on. >>>>>>>>>>>>> >>>>>>>>>>>>> How about we instead focus on making it convenient to create >>>>>>>>>> catalogs? >>>>>>>>>>>>> Users actually can already do it with ease via program or >> SQL >>>> CLI >>>>>>>>>> yaml >>>>>>>>>>>> file >>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency >>> nor >>>>>>>>>>> external >>>>>>>>>>>>> connections. What we can further improve is DDL for >> catalogs, >>>>>>>> and I >>>>>>>>>>>> raised >>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] >>> driven >>>>>>>> by >>>>>>>>>>> Terry >>>>>>>>>>>>> now. >>>>>>>>>>>>> >>>>>>>>>>>>> In that case, if users would like to experiment via SQL, >> they >>>> can >>>>>>>>>>> easily >>>>>>>>>>>>> create an in memory catalog/database using DDL, then play >> with >>>>>>>> temp >>>>>>>>>>>>> functions. >>>>>>>>>>>>> >>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier >>> has >>>>>>>> not >>>>>>>>>>> been >>>>>>>>>>>>> resolved when stack call reaches >>>>>>>> FunctionCatalog#lookupFunction(), >>>>>>>>>> but >>>>>>>>>>>> only >>>>>>>>>>>>> been parsed? >>>>>>>>>>>>> >>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok >>>> with >>>>>>>>> the >>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass >>> of >>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. >>>>>>>>>>>>> >>>>>>>>>>>>> Another potentially simpler way is: >>>>>>>>>>>>> >>>>>>>>>>>>> ``` >>>>>>>>>>>>> // in class FunctionLookup >>>>>>>>>>>>> class Result { >>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >> ... } >>>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>>> ... >>>>>>>>>>>>> } >>>>>>>>>>>>> ``` >>>>>>>>>>>>> >>>>>>>>>>>>> WDYT? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >>>>>>>>>>>>> [hidden email]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>> I really like the flip and think it clarifies important >>> aspects >>>>>>>> of >>>>>>>>>> the >>>>>>>>>>>>>> system. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take >>> much >>>>>>>>> time >>>>>>>>>> to >>>>>>>>>>>>>> agree on. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the >>>>>>>> existence >>>>>>>>> of >>>>>>>>>>>>>> cat/db >>>>>>>>>>>>>> for temporary functions? That means not to check it, so >> e.g. >>>>>>>> it's >>>>>>>>>>>> possible >>>>>>>>>>>>>> to create a temporary function in a database that does not >>>>>>>> exist. >>>>>>>>> I >>>>>>>>>>>> think >>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform >>>>>>>>>> experiments >>>>>>>>>>>> but >>>>>>>>>>>>>> does not have access to the db yet or temporarily does not >>> have >>>>>>>>>>>> connection >>>>>>>>>>>>>> to a catalog. >>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not >>>> loosen >>>>>>>>> the >>>>>>>>>>>>>> requirements for all catalog objects such as tables, views, >>>>>>>> types >>>>>>>>>> just >>>>>>>>>>>> for >>>>>>>>>>>>>> the functions? It's really important later on from e.g the >>>>>>>>>>>> serializability >>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier >> is >>>>>>>> that >>>>>>>>> we >>>>>>>>>>>> know >>>>>>>>>>>>>> that it has been resolved. The suggested changes break that >>>>>>>>>>> assumption. >>>>>>>>>>>>>> What do you think about adding an interface >>> FunctionIdentifier >>>> { >>>>>>>>>>>>>> String getName(); >>>>>>>>>>>>>> >>>>>>>>>>>>>> /** >>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in >>>>>>>> function. >>>>>>>>>>>>>> */ >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>>>>>>>>>>> return Optional.of(this); >>>>>>>>>>>>>> } >>>>>>>>>>>>>> } >>>>>>>>>>>>>> >>>>>>>>>>>>>> class SystemFunctionIdentifier implements >> FunctionIdentifier >>>>>>>> {...} >>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> >>> wrote: >>>>>>>>>>>>>>> +1. LGTM >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >>>>>>>> [hidden email]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Terry Wang >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Kurt >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >>>>>>>>> [hidden email] >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], >> which >>>>>>>>>> we've >>>>>>>>>>>>>> reached >>>>>>>>>>>>>>>>>> consensus in [2]. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >>>>>>>>> UTC, >>>>>>>>>>> Sep >>>>>>>>>>>>>> 26. >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>> Xuefu Zhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> "In Honey We Trust!" >>>>>>>>>>>>>>> >>>>>>>>> -- >>>>>>>>> Best regards! >>>>>>>>> Rui Li >>>>>>>>> >>>>>> > signature.asc (849 bytes) Download Attachment |
Free forum by Nabble | Edit this page |