Introducing a new term "path" to APIs like
"getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to users, thus I feel "getSimpleName/getIdentifier" is fine. To summarize the discussion result. - categorize functions into 2 dimensions - system v.s. catalog, non-temp v.s. temp - and that give us 4 combinations - definition of FunctionIdentifier @PublicEvolving Class FunctionIdentifier { String name; ObjectIdentifier oi; // for temporary/non-temporary system function public FunctionIdentifier of(String name) { } // for temporary/non-temporary catalog function public FunctionIdentifier of(ObjectIdentifier identifier) { } Optional<ObjectIdentifier> getIdentifier() {} Optional<String> getSimpleName() {} } I've updated them to FLIP wiki. Please take a final look. I'll close the voting if there's no other concern raised within 24 hours. Cheers On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]> wrote: > Hi, > > I very much agree with Xuefu's summary of the two points, especially on > the "functionIdentifier doesn't need to reflect the categories". > > For the factory methods I think methods of should be enough: > > // for temporary/non-temporary system function > public FunctionIdentifier of(String name) { } > // for temporary/non-temporary catalog function > public FunctionIdentifier of(ObjectIdentifier identifier){ } > > In case of the getters I did not like the method name `getName` in the > original proposal, as in my opinion it could imply that it can return > also just the name part of an ObjectIdentifier, which should not be the > case. > > I'm fine with getSimpleName/getIdentifier, but want to throw in few > other suggestions: > > * getShortPath(Identifier)/getLongPath(Identifier), > > * getSystemPath(Identifier)/getCatalogPath(Identifier) > > +1 to any of the 3 options. > > One additional thing the FunctionIdentifier should be a PublicEvolving > class, as it is part of a PublicEvolving APIs e.g. CallExpression, which > user might need to access e.g. in a filter pushdown. > > I also support the Xuefu's suggestion not to support the "ALL" keyword > in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design > of it is not part of the FLIP-57, we do not need to agree on that in > this thread. > > Overall I think after updating the FLIP with the outcome of the > discussion I vote +1 for it. > > Best, > > Dawid > > > On 02/10/2019 00:21, Xuefu Z wrote: > > Here are some of my thoughts on the minor debates above: > > > > 1. +1 for 4 categories of functions. They are categorized along two > > dimensions of binary values: X: *temporary* vs non-temporary > (persistent); > > Y: *system* vs non-system (so said catalog). > > 2. In my opinion, class functionIdentifier doesn't really need to reflect > > the categories of the functions. Instead, we should decouple them to make > > the API more stable. Thus, my suggestion is: > > > > @Internal > > class FunctionIdentifier { > > // for temporary/non-temporary system function > > public FunctionIdentifier ofSimpleName(String name) { } > > // for temporary/non-temporary catalog function > > public FunctionIdentifier ofIdentifier(ObjectIdentifier > > identifier){ } > > public Optional<String> getSimpleName() { } > > public Optional<ObjectIdentifier> getIdentifier() { } > > } > > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: > > > > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. > > > > When either keyword is missing, "ALL" is implied along that dimension. We > > should always limit the search to the system function catalog and the > > current catalog/DB. I don't see a need of listing functions across > > different catalogs and databases. (It can be added later if that arises.) > > > > Thanks, > > Xuefu > > > > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: > > > >> Hi Dawid, > >> > >> Thanks for bringing the suggestions up. I was prototyping yesterday and > >> found out those places exactly as what you suggested. > >> > >> For CallExpression and UnresolvedCallExpression, I've added them to > >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and > mark > >> that as a breaking change > >> > >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring > up > >> an issue on naming. It seems to me how we now name functions categories > is > >> a bit unclear and confusing, which is reflected on the suggested APIs - > in > >> FunctionIdentifier you lay out, "builtin function" would include builtin > >> functions and temporary system functions as we are kind of using > "system" > >> and "built-in" interchangeably, and "catalog function" would include > >> catalog functions and temporary functions. I currently can see two > >> approaches to make it clearer to users. > >> > >> 1) Simplify FunctionIdentifier to be the following. As it's internal, we > >> add comments and explanation to devs on which case the APIs support. > >> However, I feel this approach would be somehow a bit conflicting to what > >> you want to achieve for the clarity of APIs > >> > >> @Internal > >> class FunctionIdentifier { > >> // for built-in function and temporary system function > >> public FunctionIdentifier of(String name) { } > >> // for temporary function and catalog function > >> public FunctionIdentifier of(ObjectIdentifier identifier){ } > >> public Optional<String> getFunctionName() { } > >> public Optional<ObjectIdentifier> getObjectIdentifier() { } > >> } > >> > >> 2) We can rename our function categories as following so there'll be > mainly > >> just two categories of functions, "system functions" and "catalog > >> functions", either of which can have temporary ones > >> > >> - built-in functions -> officially rename to "system functions" and > note > >> to users that "system" and "built-in" can be used interchangeably. We > >> prefer "system" because that's the keyword we decided to use in DDL that > >> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") > >> - temporary system functions > >> - catalog functions > >> - temporary functions -> rename to "temporary catalog function" > >> > >> @Internal > >> class FunctionIdentifier { > >> // for temporary/non-temporary system function > >> public FunctionIdentifier ofSystemFunction(String name) { } > >> // for temporary/non-temporary catalog function > >> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier > >> identifier){ } > >> public Optional<String> getSystemFunctionName() { } > >> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() { > } > >> } > >> > >> WDYT? > >> > >> > >> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote: > >> > >>> Thanks for the summary Bowen! > >>> > >>> Looks good to me. > >>> > >>> Cheers, > >>> Fabian > >>> > >>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < > >> [hidden email] > >>>> : > >>>> Hi all, > >>>> > >>>> I've updated the FLIP wiki with the following changes: > >>>> > >>>> - Lifespan of temp functions are not tied to those of catalogs and > >>>> databases. Users can create temp functions even though catalogs/dbs in > >>>> their fully qualified names don't even exist. > >>>> - some new SQL commands > >>>> - "SHOW FUNCTIONS" - list names of temp and non-temp > >> system/built-in > >>>> functions, and names of temp and catalog functions in the current > >> catalog > >>>> and db > >>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp > >> system/built > >>>> functions, and fully qualified names of temp and catalog functions in > >> all > >>>> catalogs and dbs > >>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of > >> temp > >>>> functions in all catalog and db > >>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp > >>> system > >>>> functions > >>>> > >>>> Let me know if you have any questions. > >>>> > >>>> Seems we have resolved all concerns. If there's no more ones, I'd like > >> to > >>>> close the vote by this time tomorrow. > >>>> > >>>> Cheers, > >>>> Bowen > >>>> > >>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> > wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> I think above are some valid points, and we can adopt the > >> suggestions. > >>>>> To elaborate a bit on the new SQL syntax, it would imply that, unlike > >>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL > >> [TEMPORARY] > >>>>> FUNCTIONS" would return functions' fully qualified names with catalog > >>> and > >>>>> db names. > >>>>> > >>>>> > >>>>> > >>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> > >>> wrote: > >>>>>> Hi all, > >>>>>> > >>>>>> I support Fabian's arguments. In my opinion, temporary objects > >> should > >>>>>> just be an additional layer on top of the regular catalog/database > >>>>>> lookup logic. Thus, a temporary table or function has always highest > >>>>>> precedence and should be stable within the local session. Otherwise > >> it > >>>>>> could magically disappear while someone else is performing > >>> modifications > >>>>>> in the catalog. > >>>>>> > >>>>>> Furthermore, this feature is very useful for prototyping as users > >> can > >>>>>> simply express that a catalog/database is present even through they > >>>>>> might not have access to it currently. > >>>>>> > >>>>>> Regards, > >>>>>> Timo > >>>>>> > >>>>>> > >>>>>> On 30.09.19 14:57, Fabian Hueske wrote: > >>>>>>> Hi all, > >>>>>>> > >>>>>>> Sorry for the late reply. > >>>>>>> > >>>>>>> I think it might lead to confusing situations if temporary > >> functions > >>>> (or > >>>>>>> any temporary db objects for that matter) are bound to the life > >>> cycle > >>>>>> of an > >>>>>>> (external) db/catalog. > >>>>>>> Imaging a situation where you create a temp function in a db in an > >>>>>> external > >>>>>>> catalog and use it but at some point it does not work anymore > >>> because > >>>>>> some > >>>>>>> other dropped the database from the external catalog. > >>>>>>> Shouldn't temporary objects be only controlled by the owner of a > >>>>>> session? > >>>>>>> I agree that creating temp objects in non-existing db/catalogs > >>> sounds > >>>> a > >>>>>> bit > >>>>>>> strange, but IMO the opposite (the db/catalog must exist for a > >> temp > >>>>>>> function to be created/exist) can have significant implications > >> like > >>>> the > >>>>>>> one I described. > >>>>>>> I think it would be quite easy for users to understand that > >>> temporary > >>>>>>> objects are solely owned by them (and their session). > >>>>>>> The problem of listing temporary objects could be solved by > >> adding a > >>>> ALL > >>>>>>> [TEMPORARY] clause: > >>>>>>> > >>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the > >>>>>>> catalog/database including temporary functions. > >>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions > >>>>>> regardless > >>>>>>> of the catalog/database. > >>>>>>> > >>>>>>> Best, > >>>>>>> Fabian > >>>>>>> > >>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > >>>>>> [hidden email]>: > >>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can > >> close > >>>> the > >>>>>>>> voting. > >>>>>>>> > >>>>>>>> > >>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> > >>>> wrote: > >>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just > >> want > >>> to > >>>>>> try > >>>>>>>>> out temporary functions, they can create temporary system > >>> functions > >>>>>> which > >>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow > >>>>>> temporary > >>>>>>>>> catalog function is to let users override permanent catalog > >>>> functions. > >>>>>>>>> Therefore a temporary function in a non-existing catalog won't > >>> serve > >>>>>> that > >>>>>>>>> purpose. Besides, each session is provided with a default > >> catalog > >>>> and > >>>>>> DB. > >>>>>>>>> So even if users simply want to create some catalog functions > >> they > >>>> can > >>>>>>>>> forget about after the session, wouldn't the default catalog/DB > >> be > >>>>>> enough > >>>>>>>>> for such experiments? > >>>>>>>>> > >>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> > >>>> wrote: > >>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will > >> first > >>>>>> make > >>>>>>>>> sure > >>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is > >> triggered > >>> to > >>>>>>>>> remove > >>>>>>>>>> that temp function. > >>>>>>>>>> > >>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not > >>> copying > >>>>>>>> Hive. > >>>>>>>>>> IMHO, allowing registering temp functions in nonexistent > >>> catalog/db > >>>>>> is > >>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would > >> list > >>>>>> system > >>>>>>>>>> functions and functions in the current catalog/db, since users > >>>> cannot > >>>>>>>>>> designate a nonexistent catalog/db as current ones, how can > >> they > >>>> list > >>>>>>>>>> functions in nonexistent catalog/db? They may end up never > >>> knowing > >>>>>> what > >>>>>>>>>> temp functions they've created unless trying out with queries > >> or > >>> we > >>>>>>>>>> introducing some more nonstandard SQL statements. The same > >>> applies > >>>> to > >>>>>>>>> other > >>>>>>>>>> temp objects like temp tables. > >>>>>>>>>> > >>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me > >>>>>>>>>> > >>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >>>>>>>>>> [hidden email]> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Ad. 1 > >>>>>>>>>>> I wouldn't say it is hacky. > >>>>>>>>>>> Moreover, how do you want ensure that the dB always exists > >> when > >>> a > >>>>>>>>>> temporary > >>>>>>>>>>> object is used?( in this particular case function). Do you > >> want > >>> to > >>>>>>>>> query > >>>>>>>>>>> for the database existence whenever e.g a temporary function > >> is > >>>>>>>> used? I > >>>>>>>>>>> think important aspect here is that the database can be > >> dropped > >>>> from > >>>>>>>>>>> external system, not just flink or a different flink session. > >>>>>>>>>>> > >>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a > >>>>>>>> database > >>>>>>>>>> that > >>>>>>>>>>> does not exist, that's true. But if you create a temporary > >> table > >>>> in > >>>>>> a > >>>>>>>>>>> database and drop that database from a different session, you > >>> can > >>>>>>>> still > >>>>>>>>>>> query the previously created temporary table from the original > >>>>>>>> session. > >>>>>>>>>> It > >>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we > >>>> make > >>>>>>>>> this > >>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime > >> of > >>> a > >>>>>>>>>> database > >>>>>>>>>>> explicit as part of the temporary objects contract? In the end > >>>> they > >>>>>>>>> exist > >>>>>>>>>>> in different layers. Permanent objects & databases in a > >> catalog > >>>> (in > >>>>>>>>> case > >>>>>>>>>> of > >>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions. > >>>> That's > >>>>>>>>> also > >>>>>>>>>>> true for the original hive client. The temporary objects live > >> in > >>>> the > >>>>>>>>> hive > >>>>>>>>>>> client whereas databases are created in the metastore. > >>>>>>>>>>> > >>>>>>>>>>> Ad.2 > >>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to > >> achieve > >>>>>> here > >>>>>>>>> is > >>>>>>>>>> so > >>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One > >>>>>> important > >>>>>>>>>> thing > >>>>>>>>>>> to remember here is that we need the function identifier to be > >>>> part > >>>>>>>> of > >>>>>>>>>> the > >>>>>>>>>>> FunctionDefinition object and not only as the result of the > >>>> function > >>>>>>>>>>> lookup. At some point we want to be able to store > >>> QueryOperations > >>>> in > >>>>>>>>> the > >>>>>>>>>>> catalogs. They can contain function calls within which we need > >>> to > >>>>>>>> have > >>>>>>>>>> the > >>>>>>>>>>> identifier. > >>>>>>>>>>> > >>>>>>>>>>> I agree my initial suggestion is over complicated. How about > >> we > >>>> have > >>>>>>>>> just > >>>>>>>>>>> the FunctionIdentifier as top level class without making the > >>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the > >>> same > >>>>>>>> what > >>>>>>>>>> you > >>>>>>>>>>> suggested. The only difference is that it would be a top level > >>>> class > >>>>>>>>>> with a > >>>>>>>>>>> more descriptive name. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> > >>> wrote: > >>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete > >>>>>>>> alternative > >>>>>>>>> is > >>>>>>>>>>> the > >>>>>>>>>>>> following, basically having separate APIs in FunctionLookup > >> for > >>>>>>>>>> ambiguous > >>>>>>>>>>>> and precise function lookup since planner is able to tell > >> which > >>>> API > >>>>>>>>> to > >>>>>>>>>>> call > >>>>>>>>>>>> with parsed queries, and have a unified result: > >>>>>>>>>>>> > >>>>>>>>>>>> ``` > >>>>>>>>>>>> class FunctionLookup { > >>>>>>>>>>>> > >>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> class Result { > >>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... > >> } > >>>>>>>>>>>> String getName() { ... } > >>>>>>>>>>>> // ... > >>>>>>>>>>>> } > >>>>>>>>>>>> > >>>>>>>>>>>> } > >>>>>>>>>>>> ``` > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Bowen > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < > >> [hidden email]> > >>>>>>>>> wrote: > >>>>>>>>>>>>> Hi Dawid, > >>>>>>>>>>>>> > >>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments > >> is > >>>>>>>>>>> important. > >>>>>>>>>>>>> However, I'm not sure allowing users to register temp > >>> functions > >>>>>>>> in > >>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit > >>> hacky, > >>>>>>>>> and > >>>>>>>>>>>> breaks > >>>>>>>>>>>>> the current contract between Flink and users that catalog/db > >>>> must > >>>>>>>>> be > >>>>>>>>>>>> valid > >>>>>>>>>>>>> in order to operate on. > >>>>>>>>>>>>> > >>>>>>>>>>>>> How about we instead focus on making it convenient to create > >>>>>>>>>> catalogs? > >>>>>>>>>>>>> Users actually can already do it with ease via program or > >> SQL > >>>> CLI > >>>>>>>>>> yaml > >>>>>>>>>>>> file > >>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency > >>> nor > >>>>>>>>>>> external > >>>>>>>>>>>>> connections. What we can further improve is DDL for > >> catalogs, > >>>>>>>> and I > >>>>>>>>>>>> raised > >>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] > >>> driven > >>>>>>>> by > >>>>>>>>>>> Terry > >>>>>>>>>>>>> now. > >>>>>>>>>>>>> > >>>>>>>>>>>>> In that case, if users would like to experiment via SQL, > >> they > >>>> can > >>>>>>>>>>> easily > >>>>>>>>>>>>> create an in memory catalog/database using DDL, then play > >> with > >>>>>>>> temp > >>>>>>>>>>>>> functions. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier > >>> has > >>>>>>>> not > >>>>>>>>>>> been > >>>>>>>>>>>>> resolved when stack call reaches > >>>>>>>> FunctionCatalog#lookupFunction(), > >>>>>>>>>> but > >>>>>>>>>>>> only > >>>>>>>>>>>>> been parsed? > >>>>>>>>>>>>> > >>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok > >>>> with > >>>>>>>>> the > >>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass > >>> of > >>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Another potentially simpler way is: > >>>>>>>>>>>>> > >>>>>>>>>>>>> ``` > >>>>>>>>>>>>> // in class FunctionLookup > >>>>>>>>>>>>> class Result { > >>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >> ... } > >>>>>>>>>>>>> String getName() { ... } > >>>>>>>>>>>>> ... > >>>>>>>>>>>>> } > >>>>>>>>>>>>> ``` > >>>>>>>>>>>>> > >>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >>>>>>>>>>>>> [hidden email]> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>> I really like the flip and think it clarifies important > >>> aspects > >>>>>>>> of > >>>>>>>>>> the > >>>>>>>>>>>>>> system. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take > >>> much > >>>>>>>>> time > >>>>>>>>>> to > >>>>>>>>>>>>>> agree on. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the > >>>>>>>> existence > >>>>>>>>> of > >>>>>>>>>>>>>> cat/db > >>>>>>>>>>>>>> for temporary functions? That means not to check it, so > >> e.g. > >>>>>>>> it's > >>>>>>>>>>>> possible > >>>>>>>>>>>>>> to create a temporary function in a database that does not > >>>>>>>> exist. > >>>>>>>>> I > >>>>>>>>>>>> think > >>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform > >>>>>>>>>> experiments > >>>>>>>>>>>> but > >>>>>>>>>>>>>> does not have access to the db yet or temporarily does not > >>> have > >>>>>>>>>>>> connection > >>>>>>>>>>>>>> to a catalog. > >>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not > >>>> loosen > >>>>>>>>> the > >>>>>>>>>>>>>> requirements for all catalog objects such as tables, views, > >>>>>>>> types > >>>>>>>>>> just > >>>>>>>>>>>> for > >>>>>>>>>>>>>> the functions? It's really important later on from e.g the > >>>>>>>>>>>> serializability > >>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier > >> is > >>>>>>>> that > >>>>>>>>> we > >>>>>>>>>>>> know > >>>>>>>>>>>>>> that it has been resolved. The suggested changes break that > >>>>>>>>>>> assumption. > >>>>>>>>>>>>>> What do you think about adding an interface > >>> FunctionIdentifier > >>>> { > >>>>>>>>>>>>>> String getName(); > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> /** > >>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in > >>>>>>>> function. > >>>>>>>>>>>>>> */ > >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >>>>>>>>>>>>>> return Optional.of(this); > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> } > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> class SystemFunctionIdentifier implements > >> FunctionIdentifier > >>>>>>>> {...} > >>>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> > >>> wrote: > >>>>>>>>>>>>>>> +1. LGTM > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >>>>>>>> [hidden email]> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>> Terry Wang > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Kurt > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >>>>>>>>> [hidden email] > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], > >> which > >>>>>>>>>> we've > >>>>>>>>>>>>>> reached > >>>>>>>>>>>>>>>>>> consensus in [2]. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm > >>>>>>>>> UTC, > >>>>>>>>>>> Sep > >>>>>>>>>>>>>> 26. > >>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>> Bowen > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >>>>>>>>>>>>>>>>>> [2] > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>> Xuefu Zhang > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> "In Honey We Trust!" > >>>>>>>>>>>>>>> > >>>>>>>>> -- > >>>>>>>>> Best regards! > >>>>>>>>> Rui Li > >>>>>>>>> > >>>>>> > > > > |
I'm glad to announce that the community has accepted the design of FLIP-57,
and we are moving forward to implementing it. Thanks everyone! On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote: > Introducing a new term "path" to APIs like > "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to > users, thus I feel "getSimpleName/getIdentifier" is fine. > > To summarize the discussion result. > > - categorize functions into 2 dimensions - system v.s. catalog, > non-temp v.s. temp - and that give us 4 combinations > - definition of FunctionIdentifier > > @PublicEvolving > > Class FunctionIdentifier { > > String name; > > ObjectIdentifier oi; > > // for temporary/non-temporary system function > public FunctionIdentifier of(String name) { } > // for temporary/non-temporary catalog function > public FunctionIdentifier of(ObjectIdentifier identifier) { } > > > Optional<ObjectIdentifier> getIdentifier() {} > > Optional<String> getSimpleName() {} > > } > > > I've updated them to FLIP wiki. Please take a final look. I'll close the > voting if there's no other concern raised within 24 hours. > > Cheers > > On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]> > wrote: > >> Hi, >> >> I very much agree with Xuefu's summary of the two points, especially on >> the "functionIdentifier doesn't need to reflect the categories". >> >> For the factory methods I think methods of should be enough: >> >> // for temporary/non-temporary system function >> public FunctionIdentifier of(String name) { } >> // for temporary/non-temporary catalog function >> public FunctionIdentifier of(ObjectIdentifier identifier){ } >> >> In case of the getters I did not like the method name `getName` in the >> original proposal, as in my opinion it could imply that it can return >> also just the name part of an ObjectIdentifier, which should not be the >> case. >> >> I'm fine with getSimpleName/getIdentifier, but want to throw in few >> other suggestions: >> >> * getShortPath(Identifier)/getLongPath(Identifier), >> >> * getSystemPath(Identifier)/getCatalogPath(Identifier) >> >> +1 to any of the 3 options. >> >> One additional thing the FunctionIdentifier should be a PublicEvolving >> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which >> user might need to access e.g. in a filter pushdown. >> >> I also support the Xuefu's suggestion not to support the "ALL" keyword >> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design >> of it is not part of the FLIP-57, we do not need to agree on that in >> this thread. >> >> Overall I think after updating the FLIP with the outcome of the >> discussion I vote +1 for it. >> >> Best, >> >> Dawid >> >> >> On 02/10/2019 00:21, Xuefu Z wrote: >> > Here are some of my thoughts on the minor debates above: >> > >> > 1. +1 for 4 categories of functions. They are categorized along two >> > dimensions of binary values: X: *temporary* vs non-temporary >> (persistent); >> > Y: *system* vs non-system (so said catalog). >> > 2. In my opinion, class functionIdentifier doesn't really need to >> reflect >> > the categories of the functions. Instead, we should decouple them to >> make >> > the API more stable. Thus, my suggestion is: >> > >> > @Internal >> > class FunctionIdentifier { >> > // for temporary/non-temporary system function >> > public FunctionIdentifier ofSimpleName(String name) { } >> > // for temporary/non-temporary catalog function >> > public FunctionIdentifier ofIdentifier(ObjectIdentifier >> > identifier){ } >> > public Optional<String> getSimpleName() { } >> > public Optional<ObjectIdentifier> getIdentifier() { } >> > } >> > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: >> > >> > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. >> > >> > When either keyword is missing, "ALL" is implied along that dimension. >> We >> > should always limit the search to the system function catalog and the >> > current catalog/DB. I don't see a need of listing functions across >> > different catalogs and databases. (It can be added later if that >> arises.) >> > >> > Thanks, >> > Xuefu >> > >> > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: >> > >> >> Hi Dawid, >> >> >> >> Thanks for bringing the suggestions up. I was prototyping yesterday and >> >> found out those places exactly as what you suggested. >> >> >> >> For CallExpression and UnresolvedCallExpression, I've added them to >> >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and >> mark >> >> that as a breaking change >> >> >> >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring >> up >> >> an issue on naming. It seems to me how we now name functions >> categories is >> >> a bit unclear and confusing, which is reflected on the suggested APIs >> - in >> >> FunctionIdentifier you lay out, "builtin function" would include >> builtin >> >> functions and temporary system functions as we are kind of using >> "system" >> >> and "built-in" interchangeably, and "catalog function" would include >> >> catalog functions and temporary functions. I currently can see two >> >> approaches to make it clearer to users. >> >> >> >> 1) Simplify FunctionIdentifier to be the following. As it's internal, >> we >> >> add comments and explanation to devs on which case the APIs support. >> >> However, I feel this approach would be somehow a bit conflicting to >> what >> >> you want to achieve for the clarity of APIs >> >> >> >> @Internal >> >> class FunctionIdentifier { >> >> // for built-in function and temporary system function >> >> public FunctionIdentifier of(String name) { } >> >> // for temporary function and catalog function >> >> public FunctionIdentifier of(ObjectIdentifier identifier){ } >> >> public Optional<String> getFunctionName() { } >> >> public Optional<ObjectIdentifier> getObjectIdentifier() { } >> >> } >> >> >> >> 2) We can rename our function categories as following so there'll be >> mainly >> >> just two categories of functions, "system functions" and "catalog >> >> functions", either of which can have temporary ones >> >> >> >> - built-in functions -> officially rename to "system functions" and >> note >> >> to users that "system" and "built-in" can be used interchangeably. We >> >> prefer "system" because that's the keyword we decided to use in DDL >> that >> >> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") >> >> - temporary system functions >> >> - catalog functions >> >> - temporary functions -> rename to "temporary catalog function" >> >> >> >> @Internal >> >> class FunctionIdentifier { >> >> // for temporary/non-temporary system function >> >> public FunctionIdentifier ofSystemFunction(String name) { } >> >> // for temporary/non-temporary catalog function >> >> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier >> >> identifier){ } >> >> public Optional<String> getSystemFunctionName() { } >> >> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() >> { } >> >> } >> >> >> >> WDYT? >> >> >> >> >> >> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> >> wrote: >> >> >> >>> Thanks for the summary Bowen! >> >>> >> >>> Looks good to me. >> >>> >> >>> Cheers, >> >>> Fabian >> >>> >> >>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < >> >> [hidden email] >> >>>> : >> >>>> Hi all, >> >>>> >> >>>> I've updated the FLIP wiki with the following changes: >> >>>> >> >>>> - Lifespan of temp functions are not tied to those of catalogs and >> >>>> databases. Users can create temp functions even though catalogs/dbs >> in >> >>>> their fully qualified names don't even exist. >> >>>> - some new SQL commands >> >>>> - "SHOW FUNCTIONS" - list names of temp and non-temp >> >> system/built-in >> >>>> functions, and names of temp and catalog functions in the current >> >> catalog >> >>>> and db >> >>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp >> >> system/built >> >>>> functions, and fully qualified names of temp and catalog functions in >> >> all >> >>>> catalogs and dbs >> >>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of >> >> temp >> >>>> functions in all catalog and db >> >>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp >> >>> system >> >>>> functions >> >>>> >> >>>> Let me know if you have any questions. >> >>>> >> >>>> Seems we have resolved all concerns. If there's no more ones, I'd >> like >> >> to >> >>>> close the vote by this time tomorrow. >> >>>> >> >>>> Cheers, >> >>>> Bowen >> >>>> >> >>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> >> wrote: >> >>>> >> >>>>> Hi, >> >>>>> >> >>>>> I think above are some valid points, and we can adopt the >> >> suggestions. >> >>>>> To elaborate a bit on the new SQL syntax, it would imply that, >> unlike >> >>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL >> >> [TEMPORARY] >> >>>>> FUNCTIONS" would return functions' fully qualified names with >> catalog >> >>> and >> >>>>> db names. >> >>>>> >> >>>>> >> >>>>> >> >>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> >> >>> wrote: >> >>>>>> Hi all, >> >>>>>> >> >>>>>> I support Fabian's arguments. In my opinion, temporary objects >> >> should >> >>>>>> just be an additional layer on top of the regular catalog/database >> >>>>>> lookup logic. Thus, a temporary table or function has always >> highest >> >>>>>> precedence and should be stable within the local session. Otherwise >> >> it >> >>>>>> could magically disappear while someone else is performing >> >>> modifications >> >>>>>> in the catalog. >> >>>>>> >> >>>>>> Furthermore, this feature is very useful for prototyping as users >> >> can >> >>>>>> simply express that a catalog/database is present even through they >> >>>>>> might not have access to it currently. >> >>>>>> >> >>>>>> Regards, >> >>>>>> Timo >> >>>>>> >> >>>>>> >> >>>>>> On 30.09.19 14:57, Fabian Hueske wrote: >> >>>>>>> Hi all, >> >>>>>>> >> >>>>>>> Sorry for the late reply. >> >>>>>>> >> >>>>>>> I think it might lead to confusing situations if temporary >> >> functions >> >>>> (or >> >>>>>>> any temporary db objects for that matter) are bound to the life >> >>> cycle >> >>>>>> of an >> >>>>>>> (external) db/catalog. >> >>>>>>> Imaging a situation where you create a temp function in a db in an >> >>>>>> external >> >>>>>>> catalog and use it but at some point it does not work anymore >> >>> because >> >>>>>> some >> >>>>>>> other dropped the database from the external catalog. >> >>>>>>> Shouldn't temporary objects be only controlled by the owner of a >> >>>>>> session? >> >>>>>>> I agree that creating temp objects in non-existing db/catalogs >> >>> sounds >> >>>> a >> >>>>>> bit >> >>>>>>> strange, but IMO the opposite (the db/catalog must exist for a >> >> temp >> >>>>>>> function to be created/exist) can have significant implications >> >> like >> >>>> the >> >>>>>>> one I described. >> >>>>>>> I think it would be quite easy for users to understand that >> >>> temporary >> >>>>>>> objects are solely owned by them (and their session). >> >>>>>>> The problem of listing temporary objects could be solved by >> >> adding a >> >>>> ALL >> >>>>>>> [TEMPORARY] clause: >> >>>>>>> >> >>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the >> >>>>>>> catalog/database including temporary functions. >> >>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions >> >>>>>> regardless >> >>>>>>> of the catalog/database. >> >>>>>>> >> >>>>>>> Best, >> >>>>>>> Fabian >> >>>>>>> >> >>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < >> >>>>>> [hidden email]>: >> >>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can >> >> close >> >>>> the >> >>>>>>>> voting. >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> >> >>>> wrote: >> >>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just >> >> want >> >>> to >> >>>>>> try >> >>>>>>>>> out temporary functions, they can create temporary system >> >>> functions >> >>>>>> which >> >>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow >> >>>>>> temporary >> >>>>>>>>> catalog function is to let users override permanent catalog >> >>>> functions. >> >>>>>>>>> Therefore a temporary function in a non-existing catalog won't >> >>> serve >> >>>>>> that >> >>>>>>>>> purpose. Besides, each session is provided with a default >> >> catalog >> >>>> and >> >>>>>> DB. >> >>>>>>>>> So even if users simply want to create some catalog functions >> >> they >> >>>> can >> >>>>>>>>> forget about after the session, wouldn't the default catalog/DB >> >> be >> >>>>>> enough >> >>>>>>>>> for such experiments? >> >>>>>>>>> >> >>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> >> >>>> wrote: >> >>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will >> >> first >> >>>>>> make >> >>>>>>>>> sure >> >>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is >> >> triggered >> >>> to >> >>>>>>>>> remove >> >>>>>>>>>> that temp function. >> >>>>>>>>>> >> >>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not >> >>> copying >> >>>>>>>> Hive. >> >>>>>>>>>> IMHO, allowing registering temp functions in nonexistent >> >>> catalog/db >> >>>>>> is >> >>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would >> >> list >> >>>>>> system >> >>>>>>>>>> functions and functions in the current catalog/db, since users >> >>>> cannot >> >>>>>>>>>> designate a nonexistent catalog/db as current ones, how can >> >> they >> >>>> list >> >>>>>>>>>> functions in nonexistent catalog/db? They may end up never >> >>> knowing >> >>>>>> what >> >>>>>>>>>> temp functions they've created unless trying out with queries >> >> or >> >>> we >> >>>>>>>>>> introducing some more nonstandard SQL statements. The same >> >>> applies >> >>>> to >> >>>>>>>>> other >> >>>>>>>>>> temp objects like temp tables. >> >>>>>>>>>> >> >>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me >> >>>>>>>>>> >> >>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >> >>>>>>>>>> [hidden email]> >> >>>>>>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>>> Ad. 1 >> >>>>>>>>>>> I wouldn't say it is hacky. >> >>>>>>>>>>> Moreover, how do you want ensure that the dB always exists >> >> when >> >>> a >> >>>>>>>>>> temporary >> >>>>>>>>>>> object is used?( in this particular case function). Do you >> >> want >> >>> to >> >>>>>>>>> query >> >>>>>>>>>>> for the database existence whenever e.g a temporary function >> >> is >> >>>>>>>> used? I >> >>>>>>>>>>> think important aspect here is that the database can be >> >> dropped >> >>>> from >> >>>>>>>>>>> external system, not just flink or a different flink session. >> >>>>>>>>>>> >> >>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a >> >>>>>>>> database >> >>>>>>>>>> that >> >>>>>>>>>>> does not exist, that's true. But if you create a temporary >> >> table >> >>>> in >> >>>>>> a >> >>>>>>>>>>> database and drop that database from a different session, you >> >>> can >> >>>>>>>> still >> >>>>>>>>>>> query the previously created temporary table from the original >> >>>>>>>> session. >> >>>>>>>>>> It >> >>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we >> >>>> make >> >>>>>>>>> this >> >>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime >> >> of >> >>> a >> >>>>>>>>>> database >> >>>>>>>>>>> explicit as part of the temporary objects contract? In the end >> >>>> they >> >>>>>>>>> exist >> >>>>>>>>>>> in different layers. Permanent objects & databases in a >> >> catalog >> >>>> (in >> >>>>>>>>> case >> >>>>>>>>>> of >> >>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions. >> >>>> That's >> >>>>>>>>> also >> >>>>>>>>>>> true for the original hive client. The temporary objects live >> >> in >> >>>> the >> >>>>>>>>> hive >> >>>>>>>>>>> client whereas databases are created in the metastore. >> >>>>>>>>>>> >> >>>>>>>>>>> Ad.2 >> >>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to >> >> achieve >> >>>>>> here >> >>>>>>>>> is >> >>>>>>>>>> so >> >>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One >> >>>>>> important >> >>>>>>>>>> thing >> >>>>>>>>>>> to remember here is that we need the function identifier to be >> >>>> part >> >>>>>>>> of >> >>>>>>>>>> the >> >>>>>>>>>>> FunctionDefinition object and not only as the result of the >> >>>> function >> >>>>>>>>>>> lookup. At some point we want to be able to store >> >>> QueryOperations >> >>>> in >> >>>>>>>>> the >> >>>>>>>>>>> catalogs. They can contain function calls within which we need >> >>> to >> >>>>>>>> have >> >>>>>>>>>> the >> >>>>>>>>>>> identifier. >> >>>>>>>>>>> >> >>>>>>>>>>> I agree my initial suggestion is over complicated. How about >> >> we >> >>>> have >> >>>>>>>>> just >> >>>>>>>>>>> the FunctionIdentifier as top level class without making the >> >>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the >> >>> same >> >>>>>>>> what >> >>>>>>>>>> you >> >>>>>>>>>>> suggested. The only difference is that it would be a top level >> >>>> class >> >>>>>>>>>> with a >> >>>>>>>>>>> more descriptive name. >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> >> >>> wrote: >> >>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete >> >>>>>>>> alternative >> >>>>>>>>> is >> >>>>>>>>>>> the >> >>>>>>>>>>>> following, basically having separate APIs in FunctionLookup >> >> for >> >>>>>>>>>> ambiguous >> >>>>>>>>>>>> and precise function lookup since planner is able to tell >> >> which >> >>>> API >> >>>>>>>>> to >> >>>>>>>>>>> call >> >>>>>>>>>>>> with parsed queries, and have a unified result: >> >>>>>>>>>>>> >> >>>>>>>>>>>> ``` >> >>>>>>>>>>>> class FunctionLookup { >> >>>>>>>>>>>> >> >>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> class Result { >> >>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... >> >> } >> >>>>>>>>>>>> String getName() { ... } >> >>>>>>>>>>>> // ... >> >>>>>>>>>>>> } >> >>>>>>>>>>>> >> >>>>>>>>>>>> } >> >>>>>>>>>>>> ``` >> >>>>>>>>>>>> >> >>>>>>>>>>>> Thanks, >> >>>>>>>>>>>> Bowen >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < >> >> [hidden email]> >> >>>>>>>>> wrote: >> >>>>>>>>>>>>> Hi Dawid, >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments >> >> is >> >>>>>>>>>>> important. >> >>>>>>>>>>>>> However, I'm not sure allowing users to register temp >> >>> functions >> >>>>>>>> in >> >>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit >> >>> hacky, >> >>>>>>>>> and >> >>>>>>>>>>>> breaks >> >>>>>>>>>>>>> the current contract between Flink and users that catalog/db >> >>>> must >> >>>>>>>>> be >> >>>>>>>>>>>> valid >> >>>>>>>>>>>>> in order to operate on. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> How about we instead focus on making it convenient to create >> >>>>>>>>>> catalogs? >> >>>>>>>>>>>>> Users actually can already do it with ease via program or >> >> SQL >> >>>> CLI >> >>>>>>>>>> yaml >> >>>>>>>>>>>> file >> >>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency >> >>> nor >> >>>>>>>>>>> external >> >>>>>>>>>>>>> connections. What we can further improve is DDL for >> >> catalogs, >> >>>>>>>> and I >> >>>>>>>>>>>> raised >> >>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] >> >>> driven >> >>>>>>>> by >> >>>>>>>>>>> Terry >> >>>>>>>>>>>>> now. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> In that case, if users would like to experiment via SQL, >> >> they >> >>>> can >> >>>>>>>>>>> easily >> >>>>>>>>>>>>> create an in memory catalog/database using DDL, then play >> >> with >> >>>>>>>> temp >> >>>>>>>>>>>>> functions. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier >> >>> has >> >>>>>>>> not >> >>>>>>>>>>> been >> >>>>>>>>>>>>> resolved when stack call reaches >> >>>>>>>> FunctionCatalog#lookupFunction(), >> >>>>>>>>>> but >> >>>>>>>>>>>> only >> >>>>>>>>>>>>> been parsed? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok >> >>>> with >> >>>>>>>>> the >> >>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass >> >>> of >> >>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Another potentially simpler way is: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> ``` >> >>>>>>>>>>>>> // in class FunctionLookup >> >>>>>>>>>>>>> class Result { >> >>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >> >> ... } >> >>>>>>>>>>>>> String getName() { ... } >> >>>>>>>>>>>>> ... >> >>>>>>>>>>>>> } >> >>>>>>>>>>>>> ``` >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> WDYT? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >> >>>>>>>>>>>>> [hidden email]> wrote: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>>> I really like the flip and think it clarifies important >> >>> aspects >> >>>>>>>> of >> >>>>>>>>>> the >> >>>>>>>>>>>>>> system. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take >> >>> much >> >>>>>>>>> time >> >>>>>>>>>> to >> >>>>>>>>>>>>>> agree on. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the >> >>>>>>>> existence >> >>>>>>>>> of >> >>>>>>>>>>>>>> cat/db >> >>>>>>>>>>>>>> for temporary functions? That means not to check it, so >> >> e.g. >> >>>>>>>> it's >> >>>>>>>>>>>> possible >> >>>>>>>>>>>>>> to create a temporary function in a database that does not >> >>>>>>>> exist. >> >>>>>>>>> I >> >>>>>>>>>>>> think >> >>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform >> >>>>>>>>>> experiments >> >>>>>>>>>>>> but >> >>>>>>>>>>>>>> does not have access to the db yet or temporarily does not >> >>> have >> >>>>>>>>>>>> connection >> >>>>>>>>>>>>>> to a catalog. >> >>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not >> >>>> loosen >> >>>>>>>>> the >> >>>>>>>>>>>>>> requirements for all catalog objects such as tables, views, >> >>>>>>>> types >> >>>>>>>>>> just >> >>>>>>>>>>>> for >> >>>>>>>>>>>>>> the functions? It's really important later on from e.g the >> >>>>>>>>>>>> serializability >> >>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier >> >> is >> >>>>>>>> that >> >>>>>>>>> we >> >>>>>>>>>>>> know >> >>>>>>>>>>>>>> that it has been resolved. The suggested changes break that >> >>>>>>>>>>> assumption. >> >>>>>>>>>>>>>> What do you think about adding an interface >> >>> FunctionIdentifier >> >>>> { >> >>>>>>>>>>>>>> String getName(); >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> /** >> >>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in >> >>>>>>>> function. >> >>>>>>>>>>>>>> */ >> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >> >>>>>>>>>>>>>> } >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >> >>>>>>>>>>>>>> return Optional.of(this); >> >>>>>>>>>>>>>> } >> >>>>>>>>>>>>>> } >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> class SystemFunctionIdentifier implements >> >> FunctionIdentifier >> >>>>>>>> {...} >> >>>>>>>>>>>>>> WDYT? >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> >> >>> wrote: >> >>>>>>>>>>>>>>> +1. LGTM >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >> >>>>>>>> [hidden email]> >> >>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>> +1 >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Best, >> >>>>>>>>>>>>>>>> Terry Wang >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> +1 >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> Best, >> >>>>>>>>>>>>>>>>> Kurt >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >> >>>>>>>>> [hidden email] >> >>>>>>>>>>>>>> wrote: >> >>>>>>>>>>>>>>>>>> Hi all, >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], >> >> which >> >>>>>>>>>> we've >> >>>>>>>>>>>>>> reached >> >>>>>>>>>>>>>>>>>> consensus in [2]. >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >> >>>>>>>>> UTC, >> >>>>>>>>>>> Sep >> >>>>>>>>>>>>>> 26. >> >>>>>>>>>>>>>>>>>> Thanks, >> >>>>>>>>>>>>>>>>>> Bowen >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> [1] >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >> >> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >> >>>>>>>>>>>>>>>>>> [2] >> >>>>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>>>> >> >> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >> >>>>>>>>>>>>>>> -- >> >>>>>>>>>>>>>>> Xuefu Zhang >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> "In Honey We Trust!" >> >>>>>>>>>>>>>>> >> >>>>>>>>> -- >> >>>>>>>>> Best regards! >> >>>>>>>>> Rui Li >> >>>>>>>>> >> >>>>>> >> > >> >> |
Hi,
I see there was quite some discussion and changes on the FLIP after this VOTE was started. I would suggest to start a new voting thread on the current state of the FLIP (keeping in mind that a FLIP vote needs at least three committer/PMC votes). For the future, we should probably keep discussion to the [DISCUSS] thread and use the vote thread only for voting. Best, Aljoscha > On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote: > > I'm glad to announce that the community has accepted the design of FLIP-57, > and we are moving forward to implementing it. > > Thanks everyone! > > On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote: > >> Introducing a new term "path" to APIs like >> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to >> users, thus I feel "getSimpleName/getIdentifier" is fine. >> >> To summarize the discussion result. >> >> - categorize functions into 2 dimensions - system v.s. catalog, >> non-temp v.s. temp - and that give us 4 combinations >> - definition of FunctionIdentifier >> >> @PublicEvolving >> >> Class FunctionIdentifier { >> >> String name; >> >> ObjectIdentifier oi; >> >> // for temporary/non-temporary system function >> public FunctionIdentifier of(String name) { } >> // for temporary/non-temporary catalog function >> public FunctionIdentifier of(ObjectIdentifier identifier) { } >> >> >> Optional<ObjectIdentifier> getIdentifier() {} >> >> Optional<String> getSimpleName() {} >> >> } >> >> >> I've updated them to FLIP wiki. Please take a final look. I'll close the >> voting if there's no other concern raised within 24 hours. >> >> Cheers >> >> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]> >> wrote: >> >>> Hi, >>> >>> I very much agree with Xuefu's summary of the two points, especially on >>> the "functionIdentifier doesn't need to reflect the categories". >>> >>> For the factory methods I think methods of should be enough: >>> >>> // for temporary/non-temporary system function >>> public FunctionIdentifier of(String name) { } >>> // for temporary/non-temporary catalog function >>> public FunctionIdentifier of(ObjectIdentifier identifier){ } >>> >>> In case of the getters I did not like the method name `getName` in the >>> original proposal, as in my opinion it could imply that it can return >>> also just the name part of an ObjectIdentifier, which should not be the >>> case. >>> >>> I'm fine with getSimpleName/getIdentifier, but want to throw in few >>> other suggestions: >>> >>> * getShortPath(Identifier)/getLongPath(Identifier), >>> >>> * getSystemPath(Identifier)/getCatalogPath(Identifier) >>> >>> +1 to any of the 3 options. >>> >>> One additional thing the FunctionIdentifier should be a PublicEvolving >>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which >>> user might need to access e.g. in a filter pushdown. >>> >>> I also support the Xuefu's suggestion not to support the "ALL" keyword >>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design >>> of it is not part of the FLIP-57, we do not need to agree on that in >>> this thread. >>> >>> Overall I think after updating the FLIP with the outcome of the >>> discussion I vote +1 for it. >>> >>> Best, >>> >>> Dawid >>> >>> >>> On 02/10/2019 00:21, Xuefu Z wrote: >>>> Here are some of my thoughts on the minor debates above: >>>> >>>> 1. +1 for 4 categories of functions. They are categorized along two >>>> dimensions of binary values: X: *temporary* vs non-temporary >>> (persistent); >>>> Y: *system* vs non-system (so said catalog). >>>> 2. In my opinion, class functionIdentifier doesn't really need to >>> reflect >>>> the categories of the functions. Instead, we should decouple them to >>> make >>>> the API more stable. Thus, my suggestion is: >>>> >>>> @Internal >>>> class FunctionIdentifier { >>>> // for temporary/non-temporary system function >>>> public FunctionIdentifier ofSimpleName(String name) { } >>>> // for temporary/non-temporary catalog function >>>> public FunctionIdentifier ofIdentifier(ObjectIdentifier >>>> identifier){ } >>>> public Optional<String> getSimpleName() { } >>>> public Optional<ObjectIdentifier> getIdentifier() { } >>>> } >>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: >>>> >>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. >>>> >>>> When either keyword is missing, "ALL" is implied along that dimension. >>> We >>>> should always limit the search to the system function catalog and the >>>> current catalog/DB. I don't see a need of listing functions across >>>> different catalogs and databases. (It can be added later if that >>> arises.) >>>> >>>> Thanks, >>>> Xuefu >>>> >>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: >>>> >>>>> Hi Dawid, >>>>> >>>>> Thanks for bringing the suggestions up. I was prototyping yesterday and >>>>> found out those places exactly as what you suggested. >>>>> >>>>> For CallExpression and UnresolvedCallExpression, I've added them to >>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and >>> mark >>>>> that as a breaking change >>>>> >>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to bring >>> up >>>>> an issue on naming. It seems to me how we now name functions >>> categories is >>>>> a bit unclear and confusing, which is reflected on the suggested APIs >>> - in >>>>> FunctionIdentifier you lay out, "builtin function" would include >>> builtin >>>>> functions and temporary system functions as we are kind of using >>> "system" >>>>> and "built-in" interchangeably, and "catalog function" would include >>>>> catalog functions and temporary functions. I currently can see two >>>>> approaches to make it clearer to users. >>>>> >>>>> 1) Simplify FunctionIdentifier to be the following. As it's internal, >>> we >>>>> add comments and explanation to devs on which case the APIs support. >>>>> However, I feel this approach would be somehow a bit conflicting to >>> what >>>>> you want to achieve for the clarity of APIs >>>>> >>>>> @Internal >>>>> class FunctionIdentifier { >>>>> // for built-in function and temporary system function >>>>> public FunctionIdentifier of(String name) { } >>>>> // for temporary function and catalog function >>>>> public FunctionIdentifier of(ObjectIdentifier identifier){ } >>>>> public Optional<String> getFunctionName() { } >>>>> public Optional<ObjectIdentifier> getObjectIdentifier() { } >>>>> } >>>>> >>>>> 2) We can rename our function categories as following so there'll be >>> mainly >>>>> just two categories of functions, "system functions" and "catalog >>>>> functions", either of which can have temporary ones >>>>> >>>>> - built-in functions -> officially rename to "system functions" and >>> note >>>>> to users that "system" and "built-in" can be used interchangeably. We >>>>> prefer "system" because that's the keyword we decided to use in DDL >>> that >>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") >>>>> - temporary system functions >>>>> - catalog functions >>>>> - temporary functions -> rename to "temporary catalog function" >>>>> >>>>> @Internal >>>>> class FunctionIdentifier { >>>>> // for temporary/non-temporary system function >>>>> public FunctionIdentifier ofSystemFunction(String name) { } >>>>> // for temporary/non-temporary catalog function >>>>> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier >>>>> identifier){ } >>>>> public Optional<String> getSystemFunctionName() { } >>>>> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() >>> { } >>>>> } >>>>> >>>>> WDYT? >>>>> >>>>> >>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> >>> wrote: >>>>> >>>>>> Thanks for the summary Bowen! >>>>>> >>>>>> Looks good to me. >>>>>> >>>>>> Cheers, >>>>>> Fabian >>>>>> >>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < >>>>> [hidden email] >>>>>>> : >>>>>>> Hi all, >>>>>>> >>>>>>> I've updated the FLIP wiki with the following changes: >>>>>>> >>>>>>> - Lifespan of temp functions are not tied to those of catalogs and >>>>>>> databases. Users can create temp functions even though catalogs/dbs >>> in >>>>>>> their fully qualified names don't even exist. >>>>>>> - some new SQL commands >>>>>>> - "SHOW FUNCTIONS" - list names of temp and non-temp >>>>> system/built-in >>>>>>> functions, and names of temp and catalog functions in the current >>>>> catalog >>>>>>> and db >>>>>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp >>>>> system/built >>>>>>> functions, and fully qualified names of temp and catalog functions in >>>>> all >>>>>>> catalogs and dbs >>>>>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of >>>>> temp >>>>>>> functions in all catalog and db >>>>>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp >>>>>> system >>>>>>> functions >>>>>>> >>>>>>> Let me know if you have any questions. >>>>>>> >>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd >>> like >>>>> to >>>>>>> close the vote by this time tomorrow. >>>>>>> >>>>>>> Cheers, >>>>>>> Bowen >>>>>>> >>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> >>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I think above are some valid points, and we can adopt the >>>>> suggestions. >>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that, >>> unlike >>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL >>>>> [TEMPORARY] >>>>>>>> FUNCTIONS" would return functions' fully qualified names with >>> catalog >>>>>> and >>>>>>>> db names. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> >>>>>> wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects >>>>> should >>>>>>>>> just be an additional layer on top of the regular catalog/database >>>>>>>>> lookup logic. Thus, a temporary table or function has always >>> highest >>>>>>>>> precedence and should be stable within the local session. Otherwise >>>>> it >>>>>>>>> could magically disappear while someone else is performing >>>>>> modifications >>>>>>>>> in the catalog. >>>>>>>>> >>>>>>>>> Furthermore, this feature is very useful for prototyping as users >>>>> can >>>>>>>>> simply express that a catalog/database is present even through they >>>>>>>>> might not have access to it currently. >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> >>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> Sorry for the late reply. >>>>>>>>>> >>>>>>>>>> I think it might lead to confusing situations if temporary >>>>> functions >>>>>>> (or >>>>>>>>>> any temporary db objects for that matter) are bound to the life >>>>>> cycle >>>>>>>>> of an >>>>>>>>>> (external) db/catalog. >>>>>>>>>> Imaging a situation where you create a temp function in a db in an >>>>>>>>> external >>>>>>>>>> catalog and use it but at some point it does not work anymore >>>>>> because >>>>>>>>> some >>>>>>>>>> other dropped the database from the external catalog. >>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of a >>>>>>>>> session? >>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs >>>>>> sounds >>>>>>> a >>>>>>>>> bit >>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a >>>>> temp >>>>>>>>>> function to be created/exist) can have significant implications >>>>> like >>>>>>> the >>>>>>>>>> one I described. >>>>>>>>>> I think it would be quite easy for users to understand that >>>>>> temporary >>>>>>>>>> objects are solely owned by them (and their session). >>>>>>>>>> The problem of listing temporary objects could be solved by >>>>> adding a >>>>>>> ALL >>>>>>>>>> [TEMPORARY] clause: >>>>>>>>>> >>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the >>>>>>>>>> catalog/database including temporary functions. >>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions >>>>>>>>> regardless >>>>>>>>>> of the catalog/database. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Fabian >>>>>>>>>> >>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < >>>>>>>>> [hidden email]>: >>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can >>>>> close >>>>>>> the >>>>>>>>>>> voting. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> >>>>>>> wrote: >>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just >>>>> want >>>>>> to >>>>>>>>> try >>>>>>>>>>>> out temporary functions, they can create temporary system >>>>>> functions >>>>>>>>> which >>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow >>>>>>>>> temporary >>>>>>>>>>>> catalog function is to let users override permanent catalog >>>>>>> functions. >>>>>>>>>>>> Therefore a temporary function in a non-existing catalog won't >>>>>> serve >>>>>>>>> that >>>>>>>>>>>> purpose. Besides, each session is provided with a default >>>>> catalog >>>>>>> and >>>>>>>>> DB. >>>>>>>>>>>> So even if users simply want to create some catalog functions >>>>> they >>>>>>> can >>>>>>>>>>>> forget about after the session, wouldn't the default catalog/DB >>>>> be >>>>>>>>> enough >>>>>>>>>>>> for such experiments? >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> >>>>>>> wrote: >>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will >>>>> first >>>>>>>>> make >>>>>>>>>>>> sure >>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is >>>>> triggered >>>>>> to >>>>>>>>>>>> remove >>>>>>>>>>>>> that temp function. >>>>>>>>>>>>> >>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not >>>>>> copying >>>>>>>>>>> Hive. >>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent >>>>>> catalog/db >>>>>>>>> is >>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would >>>>> list >>>>>>>>> system >>>>>>>>>>>>> functions and functions in the current catalog/db, since users >>>>>>> cannot >>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can >>>>> they >>>>>>> list >>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never >>>>>> knowing >>>>>>>>> what >>>>>>>>>>>>> temp functions they've created unless trying out with queries >>>>> or >>>>>> we >>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same >>>>>> applies >>>>>>> to >>>>>>>>>>>> other >>>>>>>>>>>>> temp objects like temp tables. >>>>>>>>>>>>> >>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me >>>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >>>>>>>>>>>>> [hidden email]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Ad. 1 >>>>>>>>>>>>>> I wouldn't say it is hacky. >>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists >>>>> when >>>>>> a >>>>>>>>>>>>> temporary >>>>>>>>>>>>>> object is used?( in this particular case function). Do you >>>>> want >>>>>> to >>>>>>>>>>>> query >>>>>>>>>>>>>> for the database existence whenever e.g a temporary function >>>>> is >>>>>>>>>>> used? I >>>>>>>>>>>>>> think important aspect here is that the database can be >>>>> dropped >>>>>>> from >>>>>>>>>>>>>> external system, not just flink or a different flink session. >>>>>>>>>>>>>> >>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a >>>>>>>>>>> database >>>>>>>>>>>>> that >>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary >>>>> table >>>>>>> in >>>>>>>>> a >>>>>>>>>>>>>> database and drop that database from a different session, you >>>>>> can >>>>>>>>>>> still >>>>>>>>>>>>>> query the previously created temporary table from the original >>>>>>>>>>> session. >>>>>>>>>>>>> It >>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we >>>>>>> make >>>>>>>>>>>> this >>>>>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime >>>>> of >>>>>> a >>>>>>>>>>>>> database >>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the end >>>>>>> they >>>>>>>>>>>> exist >>>>>>>>>>>>>> in different layers. Permanent objects & databases in a >>>>> catalog >>>>>>> (in >>>>>>>>>>>> case >>>>>>>>>>>>> of >>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions. >>>>>>> That's >>>>>>>>>>>> also >>>>>>>>>>>>>> true for the original hive client. The temporary objects live >>>>> in >>>>>>> the >>>>>>>>>>>> hive >>>>>>>>>>>>>> client whereas databases are created in the metastore. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ad.2 >>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to >>>>> achieve >>>>>>>>> here >>>>>>>>>>>> is >>>>>>>>>>>>> so >>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One >>>>>>>>> important >>>>>>>>>>>>> thing >>>>>>>>>>>>>> to remember here is that we need the function identifier to be >>>>>>> part >>>>>>>>>>> of >>>>>>>>>>>>> the >>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the >>>>>>> function >>>>>>>>>>>>>> lookup. At some point we want to be able to store >>>>>> QueryOperations >>>>>>> in >>>>>>>>>>>> the >>>>>>>>>>>>>> catalogs. They can contain function calls within which we need >>>>>> to >>>>>>>>>>> have >>>>>>>>>>>>> the >>>>>>>>>>>>>> identifier. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How about >>>>> we >>>>>>> have >>>>>>>>>>>> just >>>>>>>>>>>>>> the FunctionIdentifier as top level class without making the >>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the >>>>>> same >>>>>>>>>>> what >>>>>>>>>>>>> you >>>>>>>>>>>>>> suggested. The only difference is that it would be a top level >>>>>>> class >>>>>>>>>>>>> with a >>>>>>>>>>>>>> more descriptive name. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> >>>>>> wrote: >>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete >>>>>>>>>>> alternative >>>>>>>>>>>> is >>>>>>>>>>>>>> the >>>>>>>>>>>>>>> following, basically having separate APIs in FunctionLookup >>>>> for >>>>>>>>>>>>> ambiguous >>>>>>>>>>>>>>> and precise function lookup since planner is able to tell >>>>> which >>>>>>> API >>>>>>>>>>>> to >>>>>>>>>>>>>> call >>>>>>>>>>>>>>> with parsed queries, and have a unified result: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> class FunctionLookup { >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> class Result { >>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... >>>>> } >>>>>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>>>>> // ... >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < >>>>> [hidden email]> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> Hi Dawid, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments >>>>> is >>>>>>>>>>>>>> important. >>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp >>>>>> functions >>>>>>>>>>> in >>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit >>>>>> hacky, >>>>>>>>>>>> and >>>>>>>>>>>>>>> breaks >>>>>>>>>>>>>>>> the current contract between Flink and users that catalog/db >>>>>>> must >>>>>>>>>>>> be >>>>>>>>>>>>>>> valid >>>>>>>>>>>>>>>> in order to operate on. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> How about we instead focus on making it convenient to create >>>>>>>>>>>>> catalogs? >>>>>>>>>>>>>>>> Users actually can already do it with ease via program or >>>>> SQL >>>>>>> CLI >>>>>>>>>>>>> yaml >>>>>>>>>>>>>>> file >>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency >>>>>> nor >>>>>>>>>>>>>> external >>>>>>>>>>>>>>>> connections. What we can further improve is DDL for >>>>> catalogs, >>>>>>>>>>> and I >>>>>>>>>>>>>>> raised >>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] >>>>>> driven >>>>>>>>>>> by >>>>>>>>>>>>>> Terry >>>>>>>>>>>>>>>> now. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL, >>>>> they >>>>>>> can >>>>>>>>>>>>>> easily >>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play >>>>> with >>>>>>>>>>> temp >>>>>>>>>>>>>>>> functions. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier >>>>>> has >>>>>>>>>>> not >>>>>>>>>>>>>> been >>>>>>>>>>>>>>>> resolved when stack call reaches >>>>>>>>>>> FunctionCatalog#lookupFunction(), >>>>>>>>>>>>> but >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>> been parsed? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok >>>>>>> with >>>>>>>>>>>> the >>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass >>>>>> of >>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Another potentially simpler way is: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> // in class FunctionLookup >>>>>>>>>>>>>>>> class Result { >>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>> ... } >>>>>>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >>>>>>>>>>>>>>>> [hidden email]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important >>>>>> aspects >>>>>>>>>>> of >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> system. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take >>>>>> much >>>>>>>>>>>> time >>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> agree on. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the >>>>>>>>>>> existence >>>>>>>>>>>> of >>>>>>>>>>>>>>>>> cat/db >>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so >>>>> e.g. >>>>>>>>>>> it's >>>>>>>>>>>>>>> possible >>>>>>>>>>>>>>>>> to create a temporary function in a database that does not >>>>>>>>>>> exist. >>>>>>>>>>>> I >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform >>>>>>>>>>>>> experiments >>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does not >>>>>> have >>>>>>>>>>>>>>> connection >>>>>>>>>>>>>>>>> to a catalog. >>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not >>>>>>> loosen >>>>>>>>>>>> the >>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables, views, >>>>>>>>>>> types >>>>>>>>>>>>> just >>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g the >>>>>>>>>>>>>>> serializability >>>>>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier >>>>> is >>>>>>>>>>> that >>>>>>>>>>>> we >>>>>>>>>>>>>>> know >>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break that >>>>>>>>>>>>>> assumption. >>>>>>>>>>>>>>>>> What do you think about adding an interface >>>>>> FunctionIdentifier >>>>>>> { >>>>>>>>>>>>>>>>> String getName(); >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> /** >>>>>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in >>>>>>>>>>> function. >>>>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>>>>>>>>>>>>>> return Optional.of(this); >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements >>>>> FunctionIdentifier >>>>>>>>>>> {...} >>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> >>>>>> wrote: >>>>>>>>>>>>>>>>>> +1. LGTM >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >>>>>>>>>>> [hidden email]> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Terry Wang >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> Kurt >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >>>>>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], >>>>> which >>>>>>>>>>>>> we've >>>>>>>>>>>>>>>>> reached >>>>>>>>>>>>>>>>>>>>> consensus in [2]. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >>>>>>>>>>>> UTC, >>>>>>>>>>>>>> Sep >>>>>>>>>>>>>>>>> 26. >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>> >>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Xuefu Zhang >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> "In Honey We Trust!" >>>>>>>>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Best regards! >>>>>>>>>>>> Rui Li >>>>>>>>>>>> >>>>>>>>> >>>> >>> >>> |
Hi,
I agree with Aljoscha. It is not transparent to me which votes are binding to the current status of the FLIP. Some other minor comments from my side: - We don't need to deprecate methods in FunctionCatalog. This class is internal. We can simply change the method signatures. - `String name` is missing in the FunctionIdentifier code example; can we call FunctionIdentifier.getSimpleName() just FunctionIdentifier.getName()? - Add the methods that we discussed to the example: `of(String)`, `of(ObjectIdentifier)` Other than that, I'm happy to give my +1 to this proposal. Thanks for the productive discussion, Timo On 04.10.19 13:29, Aljoscha Krettek wrote: > Hi, > > I see there was quite some discussion and changes on the FLIP after this VOTE was started. I would suggest to start a new voting thread on the current state of the FLIP (keeping in mind that a FLIP vote needs at least three committer/PMC votes). > > For the future, we should probably keep discussion to the [DISCUSS] thread and use the vote thread only for voting. > > Best, > Aljoscha > >> On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote: >> >> I'm glad to announce that the community has accepted the design of FLIP-57, >> and we are moving forward to implementing it. >> >> Thanks everyone! >> >> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote: >> >>> Introducing a new term "path" to APIs like >>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to >>> users, thus I feel "getSimpleName/getIdentifier" is fine. >>> >>> To summarize the discussion result. >>> >>> - categorize functions into 2 dimensions - system v.s. catalog, >>> non-temp v.s. temp - and that give us 4 combinations >>> - definition of FunctionIdentifier >>> >>> @PublicEvolving >>> >>> Class FunctionIdentifier { >>> >>> String name; >>> >>> ObjectIdentifier oi; >>> >>> // for temporary/non-temporary system function >>> public FunctionIdentifier of(String name) { } >>> // for temporary/non-temporary catalog function >>> public FunctionIdentifier of(ObjectIdentifier identifier) { } >>> >>> >>> Optional<ObjectIdentifier> getIdentifier() {} >>> >>> Optional<String> getSimpleName() {} >>> >>> } >>> >>> >>> I've updated them to FLIP wiki. Please take a final look. I'll close the >>> voting if there's no other concern raised within 24 hours. >>> >>> Cheers >>> >>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]> >>> wrote: >>> >>>> Hi, >>>> >>>> I very much agree with Xuefu's summary of the two points, especially on >>>> the "functionIdentifier doesn't need to reflect the categories". >>>> >>>> For the factory methods I think methods of should be enough: >>>> >>>> // for temporary/non-temporary system function >>>> public FunctionIdentifier of(String name) { } >>>> // for temporary/non-temporary catalog function >>>> public FunctionIdentifier of(ObjectIdentifier identifier){ } >>>> >>>> In case of the getters I did not like the method name `getName` in the >>>> original proposal, as in my opinion it could imply that it can return >>>> also just the name part of an ObjectIdentifier, which should not be the >>>> case. >>>> >>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few >>>> other suggestions: >>>> >>>> * getShortPath(Identifier)/getLongPath(Identifier), >>>> >>>> * getSystemPath(Identifier)/getCatalogPath(Identifier) >>>> >>>> +1 to any of the 3 options. >>>> >>>> One additional thing the FunctionIdentifier should be a PublicEvolving >>>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which >>>> user might need to access e.g. in a filter pushdown. >>>> >>>> I also support the Xuefu's suggestion not to support the "ALL" keyword >>>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design >>>> of it is not part of the FLIP-57, we do not need to agree on that in >>>> this thread. >>>> >>>> Overall I think after updating the FLIP with the outcome of the >>>> discussion I vote +1 for it. >>>> >>>> Best, >>>> >>>> Dawid >>>> >>>> >>>> On 02/10/2019 00:21, Xuefu Z wrote: >>>>> Here are some of my thoughts on the minor debates above: >>>>> >>>>> 1. +1 for 4 categories of functions. They are categorized along two >>>>> dimensions of binary values: X: *temporary* vs non-temporary >>>> (persistent); >>>>> Y: *system* vs non-system (so said catalog). >>>>> 2. In my opinion, class functionIdentifier doesn't really need to >>>> reflect >>>>> the categories of the functions. Instead, we should decouple them to >>>> make >>>>> the API more stable. Thus, my suggestion is: >>>>> >>>>> @Internal >>>>> class FunctionIdentifier { >>>>> // for temporary/non-temporary system function >>>>> public FunctionIdentifier ofSimpleName(String name) { } >>>>> // for temporary/non-temporary catalog function >>>>> public FunctionIdentifier ofIdentifier(ObjectIdentifier >>>>> identifier){ } >>>>> public Optional<String> getSimpleName() { } >>>>> public Optional<ObjectIdentifier> getIdentifier() { } >>>>> } >>>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be: >>>>> >>>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. >>>>> >>>>> When either keyword is missing, "ALL" is implied along that dimension. >>>> We >>>>> should always limit the search to the system function catalog and the >>>>> current catalog/DB. I don't see a need of listing functions across >>>>> different catalogs and databases. (It can be added later if that >>>> arises.) >>>>> Thanks, >>>>> Xuefu >>>>> >>>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote: >>>>> >>>>>> Hi Dawid, >>>>>> >>>>>> Thanks for bringing the suggestions up. I was prototyping yesterday and >>>>>> found out those places exactly as what you suggested. >>>>>> >>>>>> For CallExpression and UnresolvedCallExpression, I've added them to >>>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and >>>> mark >>>>>> that as a breaking change >>>>>> >>>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to bring >>>> up >>>>>> an issue on naming. It seems to me how we now name functions >>>> categories is >>>>>> a bit unclear and confusing, which is reflected on the suggested APIs >>>> - in >>>>>> FunctionIdentifier you lay out, "builtin function" would include >>>> builtin >>>>>> functions and temporary system functions as we are kind of using >>>> "system" >>>>>> and "built-in" interchangeably, and "catalog function" would include >>>>>> catalog functions and temporary functions. I currently can see two >>>>>> approaches to make it clearer to users. >>>>>> >>>>>> 1) Simplify FunctionIdentifier to be the following. As it's internal, >>>> we >>>>>> add comments and explanation to devs on which case the APIs support. >>>>>> However, I feel this approach would be somehow a bit conflicting to >>>> what >>>>>> you want to achieve for the clarity of APIs >>>>>> >>>>>> @Internal >>>>>> class FunctionIdentifier { >>>>>> // for built-in function and temporary system function >>>>>> public FunctionIdentifier of(String name) { } >>>>>> // for temporary function and catalog function >>>>>> public FunctionIdentifier of(ObjectIdentifier identifier){ } >>>>>> public Optional<String> getFunctionName() { } >>>>>> public Optional<ObjectIdentifier> getObjectIdentifier() { } >>>>>> } >>>>>> >>>>>> 2) We can rename our function categories as following so there'll be >>>> mainly >>>>>> just two categories of functions, "system functions" and "catalog >>>>>> functions", either of which can have temporary ones >>>>>> >>>>>> - built-in functions -> officially rename to "system functions" and >>>> note >>>>>> to users that "system" and "built-in" can be used interchangeably. We >>>>>> prefer "system" because that's the keyword we decided to use in DDL >>>> that >>>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") >>>>>> - temporary system functions >>>>>> - catalog functions >>>>>> - temporary functions -> rename to "temporary catalog function" >>>>>> >>>>>> @Internal >>>>>> class FunctionIdentifier { >>>>>> // for temporary/non-temporary system function >>>>>> public FunctionIdentifier ofSystemFunction(String name) { } >>>>>> // for temporary/non-temporary catalog function >>>>>> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier >>>>>> identifier){ } >>>>>> public Optional<String> getSystemFunctionName() { } >>>>>> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() >>>> { } >>>>>> } >>>>>> >>>>>> WDYT? >>>>>> >>>>>> >>>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> >>>> wrote: >>>>>>> Thanks for the summary Bowen! >>>>>>> >>>>>>> Looks good to me. >>>>>>> >>>>>>> Cheers, >>>>>>> Fabian >>>>>>> >>>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < >>>>>> [hidden email] >>>>>>>> : >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I've updated the FLIP wiki with the following changes: >>>>>>>> >>>>>>>> - Lifespan of temp functions are not tied to those of catalogs and >>>>>>>> databases. Users can create temp functions even though catalogs/dbs >>>> in >>>>>>>> their fully qualified names don't even exist. >>>>>>>> - some new SQL commands >>>>>>>> - "SHOW FUNCTIONS" - list names of temp and non-temp >>>>>> system/built-in >>>>>>>> functions, and names of temp and catalog functions in the current >>>>>> catalog >>>>>>>> and db >>>>>>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp >>>>>> system/built >>>>>>>> functions, and fully qualified names of temp and catalog functions in >>>>>> all >>>>>>>> catalogs and dbs >>>>>>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of >>>>>> temp >>>>>>>> functions in all catalog and db >>>>>>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp >>>>>>> system >>>>>>>> functions >>>>>>>> >>>>>>>> Let me know if you have any questions. >>>>>>>> >>>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd >>>> like >>>>>> to >>>>>>>> close the vote by this time tomorrow. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Bowen >>>>>>>> >>>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> >>>> wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I think above are some valid points, and we can adopt the >>>>>> suggestions. >>>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that, >>>> unlike >>>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL >>>>>> [TEMPORARY] >>>>>>>>> FUNCTIONS" would return functions' fully qualified names with >>>> catalog >>>>>>> and >>>>>>>>> db names. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> >>>>>>> wrote: >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects >>>>>> should >>>>>>>>>> just be an additional layer on top of the regular catalog/database >>>>>>>>>> lookup logic. Thus, a temporary table or function has always >>>> highest >>>>>>>>>> precedence and should be stable within the local session. Otherwise >>>>>> it >>>>>>>>>> could magically disappear while someone else is performing >>>>>>> modifications >>>>>>>>>> in the catalog. >>>>>>>>>> >>>>>>>>>> Furthermore, this feature is very useful for prototyping as users >>>>>> can >>>>>>>>>> simply express that a catalog/database is present even through they >>>>>>>>>> might not have access to it currently. >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote: >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> Sorry for the late reply. >>>>>>>>>>> >>>>>>>>>>> I think it might lead to confusing situations if temporary >>>>>> functions >>>>>>>> (or >>>>>>>>>>> any temporary db objects for that matter) are bound to the life >>>>>>> cycle >>>>>>>>>> of an >>>>>>>>>>> (external) db/catalog. >>>>>>>>>>> Imaging a situation where you create a temp function in a db in an >>>>>>>>>> external >>>>>>>>>>> catalog and use it but at some point it does not work anymore >>>>>>> because >>>>>>>>>> some >>>>>>>>>>> other dropped the database from the external catalog. >>>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of a >>>>>>>>>> session? >>>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs >>>>>>> sounds >>>>>>>> a >>>>>>>>>> bit >>>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a >>>>>> temp >>>>>>>>>>> function to be created/exist) can have significant implications >>>>>> like >>>>>>>> the >>>>>>>>>>> one I described. >>>>>>>>>>> I think it would be quite easy for users to understand that >>>>>>> temporary >>>>>>>>>>> objects are solely owned by them (and their session). >>>>>>>>>>> The problem of listing temporary objects could be solved by >>>>>> adding a >>>>>>>> ALL >>>>>>>>>>> [TEMPORARY] clause: >>>>>>>>>>> >>>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the >>>>>>>>>>> catalog/database including temporary functions. >>>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions >>>>>>>>>> regardless >>>>>>>>>>> of the catalog/database. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Fabian >>>>>>>>>>> >>>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < >>>>>>>>>> [hidden email]>: >>>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can >>>>>> close >>>>>>>> the >>>>>>>>>>>> voting. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> >>>>>>>> wrote: >>>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just >>>>>> want >>>>>>> to >>>>>>>>>> try >>>>>>>>>>>>> out temporary functions, they can create temporary system >>>>>>> functions >>>>>>>>>> which >>>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow >>>>>>>>>> temporary >>>>>>>>>>>>> catalog function is to let users override permanent catalog >>>>>>>> functions. >>>>>>>>>>>>> Therefore a temporary function in a non-existing catalog won't >>>>>>> serve >>>>>>>>>> that >>>>>>>>>>>>> purpose. Besides, each session is provided with a default >>>>>> catalog >>>>>>>> and >>>>>>>>>> DB. >>>>>>>>>>>>> So even if users simply want to create some catalog functions >>>>>> they >>>>>>>> can >>>>>>>>>>>>> forget about after the session, wouldn't the default catalog/DB >>>>>> be >>>>>>>>>> enough >>>>>>>>>>>>> for such experiments? >>>>>>>>>>>>> >>>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> >>>>>>>> wrote: >>>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will >>>>>> first >>>>>>>>>> make >>>>>>>>>>>>> sure >>>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is >>>>>> triggered >>>>>>> to >>>>>>>>>>>>> remove >>>>>>>>>>>>>> that temp function. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not >>>>>>> copying >>>>>>>>>>>> Hive. >>>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent >>>>>>> catalog/db >>>>>>>>>> is >>>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would >>>>>> list >>>>>>>>>> system >>>>>>>>>>>>>> functions and functions in the current catalog/db, since users >>>>>>>> cannot >>>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can >>>>>> they >>>>>>>> list >>>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never >>>>>>> knowing >>>>>>>>>> what >>>>>>>>>>>>>> temp functions they've created unless trying out with queries >>>>>> or >>>>>>> we >>>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same >>>>>>> applies >>>>>>>> to >>>>>>>>>>>>> other >>>>>>>>>>>>>> temp objects like temp tables. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < >>>>>>>>>>>>>> [hidden email]> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ad. 1 >>>>>>>>>>>>>>> I wouldn't say it is hacky. >>>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists >>>>>> when >>>>>>> a >>>>>>>>>>>>>> temporary >>>>>>>>>>>>>>> object is used?( in this particular case function). Do you >>>>>> want >>>>>>> to >>>>>>>>>>>>> query >>>>>>>>>>>>>>> for the database existence whenever e.g a temporary function >>>>>> is >>>>>>>>>>>> used? I >>>>>>>>>>>>>>> think important aspect here is that the database can be >>>>>> dropped >>>>>>>> from >>>>>>>>>>>>>>> external system, not just flink or a different flink session. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a >>>>>>>>>>>> database >>>>>>>>>>>>>> that >>>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary >>>>>> table >>>>>>>> in >>>>>>>>>> a >>>>>>>>>>>>>>> database and drop that database from a different session, you >>>>>>> can >>>>>>>>>>>> still >>>>>>>>>>>>>>> query the previously created temporary table from the original >>>>>>>>>>>> session. >>>>>>>>>>>>>> It >>>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we >>>>>>>> make >>>>>>>>>>>>> this >>>>>>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime >>>>>> of >>>>>>> a >>>>>>>>>>>>>> database >>>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the end >>>>>>>> they >>>>>>>>>>>>> exist >>>>>>>>>>>>>>> in different layers. Permanent objects & databases in a >>>>>> catalog >>>>>>>> (in >>>>>>>>>>>>> case >>>>>>>>>>>>>> of >>>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions. >>>>>>>> That's >>>>>>>>>>>>> also >>>>>>>>>>>>>>> true for the original hive client. The temporary objects live >>>>>> in >>>>>>>> the >>>>>>>>>>>>> hive >>>>>>>>>>>>>>> client whereas databases are created in the metastore. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ad.2 >>>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to >>>>>> achieve >>>>>>>>>> here >>>>>>>>>>>>> is >>>>>>>>>>>>>> so >>>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One >>>>>>>>>> important >>>>>>>>>>>>>> thing >>>>>>>>>>>>>>> to remember here is that we need the function identifier to be >>>>>>>> part >>>>>>>>>>>> of >>>>>>>>>>>>>> the >>>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the >>>>>>>> function >>>>>>>>>>>>>>> lookup. At some point we want to be able to store >>>>>>> QueryOperations >>>>>>>> in >>>>>>>>>>>>> the >>>>>>>>>>>>>>> catalogs. They can contain function calls within which we need >>>>>>> to >>>>>>>>>>>> have >>>>>>>>>>>>>> the >>>>>>>>>>>>>>> identifier. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How about >>>>>> we >>>>>>>> have >>>>>>>>>>>>> just >>>>>>>>>>>>>>> the FunctionIdentifier as top level class without making the >>>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the >>>>>>> same >>>>>>>>>>>> what >>>>>>>>>>>>>> you >>>>>>>>>>>>>>> suggested. The only difference is that it would be a top level >>>>>>>> class >>>>>>>>>>>>>> with a >>>>>>>>>>>>>>> more descriptive name. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> >>>>>>> wrote: >>>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete >>>>>>>>>>>> alternative >>>>>>>>>>>>> is >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> following, basically having separate APIs in FunctionLookup >>>>>> for >>>>>>>>>>>>>> ambiguous >>>>>>>>>>>>>>>> and precise function lookup since planner is able to tell >>>>>> which >>>>>>>> API >>>>>>>>>>>>> to >>>>>>>>>>>>>>> call >>>>>>>>>>>>>>>> with parsed queries, and have a unified result: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> class FunctionLookup { >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi); >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> class Result { >>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { ... >>>>>> } >>>>>>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>>>>>> // ... >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < >>>>>> [hidden email]> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> Hi Dawid, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments >>>>>> is >>>>>>>>>>>>>>> important. >>>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp >>>>>>> functions >>>>>>>>>>>> in >>>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit >>>>>>> hacky, >>>>>>>>>>>>> and >>>>>>>>>>>>>>>> breaks >>>>>>>>>>>>>>>>> the current contract between Flink and users that catalog/db >>>>>>>> must >>>>>>>>>>>>> be >>>>>>>>>>>>>>>> valid >>>>>>>>>>>>>>>>> in order to operate on. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> How about we instead focus on making it convenient to create >>>>>>>>>>>>>> catalogs? >>>>>>>>>>>>>>>>> Users actually can already do it with ease via program or >>>>>> SQL >>>>>>>> CLI >>>>>>>>>>>>>> yaml >>>>>>>>>>>>>>>> file >>>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency >>>>>>> nor >>>>>>>>>>>>>>> external >>>>>>>>>>>>>>>>> connections. What we can further improve is DDL for >>>>>> catalogs, >>>>>>>>>>>> and I >>>>>>>>>>>>>>>> raised >>>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] >>>>>>> driven >>>>>>>>>>>> by >>>>>>>>>>>>>>> Terry >>>>>>>>>>>>>>>>> now. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL, >>>>>> they >>>>>>>> can >>>>>>>>>>>>>>> easily >>>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play >>>>>> with >>>>>>>>>>>> temp >>>>>>>>>>>>>>>>> functions. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier >>>>>>> has >>>>>>>>>>>> not >>>>>>>>>>>>>>> been >>>>>>>>>>>>>>>>> resolved when stack call reaches >>>>>>>>>>>> FunctionCatalog#lookupFunction(), >>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>> been parsed? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok >>>>>>>> with >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass >>>>>>> of >>>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Another potentially simpler way is: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>> // in class FunctionLookup >>>>>>>>>>>>>>>>> class Result { >>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>>> ... } >>>>>>>>>>>>>>>>> String getName() { ... } >>>>>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < >>>>>>>>>>>>>>>>> [hidden email]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi, >>>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important >>>>>>> aspects >>>>>>>>>>>> of >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> system. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take >>>>>>> much >>>>>>>>>>>>> time >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>> agree on. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the >>>>>>>>>>>> existence >>>>>>>>>>>>> of >>>>>>>>>>>>>>>>>> cat/db >>>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so >>>>>> e.g. >>>>>>>>>>>> it's >>>>>>>>>>>>>>>> possible >>>>>>>>>>>>>>>>>> to create a temporary function in a database that does not >>>>>>>>>>>> exist. >>>>>>>>>>>>> I >>>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform >>>>>>>>>>>>>> experiments >>>>>>>>>>>>>>>> but >>>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does not >>>>>>> have >>>>>>>>>>>>>>>> connection >>>>>>>>>>>>>>>>>> to a catalog. >>>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not >>>>>>>> loosen >>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables, views, >>>>>>>>>>>> types >>>>>>>>>>>>>> just >>>>>>>>>>>>>>>> for >>>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g the >>>>>>>>>>>>>>>> serializability >>>>>>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier >>>>>> is >>>>>>>>>>>> that >>>>>>>>>>>>> we >>>>>>>>>>>>>>>> know >>>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break that >>>>>>>>>>>>>>> assumption. >>>>>>>>>>>>>>>>>> What do you think about adding an interface >>>>>>> FunctionIdentifier >>>>>>>> { >>>>>>>>>>>>>>>>>> String getName(); >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> /** >>>>>>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in >>>>>>>>>>>> function. >>>>>>>>>>>>>>>>>> */ >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { >>>>>>>>>>>>>>>>>> return Optional.of(this); >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements >>>>>> FunctionIdentifier >>>>>>>>>>>> {...} >>>>>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> >>>>>>> wrote: >>>>>>>>>>>>>>>>>>> +1. LGTM >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < >>>>>>>>>>>> [hidden email]> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>> Terry Wang >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> Kurt >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < >>>>>>>>>>>>> [hidden email] >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], >>>>>> which >>>>>>>>>>>>>> we've >>>>>>>>>>>>>>>>>> reached >>>>>>>>>>>>>>>>>>>>>> consensus in [2]. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm >>>>>>>>>>>>> UTC, >>>>>>>>>>>>>>> Sep >>>>>>>>>>>>>>>>>> 26. >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> Bowen >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> [1] >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog >>>>>>>>>>>>>>>>>>>>>> [2] >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 >>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>> Xuefu Zhang >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> "In Honey We Trust!" >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>> -- >>>>>>>>>>>>> Best regards! >>>>>>>>>>>>> Rui Li >>>>>>>>>>>>> >>>> |
Hi Aljoscha, Timo
Thanks for the reminder. I've update the details in FLIP wiki, and will kick off a voting thread. On Fri, Oct 4, 2019 at 1:51 PM Timo Walther <[hidden email]> wrote: > Hi, > > I agree with Aljoscha. It is not transparent to me which votes are > binding to the current status of the FLIP. > > Some other minor comments from my side: > > - We don't need to deprecate methods in FunctionCatalog. This class is > internal. We can simply change the method signatures. > - `String name` is missing in the FunctionIdentifier code example; can > we call FunctionIdentifier.getSimpleName() just > FunctionIdentifier.getName()? > - Add the methods that we discussed to the example: `of(String)`, > `of(ObjectIdentifier)` > > Other than that, I'm happy to give my +1 to this proposal. > > Thanks for the productive discussion, > Timo > > > On 04.10.19 13:29, Aljoscha Krettek wrote: > > Hi, > > > > I see there was quite some discussion and changes on the FLIP after this > VOTE was started. I would suggest to start a new voting thread on the > current state of the FLIP (keeping in mind that a FLIP vote needs at least > three committer/PMC votes). > > > > For the future, we should probably keep discussion to the [DISCUSS] > thread and use the vote thread only for voting. > > > > Best, > > Aljoscha > > > >> On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote: > >> > >> I'm glad to announce that the community has accepted the design of > FLIP-57, > >> and we are moving forward to implementing it. > >> > >> Thanks everyone! > >> > >> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote: > >> > >>> Introducing a new term "path" to APIs like > >>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing > to > >>> users, thus I feel "getSimpleName/getIdentifier" is fine. > >>> > >>> To summarize the discussion result. > >>> > >>> - categorize functions into 2 dimensions - system v.s. catalog, > >>> non-temp v.s. temp - and that give us 4 combinations > >>> - definition of FunctionIdentifier > >>> > >>> @PublicEvolving > >>> > >>> Class FunctionIdentifier { > >>> > >>> String name; > >>> > >>> ObjectIdentifier oi; > >>> > >>> // for temporary/non-temporary system function > >>> public FunctionIdentifier of(String name) { } > >>> // for temporary/non-temporary catalog function > >>> public FunctionIdentifier of(ObjectIdentifier identifier) { } > >>> > >>> > >>> Optional<ObjectIdentifier> getIdentifier() {} > >>> > >>> Optional<String> getSimpleName() {} > >>> > >>> } > >>> > >>> > >>> I've updated them to FLIP wiki. Please take a final look. I'll close > the > >>> voting if there's no other concern raised within 24 hours. > >>> > >>> Cheers > >>> > >>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz < > [hidden email]> > >>> wrote: > >>> > >>>> Hi, > >>>> > >>>> I very much agree with Xuefu's summary of the two points, especially > on > >>>> the "functionIdentifier doesn't need to reflect the categories". > >>>> > >>>> For the factory methods I think methods of should be enough: > >>>> > >>>> // for temporary/non-temporary system function > >>>> public FunctionIdentifier of(String name) { } > >>>> // for temporary/non-temporary catalog function > >>>> public FunctionIdentifier of(ObjectIdentifier identifier){ } > >>>> > >>>> In case of the getters I did not like the method name `getName` in the > >>>> original proposal, as in my opinion it could imply that it can return > >>>> also just the name part of an ObjectIdentifier, which should not be > the > >>>> case. > >>>> > >>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few > >>>> other suggestions: > >>>> > >>>> * getShortPath(Identifier)/getLongPath(Identifier), > >>>> > >>>> * getSystemPath(Identifier)/getCatalogPath(Identifier) > >>>> > >>>> +1 to any of the 3 options. > >>>> > >>>> One additional thing the FunctionIdentifier should be a PublicEvolving > >>>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, > which > >>>> user might need to access e.g. in a filter pushdown. > >>>> > >>>> I also support the Xuefu's suggestion not to support the "ALL" keyword > >>>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design > >>>> of it is not part of the FLIP-57, we do not need to agree on that in > >>>> this thread. > >>>> > >>>> Overall I think after updating the FLIP with the outcome of the > >>>> discussion I vote +1 for it. > >>>> > >>>> Best, > >>>> > >>>> Dawid > >>>> > >>>> > >>>> On 02/10/2019 00:21, Xuefu Z wrote: > >>>>> Here are some of my thoughts on the minor debates above: > >>>>> > >>>>> 1. +1 for 4 categories of functions. They are categorized along two > >>>>> dimensions of binary values: X: *temporary* vs non-temporary > >>>> (persistent); > >>>>> Y: *system* vs non-system (so said catalog). > >>>>> 2. In my opinion, class functionIdentifier doesn't really need to > >>>> reflect > >>>>> the categories of the functions. Instead, we should decouple them to > >>>> make > >>>>> the API more stable. Thus, my suggestion is: > >>>>> > >>>>> @Internal > >>>>> class FunctionIdentifier { > >>>>> // for temporary/non-temporary system function > >>>>> public FunctionIdentifier ofSimpleName(String name) { } > >>>>> // for temporary/non-temporary catalog function > >>>>> public FunctionIdentifier ofIdentifier(ObjectIdentifier > >>>>> identifier){ } > >>>>> public Optional<String> getSimpleName() { } > >>>>> public Optional<ObjectIdentifier> getIdentifier() { } > >>>>> } > >>>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just > be: > >>>>> > >>>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS. > >>>>> > >>>>> When either keyword is missing, "ALL" is implied along that > dimension. > >>>> We > >>>>> should always limit the search to the system function catalog and the > >>>>> current catalog/DB. I don't see a need of listing functions across > >>>>> different catalogs and databases. (It can be added later if that > >>>> arises.) > >>>>> Thanks, > >>>>> Xuefu > >>>>> > >>>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> > wrote: > >>>>> > >>>>>> Hi Dawid, > >>>>>> > >>>>>> Thanks for bringing the suggestions up. I was prototyping yesterday > and > >>>>>> found out those places exactly as what you suggested. > >>>>>> > >>>>>> For CallExpression and UnresolvedCallExpression, I've added them to > >>>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier > and > >>>> mark > >>>>>> that as a breaking change > >>>>>> > >>>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to > bring > >>>> up > >>>>>> an issue on naming. It seems to me how we now name functions > >>>> categories is > >>>>>> a bit unclear and confusing, which is reflected on the suggested > APIs > >>>> - in > >>>>>> FunctionIdentifier you lay out, "builtin function" would include > >>>> builtin > >>>>>> functions and temporary system functions as we are kind of using > >>>> "system" > >>>>>> and "built-in" interchangeably, and "catalog function" would include > >>>>>> catalog functions and temporary functions. I currently can see two > >>>>>> approaches to make it clearer to users. > >>>>>> > >>>>>> 1) Simplify FunctionIdentifier to be the following. As it's > internal, > >>>> we > >>>>>> add comments and explanation to devs on which case the APIs support. > >>>>>> However, I feel this approach would be somehow a bit conflicting to > >>>> what > >>>>>> you want to achieve for the clarity of APIs > >>>>>> > >>>>>> @Internal > >>>>>> class FunctionIdentifier { > >>>>>> // for built-in function and temporary system function > >>>>>> public FunctionIdentifier of(String name) { } > >>>>>> // for temporary function and catalog function > >>>>>> public FunctionIdentifier of(ObjectIdentifier identifier){ } > >>>>>> public Optional<String> getFunctionName() { } > >>>>>> public Optional<ObjectIdentifier> getObjectIdentifier() { } > >>>>>> } > >>>>>> > >>>>>> 2) We can rename our function categories as following so there'll be > >>>> mainly > >>>>>> just two categories of functions, "system functions" and "catalog > >>>>>> functions", either of which can have temporary ones > >>>>>> > >>>>>> - built-in functions -> officially rename to "system functions" > and > >>>> note > >>>>>> to users that "system" and "built-in" can be used interchangeably. > We > >>>>>> prefer "system" because that's the keyword we decided to use in DDL > >>>> that > >>>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION") > >>>>>> - temporary system functions > >>>>>> - catalog functions > >>>>>> - temporary functions -> rename to "temporary catalog function" > >>>>>> > >>>>>> @Internal > >>>>>> class FunctionIdentifier { > >>>>>> // for temporary/non-temporary system function > >>>>>> public FunctionIdentifier ofSystemFunction(String name) { } > >>>>>> // for temporary/non-temporary catalog function > >>>>>> public FunctionIdentifier ofCatalogFunction(ObjectIdentifier > >>>>>> identifier){ } > >>>>>> public Optional<String> getSystemFunctionName() { } > >>>>>> public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() > >>>> { } > >>>>>> } > >>>>>> > >>>>>> WDYT? > >>>>>> > >>>>>> > >>>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> > >>>> wrote: > >>>>>>> Thanks for the summary Bowen! > >>>>>>> > >>>>>>> Looks good to me. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Fabian > >>>>>>> > >>>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li < > >>>>>> [hidden email] > >>>>>>>> : > >>>>>>>> Hi all, > >>>>>>>> > >>>>>>>> I've updated the FLIP wiki with the following changes: > >>>>>>>> > >>>>>>>> - Lifespan of temp functions are not tied to those of catalogs and > >>>>>>>> databases. Users can create temp functions even though > catalogs/dbs > >>>> in > >>>>>>>> their fully qualified names don't even exist. > >>>>>>>> - some new SQL commands > >>>>>>>> - "SHOW FUNCTIONS" - list names of temp and non-temp > >>>>>> system/built-in > >>>>>>>> functions, and names of temp and catalog functions in the current > >>>>>> catalog > >>>>>>>> and db > >>>>>>>> - "SHOW ALL FUNCTIONS" - list names of temp and non-temp > >>>>>> system/built > >>>>>>>> functions, and fully qualified names of temp and catalog > functions in > >>>>>> all > >>>>>>>> catalogs and dbs > >>>>>>>> - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names > of > >>>>>> temp > >>>>>>>> functions in all catalog and db > >>>>>>>> - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all > temp > >>>>>>> system > >>>>>>>> functions > >>>>>>>> > >>>>>>>> Let me know if you have any questions. > >>>>>>>> > >>>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd > >>>> like > >>>>>> to > >>>>>>>> close the vote by this time tomorrow. > >>>>>>>> > >>>>>>>> Cheers, > >>>>>>>> Bowen > >>>>>>>> > >>>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> > >>>> wrote: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> I think above are some valid points, and we can adopt the > >>>>>> suggestions. > >>>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that, > >>>> unlike > >>>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL > >>>>>> [TEMPORARY] > >>>>>>>>> FUNCTIONS" would return functions' fully qualified names with > >>>> catalog > >>>>>>> and > >>>>>>>>> db names. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email] > > > >>>>>>> wrote: > >>>>>>>>>> Hi all, > >>>>>>>>>> > >>>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects > >>>>>> should > >>>>>>>>>> just be an additional layer on top of the regular > catalog/database > >>>>>>>>>> lookup logic. Thus, a temporary table or function has always > >>>> highest > >>>>>>>>>> precedence and should be stable within the local session. > Otherwise > >>>>>> it > >>>>>>>>>> could magically disappear while someone else is performing > >>>>>>> modifications > >>>>>>>>>> in the catalog. > >>>>>>>>>> > >>>>>>>>>> Furthermore, this feature is very useful for prototyping as > users > >>>>>> can > >>>>>>>>>> simply express that a catalog/database is present even through > they > >>>>>>>>>> might not have access to it currently. > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Timo > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote: > >>>>>>>>>>> Hi all, > >>>>>>>>>>> > >>>>>>>>>>> Sorry for the late reply. > >>>>>>>>>>> > >>>>>>>>>>> I think it might lead to confusing situations if temporary > >>>>>> functions > >>>>>>>> (or > >>>>>>>>>>> any temporary db objects for that matter) are bound to the life > >>>>>>> cycle > >>>>>>>>>> of an > >>>>>>>>>>> (external) db/catalog. > >>>>>>>>>>> Imaging a situation where you create a temp function in a db > in an > >>>>>>>>>> external > >>>>>>>>>>> catalog and use it but at some point it does not work anymore > >>>>>>> because > >>>>>>>>>> some > >>>>>>>>>>> other dropped the database from the external catalog. > >>>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of > a > >>>>>>>>>> session? > >>>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs > >>>>>>> sounds > >>>>>>>> a > >>>>>>>>>> bit > >>>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a > >>>>>> temp > >>>>>>>>>>> function to be created/exist) can have significant implications > >>>>>> like > >>>>>>>> the > >>>>>>>>>>> one I described. > >>>>>>>>>>> I think it would be quite easy for users to understand that > >>>>>>> temporary > >>>>>>>>>>> objects are solely owned by them (and their session). > >>>>>>>>>>> The problem of listing temporary objects could be solved by > >>>>>> adding a > >>>>>>>> ALL > >>>>>>>>>>> [TEMPORARY] clause: > >>>>>>>>>>> > >>>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the > >>>>>>>>>>> catalog/database including temporary functions. > >>>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary > functions > >>>>>>>>>> regardless > >>>>>>>>>>> of the catalog/database. > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Fabian > >>>>>>>>>>> > >>>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li < > >>>>>>>>>> [hidden email]>: > >>>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can > >>>>>> close > >>>>>>>> the > >>>>>>>>>>>> voting. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email] > > > >>>>>>>> wrote: > >>>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just > >>>>>> want > >>>>>>> to > >>>>>>>>>> try > >>>>>>>>>>>>> out temporary functions, they can create temporary system > >>>>>>> functions > >>>>>>>>>> which > >>>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we > allow > >>>>>>>>>> temporary > >>>>>>>>>>>>> catalog function is to let users override permanent catalog > >>>>>>>> functions. > >>>>>>>>>>>>> Therefore a temporary function in a non-existing catalog > won't > >>>>>>> serve > >>>>>>>>>> that > >>>>>>>>>>>>> purpose. Besides, each session is provided with a default > >>>>>> catalog > >>>>>>>> and > >>>>>>>>>> DB. > >>>>>>>>>>>>> So even if users simply want to create some catalog functions > >>>>>> they > >>>>>>>> can > >>>>>>>>>>>>> forget about after the session, wouldn't the default > catalog/DB > >>>>>> be > >>>>>>>>>> enough > >>>>>>>>>>>>> for such experiments? > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li < > [hidden email]> > >>>>>>>> wrote: > >>>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will > >>>>>> first > >>>>>>>>>> make > >>>>>>>>>>>>> sure > >>>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is > >>>>>> triggered > >>>>>>> to > >>>>>>>>>>>>> remove > >>>>>>>>>>>>>> that temp function. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not > >>>>>>> copying > >>>>>>>>>>>> Hive. > >>>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent > >>>>>>> catalog/db > >>>>>>>>>> is > >>>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would > >>>>>> list > >>>>>>>>>> system > >>>>>>>>>>>>>> functions and functions in the current catalog/db, since > users > >>>>>>>> cannot > >>>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can > >>>>>> they > >>>>>>>> list > >>>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never > >>>>>>> knowing > >>>>>>>>>> what > >>>>>>>>>>>>>> temp functions they've created unless trying out with > queries > >>>>>> or > >>>>>>> we > >>>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same > >>>>>>> applies > >>>>>>>> to > >>>>>>>>>>>>> other > >>>>>>>>>>>>>> temp objects like temp tables. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz < > >>>>>>>>>>>>>> [hidden email]> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Ad. 1 > >>>>>>>>>>>>>>> I wouldn't say it is hacky. > >>>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists > >>>>>> when > >>>>>>> a > >>>>>>>>>>>>>> temporary > >>>>>>>>>>>>>>> object is used?( in this particular case function). Do you > >>>>>> want > >>>>>>> to > >>>>>>>>>>>>> query > >>>>>>>>>>>>>>> for the database existence whenever e.g a temporary > function > >>>>>> is > >>>>>>>>>>>> used? I > >>>>>>>>>>>>>>> think important aspect here is that the database can be > >>>>>> dropped > >>>>>>>> from > >>>>>>>>>>>>>>> external system, not just flink or a different flink > session. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table > in a > >>>>>>>>>>>> database > >>>>>>>>>>>>>> that > >>>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary > >>>>>> table > >>>>>>>> in > >>>>>>>>>> a > >>>>>>>>>>>>>>> database and drop that database from a different session, > you > >>>>>>> can > >>>>>>>>>>>> still > >>>>>>>>>>>>>>> query the previously created temporary table from the > original > >>>>>>>>>>>> session. > >>>>>>>>>>>>>> It > >>>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why > don't we > >>>>>>>> make > >>>>>>>>>>>>> this > >>>>>>>>>>>>>>> behaviour of not binding a temporary objects to the > lifetime > >>>>>> of > >>>>>>> a > >>>>>>>>>>>>>> database > >>>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the > end > >>>>>>>> they > >>>>>>>>>>>>> exist > >>>>>>>>>>>>>>> in different layers. Permanent objects & databases in a > >>>>>> catalog > >>>>>>>> (in > >>>>>>>>>>>>> case > >>>>>>>>>>>>>> of > >>>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink > sessions. > >>>>>>>> That's > >>>>>>>>>>>>> also > >>>>>>>>>>>>>>> true for the original hive client. The temporary objects > live > >>>>>> in > >>>>>>>> the > >>>>>>>>>>>>> hive > >>>>>>>>>>>>>>> client whereas databases are created in the metastore. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Ad.2 > >>>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to > >>>>>> achieve > >>>>>>>>>> here > >>>>>>>>>>>>> is > >>>>>>>>>>>>>> so > >>>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One > >>>>>>>>>> important > >>>>>>>>>>>>>> thing > >>>>>>>>>>>>>>> to remember here is that we need the function identifier > to be > >>>>>>>> part > >>>>>>>>>>>> of > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the > >>>>>>>> function > >>>>>>>>>>>>>>> lookup. At some point we want to be able to store > >>>>>>> QueryOperations > >>>>>>>> in > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>> catalogs. They can contain function calls within which we > need > >>>>>>> to > >>>>>>>>>>>> have > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>> identifier. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How > about > >>>>>> we > >>>>>>>> have > >>>>>>>>>>>>> just > >>>>>>>>>>>>>>> the FunctionIdentifier as top level class without making > the > >>>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much > the > >>>>>>> same > >>>>>>>>>>>> what > >>>>>>>>>>>>>> you > >>>>>>>>>>>>>>> suggested. The only difference is that it would be a top > level > >>>>>>>> class > >>>>>>>>>>>>>> with a > >>>>>>>>>>>>>>> more descriptive name. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> > >>>>>>> wrote: > >>>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete > >>>>>>>>>>>> alternative > >>>>>>>>>>>>> is > >>>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>> following, basically having separate APIs in > FunctionLookup > >>>>>> for > >>>>>>>>>>>>>> ambiguous > >>>>>>>>>>>>>>>> and precise function lookup since planner is able to tell > >>>>>> which > >>>>>>>> API > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>> call > >>>>>>>>>>>>>>>> with parsed queries, and have a unified result: > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>> class FunctionLookup { > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier > oi); > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> class Result { > >>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > ... > >>>>>> } > >>>>>>>>>>>>>>>> String getName() { ... } > >>>>>>>>>>>>>>>> // ... > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>> Bowen > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li < > >>>>>> [hidden email]> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> Hi Dawid, > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run > experiments > >>>>>> is > >>>>>>>>>>>>>>> important. > >>>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp > >>>>>>> functions > >>>>>>>>>>>> in > >>>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit > >>>>>>> hacky, > >>>>>>>>>>>>> and > >>>>>>>>>>>>>>>> breaks > >>>>>>>>>>>>>>>>> the current contract between Flink and users that > catalog/db > >>>>>>>> must > >>>>>>>>>>>>> be > >>>>>>>>>>>>>>>> valid > >>>>>>>>>>>>>>>>> in order to operate on. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> How about we instead focus on making it convenient to > create > >>>>>>>>>>>>>> catalogs? > >>>>>>>>>>>>>>>>> Users actually can already do it with ease via program or > >>>>>> SQL > >>>>>>>> CLI > >>>>>>>>>>>>>> yaml > >>>>>>>>>>>>>>>> file > >>>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra > dependency > >>>>>>> nor > >>>>>>>>>>>>>>> external > >>>>>>>>>>>>>>>>> connections. What we can further improve is DDL for > >>>>>> catalogs, > >>>>>>>>>>>> and I > >>>>>>>>>>>>>>>> raised > >>>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] > >>>>>>> driven > >>>>>>>>>>>> by > >>>>>>>>>>>>>>> Terry > >>>>>>>>>>>>>>>>> now. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL, > >>>>>> they > >>>>>>>> can > >>>>>>>>>>>>>>> easily > >>>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play > >>>>>> with > >>>>>>>>>>>> temp > >>>>>>>>>>>>>>>>> functions. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function > ObjectIdentifier > >>>>>>> has > >>>>>>>>>>>> not > >>>>>>>>>>>>>>> been > >>>>>>>>>>>>>>>>> resolved when stack call reaches > >>>>>>>>>>>> FunctionCatalog#lookupFunction(), > >>>>>>>>>>>>>> but > >>>>>>>>>>>>>>>> only > >>>>>>>>>>>>>>>>> been parsed? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. > I'm ok > >>>>>>>> with > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a > subclass > >>>>>>> of > >>>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Another potentially simpler way is: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>>> // in class FunctionLookup > >>>>>>>>>>>>>>>>> class Result { > >>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >>>>>> ... } > >>>>>>>>>>>>>>>>> String getName() { ... } > >>>>>>>>>>>>>>>>> ... > >>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>> ``` > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz < > >>>>>>>>>>>>>>>>> [hidden email]> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi, > >>>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important > >>>>>>> aspects > >>>>>>>>>>>> of > >>>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> system. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not > take > >>>>>>> much > >>>>>>>>>>>>> time > >>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>>> agree on. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the > >>>>>>>>>>>> existence > >>>>>>>>>>>>> of > >>>>>>>>>>>>>>>>>> cat/db > >>>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so > >>>>>> e.g. > >>>>>>>>>>>> it's > >>>>>>>>>>>>>>>> possible > >>>>>>>>>>>>>>>>>> to create a temporary function in a database that does > not > >>>>>>>>>>>> exist. > >>>>>>>>>>>>> I > >>>>>>>>>>>>>>>> think > >>>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to > perform > >>>>>>>>>>>>>> experiments > >>>>>>>>>>>>>>>> but > >>>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does > not > >>>>>>> have > >>>>>>>>>>>>>>>> connection > >>>>>>>>>>>>>>>>>> to a catalog. > >>>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we > not > >>>>>>>> loosen > >>>>>>>>>>>>> the > >>>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables, > views, > >>>>>>>>>>>> types > >>>>>>>>>>>>>> just > >>>>>>>>>>>>>>>> for > >>>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g > the > >>>>>>>>>>>>>>>> serializability > >>>>>>>>>>>>>>>>>> perspective. The important aspect of the > ObjectIdentifier > >>>>>> is > >>>>>>>>>>>> that > >>>>>>>>>>>>> we > >>>>>>>>>>>>>>>> know > >>>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break > that > >>>>>>>>>>>>>>> assumption. > >>>>>>>>>>>>>>>>>> What do you think about adding an interface > >>>>>>> FunctionIdentifier > >>>>>>>> { > >>>>>>>>>>>>>>>>>> String getName(); > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> /** > >>>>>>>>>>>>>>>>>> Return 3-part identifier. Empty in case of a built-in > >>>>>>>>>>>> function. > >>>>>>>>>>>>>>>>>> */ > >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier { > >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() { > >>>>>>>>>>>>>>>>>> return Optional.of(this); > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> } > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements > >>>>>> FunctionIdentifier > >>>>>>>>>>>> {...} > >>>>>>>>>>>>>>>>>> WDYT? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> > >>>>>>> wrote: > >>>>>>>>>>>>>>>>>>> +1. LGTM > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang < > >>>>>>>>>>>> [hidden email]> > >>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>> Terry Wang > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> > 写道: > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> +1 > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>>>> Kurt > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li < > >>>>>>>>>>>>> [hidden email] > >>>>>>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], > >>>>>> which > >>>>>>>>>>>>>> we've > >>>>>>>>>>>>>>>>>> reached > >>>>>>>>>>>>>>>>>>>>>> consensus in [2]. > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till > 6:30pm > >>>>>>>>>>>>> UTC, > >>>>>>>>>>>>>>> Sep > >>>>>>>>>>>>>>>>>> 26. > >>>>>>>>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>>>>>>>> Bowen > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> [1] > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog > >>>>>>>>>>>>>>>>>>>>>> [2] > >>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>>> > >>>> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613 > >>>>>>>>>>>>>>>>>>> -- > >>>>>>>>>>>>>>>>>>> Xuefu Zhang > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> "In Honey We Trust!" > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>> -- > >>>>>>>>>>>>> Best regards! > >>>>>>>>>>>>> Rui Li > >>>>>>>>>>>>> > >>>> > > |
Free forum by Nabble | Edit this page |