[VOTE] FLIP-57: Rework FunctionCatalog

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Introducing a new term "path" to APIs like
"getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
users, thus I feel "getSimpleName/getIdentifier" is fine.

To summarize the discussion result.

   - categorize functions into 2 dimensions - system v.s. catalog, non-temp
   v.s. temp - and that give us 4 combinations
   - definition of FunctionIdentifier

                     @PublicEvolving

Class FunctionIdentifier {

    String name;

    ObjectIdentifier oi;

        // for temporary/non-temporary system function
    public FunctionIdentifier of(String name) {  }
    // for temporary/non-temporary catalog function
    public FunctionIdentifier of(ObjectIdentifier identifier) {  }


    Optional<ObjectIdentifier> getIdentifier() {}

    Optional<String> getSimpleName() {}

}


I've updated them to FLIP wiki. Please take a final look. I'll close the
voting if there's no other concern raised within 24 hours.

Cheers

On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi,
>
> I very much agree with Xuefu's summary of the two points, especially on
> the "functionIdentifier doesn't need to reflect the categories".
>
> For the factory methods I think methods of should be enough:
>
>   // for temporary/non-temporary system function
>     public FunctionIdentifier of(String name) {  }
>   // for temporary/non-temporary catalog function
>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>
> In case of the getters I did not like the method name `getName` in the
> original proposal, as in my opinion it could imply that it can return
> also just the name part of an ObjectIdentifier, which should not be the
> case.
>
> I'm fine with getSimpleName/getIdentifier, but want to throw in few
> other suggestions:
>
> * getShortPath(Identifier)/getLongPath(Identifier),
>
> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>
> +1 to any of the 3 options.
>
> One additional thing the FunctionIdentifier should be a PublicEvolving
> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
> user might need to access e.g. in a filter pushdown.
>
> I also support the Xuefu's suggestion not to support the "ALL" keyword
> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
> of it  is not part of the FLIP-57, we do not need to agree on that in
> this thread.
>
> Overall I think after updating the FLIP with the outcome of the
> discussion I vote +1 for it.
>
> Best,
>
> Dawid
>
>
> On 02/10/2019 00:21, Xuefu Z wrote:
> > Here are some of my thoughts on the minor debates above:
> >
> > 1. +1 for 4 categories of functions. They are categorized along two
> > dimensions of binary values: X: *temporary* vs non-temporary
> (persistent);
> > Y: *system* vs non-system (so said catalog).
> > 2. In my opinion, class functionIdentifier doesn't really need to reflect
> > the categories of the functions. Instead, we should decouple them to make
> > the API more stable. Thus, my suggestion is:
> >
> > @Internal
> > class FunctionIdentifier {
> >   // for temporary/non-temporary system function
> >     public FunctionIdentifier ofSimpleName(String name) {  }
> >   // for temporary/non-temporary catalog function
> >     public FunctionIdentifier ofIdentifier(ObjectIdentifier
> > identifier){  }
> >     public Optional<String> getSimpleName() {  }
> >     public Optional<ObjectIdentifier> getIdentifier() {  }
> > }
> > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
> >
> > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
> >
> > When either keyword is missing, "ALL" is implied along that dimension. We
> > should always limit the search to the system function catalog and the
> > current catalog/DB. I don't see a need of listing functions across
> > different catalogs and databases. (It can be added later if that arises.)
> >
> > Thanks,
> > Xuefu
> >
> > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:
> >
> >> Hi Dawid,
> >>
> >> Thanks for bringing the suggestions up. I was prototyping yesterday and
> >> found out those places exactly as what you suggested.
> >>
> >> For CallExpression and UnresolvedCallExpression, I've added them to
> >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and
> mark
> >> that as a breaking change
> >>
> >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring
> up
> >> an issue on naming. It seems to me how we now name functions categories
> is
> >> a bit unclear and confusing, which is reflected on the suggested APIs -
> in
> >> FunctionIdentifier you lay out, "builtin function" would include builtin
> >> functions and temporary system functions as we are kind of using
> "system"
> >> and "built-in" interchangeably, and "catalog function" would include
> >> catalog functions and temporary functions. I currently can see two
> >> approaches to make it clearer to users.
> >>
> >> 1) Simplify FunctionIdentifier to be the following. As it's internal, we
> >> add comments and explanation to devs on which case the APIs support.
> >> However, I feel this approach would be somehow a bit conflicting to what
> >> you want to achieve for the clarity of APIs
> >>
> >> @Internal
> >> class FunctionIdentifier {
> >>   // for built-in function and temporary system function
> >>     public FunctionIdentifier of(String name) {  }
> >>   // for temporary function and catalog function
> >>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
> >>     public Optional<String> getFunctionName() {  }
> >>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
> >> }
> >>
> >> 2) We can rename our function categories as following so there'll be
> mainly
> >> just two categories of functions, "system functions" and "catalog
> >> functions", either of which can have temporary ones
> >>
> >>   - built-in functions -> officially rename to "system functions" and
> note
> >> to users that "system" and "built-in" can be used interchangeably. We
> >> prefer "system" because that's the keyword we decided to use in DDL that
> >> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
> >>   - temporary system functions
> >>   - catalog functions
> >>   - temporary functions  -> rename to "temporary catalog function"
> >>
> >> @Internal
> >> class FunctionIdentifier {
> >>   // for temporary/non-temporary system function
> >>     public FunctionIdentifier ofSystemFunction(String name) {  }
> >>   // for temporary/non-temporary catalog function
> >>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
> >> identifier){  }
> >>     public Optional<String> getSystemFunctionName() {  }
> >>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {
> }
> >> }
> >>
> >> WDYT?
> >>
> >>
> >> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote:
> >>
> >>> Thanks for the summary Bowen!
> >>>
> >>> Looks good to me.
> >>>
> >>> Cheers,
> >>> Fabian
> >>>
> >>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
> >> [hidden email]
> >>>> :
> >>>> Hi all,
> >>>>
> >>>> I've updated the FLIP wiki with the following changes:
> >>>>
> >>>> - Lifespan of temp functions are not tied to those of catalogs and
> >>>> databases. Users can create temp functions even though catalogs/dbs in
> >>>> their fully qualified names don't even exist.
> >>>> - some new SQL commands
> >>>>     - "SHOW FUNCTIONS" - list names of temp and non-temp
> >> system/built-in
> >>>> functions, and names of temp and catalog functions in the current
> >> catalog
> >>>> and db
> >>>>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
> >> system/built
> >>>> functions, and fully qualified names of temp and catalog functions in
> >> all
> >>>> catalogs and dbs
> >>>>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
> >> temp
> >>>> functions in all catalog and db
> >>>>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
> >>> system
> >>>> functions
> >>>>
> >>>> Let me know if you have any questions.
> >>>>
> >>>> Seems we have resolved all concerns. If there's no more ones, I'd like
> >> to
> >>>> close the vote by this time tomorrow.
> >>>>
> >>>> Cheers,
> >>>> Bowen
> >>>>
> >>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]>
> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I think above are some valid points, and we can adopt the
> >> suggestions.
> >>>>> To elaborate a bit on the new SQL syntax, it would imply that, unlike
> >>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
> >> [TEMPORARY]
> >>>>> FUNCTIONS" would return functions' fully qualified names with catalog
> >>> and
> >>>>> db names.
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
> >>> wrote:
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I support Fabian's arguments. In my opinion, temporary objects
> >> should
> >>>>>> just be an additional layer on top of the regular catalog/database
> >>>>>> lookup logic. Thus, a temporary table or function has always highest
> >>>>>> precedence and should be stable within the local session. Otherwise
> >> it
> >>>>>> could magically disappear while someone else is performing
> >>> modifications
> >>>>>> in the catalog.
> >>>>>>
> >>>>>> Furthermore, this feature is very useful for prototyping as users
> >> can
> >>>>>> simply express that a catalog/database is present even through they
> >>>>>> might not have access to it currently.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>>
> >>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> Sorry for the late reply.
> >>>>>>>
> >>>>>>> I think it might lead to confusing situations if temporary
> >> functions
> >>>> (or
> >>>>>>> any temporary db objects for that matter) are bound to the life
> >>> cycle
> >>>>>> of an
> >>>>>>> (external) db/catalog.
> >>>>>>> Imaging a situation where you create a temp function in a db in an
> >>>>>> external
> >>>>>>> catalog and use it but at some point it does not work anymore
> >>> because
> >>>>>> some
> >>>>>>> other dropped the database from the external catalog.
> >>>>>>> Shouldn't temporary objects be only controlled by the owner of a
> >>>>>> session?
> >>>>>>> I agree that creating temp objects in non-existing db/catalogs
> >>> sounds
> >>>> a
> >>>>>> bit
> >>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
> >> temp
> >>>>>>> function to be created/exist) can have significant implications
> >> like
> >>>> the
> >>>>>>> one I described.
> >>>>>>> I think it would be quite easy for users to understand that
> >>> temporary
> >>>>>>> objects are solely owned by them (and their session).
> >>>>>>> The problem of listing temporary objects could be solved by
> >> adding a
> >>>> ALL
> >>>>>>> [TEMPORARY] clause:
> >>>>>>>
> >>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
> >>>>>>> catalog/database including temporary functions.
> >>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> >>>>>> regardless
> >>>>>>> of the catalog/database.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Fabian
> >>>>>>>
> >>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> >>>>>> [hidden email]>:
> >>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
> >> close
> >>>> the
> >>>>>>>> voting.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
> >>>> wrote:
> >>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
> >> want
> >>> to
> >>>>>> try
> >>>>>>>>> out temporary functions, they can create temporary system
> >>> functions
> >>>>>> which
> >>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
> >>>>>> temporary
> >>>>>>>>> catalog function is to let users override permanent catalog
> >>>> functions.
> >>>>>>>>> Therefore a temporary function in a non-existing catalog won't
> >>> serve
> >>>>>> that
> >>>>>>>>> purpose. Besides, each session is provided with a default
> >> catalog
> >>>> and
> >>>>>> DB.
> >>>>>>>>> So even if users simply want to create some catalog functions
> >> they
> >>>> can
> >>>>>>>>> forget about after the session, wouldn't the default catalog/DB
> >> be
> >>>>>> enough
> >>>>>>>>> for such experiments?
> >>>>>>>>>
> >>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
> >>>> wrote:
> >>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
> >> first
> >>>>>> make
> >>>>>>>>> sure
> >>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
> >> triggered
> >>> to
> >>>>>>>>> remove
> >>>>>>>>>> that temp function.
> >>>>>>>>>>
> >>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
> >>> copying
> >>>>>>>> Hive.
> >>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
> >>> catalog/db
> >>>>>> is
> >>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
> >> list
> >>>>>> system
> >>>>>>>>>> functions and functions in the current catalog/db, since users
> >>>> cannot
> >>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
> >> they
> >>>> list
> >>>>>>>>>> functions in nonexistent catalog/db? They may end up never
> >>> knowing
> >>>>>> what
> >>>>>>>>>> temp functions they've created unless trying out with queries
> >> or
> >>> we
> >>>>>>>>>> introducing some more nonstandard SQL statements. The same
> >>> applies
> >>>> to
> >>>>>>>>> other
> >>>>>>>>>> temp objects like temp tables.
> >>>>>>>>>>
> >>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
> >>>>>>>>>>
> >>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> >>>>>>>>>> [hidden email]>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Ad. 1
> >>>>>>>>>>> I wouldn't say it is hacky.
> >>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
> >> when
> >>> a
> >>>>>>>>>> temporary
> >>>>>>>>>>> object is used?( in this particular case function). Do you
> >> want
> >>> to
> >>>>>>>>> query
> >>>>>>>>>>> for the database existence whenever e.g a temporary function
> >> is
> >>>>>>>> used? I
> >>>>>>>>>>> think important aspect here is that the database can be
> >> dropped
> >>>> from
> >>>>>>>>>>> external system, not just flink or a different flink session.
> >>>>>>>>>>>
> >>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
> >>>>>>>> database
> >>>>>>>>>> that
> >>>>>>>>>>> does not exist, that's true. But if you create a temporary
> >> table
> >>>> in
> >>>>>> a
> >>>>>>>>>>> database and drop that database from a different session, you
> >>> can
> >>>>>>>> still
> >>>>>>>>>>> query the previously created temporary table from the original
> >>>>>>>> session.
> >>>>>>>>>> It
> >>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we
> >>>> make
> >>>>>>>>> this
> >>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime
> >> of
> >>> a
> >>>>>>>>>> database
> >>>>>>>>>>> explicit as part of the temporary objects contract? In the end
> >>>> they
> >>>>>>>>> exist
> >>>>>>>>>>> in different layers. Permanent objects & databases in a
> >> catalog
> >>>> (in
> >>>>>>>>> case
> >>>>>>>>>> of
> >>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions.
> >>>> That's
> >>>>>>>>> also
> >>>>>>>>>>> true for the original hive client. The temporary objects live
> >> in
> >>>> the
> >>>>>>>>> hive
> >>>>>>>>>>> client whereas databases are created in the metastore.
> >>>>>>>>>>>
> >>>>>>>>>>> Ad.2
> >>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
> >> achieve
> >>>>>> here
> >>>>>>>>> is
> >>>>>>>>>> so
> >>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
> >>>>>> important
> >>>>>>>>>> thing
> >>>>>>>>>>> to remember here is that we need the function identifier to be
> >>>> part
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>> FunctionDefinition object and not only as the result of the
> >>>> function
> >>>>>>>>>>> lookup. At some point we want to be able to store
> >>> QueryOperations
> >>>> in
> >>>>>>>>> the
> >>>>>>>>>>> catalogs. They can contain function calls within which we need
> >>> to
> >>>>>>>> have
> >>>>>>>>>> the
> >>>>>>>>>>> identifier.
> >>>>>>>>>>>
> >>>>>>>>>>> I agree my initial suggestion is over complicated. How about
> >> we
> >>>> have
> >>>>>>>>> just
> >>>>>>>>>>> the FunctionIdentifier as top level class without making the
> >>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the
> >>> same
> >>>>>>>> what
> >>>>>>>>>> you
> >>>>>>>>>>> suggested. The only difference is that it would be a top level
> >>>> class
> >>>>>>>>>> with a
> >>>>>>>>>>> more descriptive name.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
> >>> wrote:
> >>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
> >>>>>>>> alternative
> >>>>>>>>> is
> >>>>>>>>>>> the
> >>>>>>>>>>>> following, basically having separate APIs in FunctionLookup
> >> for
> >>>>>>>>>> ambiguous
> >>>>>>>>>>>> and precise function lookup since planner is able to tell
> >> which
> >>>> API
> >>>>>>>>> to
> >>>>>>>>>>> call
> >>>>>>>>>>>> with parsed queries, and have a unified result:
> >>>>>>>>>>>>
> >>>>>>>>>>>> ```
> >>>>>>>>>>>> class FunctionLookup {
> >>>>>>>>>>>>
> >>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> class Result {
> >>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ...
> >> }
> >>>>>>>>>>>>      String getName() { ... }
> >>>>>>>>>>>>      // ...
> >>>>>>>>>>>> }
> >>>>>>>>>>>>
> >>>>>>>>>>>> }
> >>>>>>>>>>>> ```
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Bowen
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
> >> [hidden email]>
> >>>>>>>>> wrote:
> >>>>>>>>>>>>> Hi Dawid,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments
> >> is
> >>>>>>>>>>> important.
> >>>>>>>>>>>>> However, I'm not sure allowing users to register temp
> >>> functions
> >>>>>>>> in
> >>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
> >>> hacky,
> >>>>>>>>> and
> >>>>>>>>>>>> breaks
> >>>>>>>>>>>>> the current contract between Flink and users that catalog/db
> >>>> must
> >>>>>>>>> be
> >>>>>>>>>>>> valid
> >>>>>>>>>>>>> in order to operate on.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> How about we instead focus on making it convenient to create
> >>>>>>>>>> catalogs?
> >>>>>>>>>>>>> Users actually can already do it with ease via program or
> >> SQL
> >>>> CLI
> >>>>>>>>>> yaml
> >>>>>>>>>>>> file
> >>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency
> >>> nor
> >>>>>>>>>>> external
> >>>>>>>>>>>>> connections. What we can further improve is DDL for
> >> catalogs,
> >>>>>>>> and I
> >>>>>>>>>>>> raised
> >>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
> >>> driven
> >>>>>>>> by
> >>>>>>>>>>> Terry
> >>>>>>>>>>>>> now.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
> >> they
> >>>> can
> >>>>>>>>>>> easily
> >>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
> >> with
> >>>>>>>> temp
> >>>>>>>>>>>>> functions.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
> >>> has
> >>>>>>>> not
> >>>>>>>>>>> been
> >>>>>>>>>>>>> resolved when stack call reaches
> >>>>>>>> FunctionCatalog#lookupFunction(),
> >>>>>>>>>> but
> >>>>>>>>>>>> only
> >>>>>>>>>>>>> been parsed?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
> >>>> with
> >>>>>>>>> the
> >>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass
> >>> of
> >>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Another potentially simpler way is:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ```
> >>>>>>>>>>>>> // in class FunctionLookup
> >>>>>>>>>>>>> class Result {
> >>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
> >> ... }
> >>>>>>>>>>>>>      String getName() { ... }
> >>>>>>>>>>>>>      ...
> >>>>>>>>>>>>> }
> >>>>>>>>>>>>> ```
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> >>>>>>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>> I really like the flip and think it clarifies important
> >>> aspects
> >>>>>>>> of
> >>>>>>>>>> the
> >>>>>>>>>>>>>> system.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take
> >>> much
> >>>>>>>>> time
> >>>>>>>>>> to
> >>>>>>>>>>>>>> agree on.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
> >>>>>>>> existence
> >>>>>>>>> of
> >>>>>>>>>>>>>> cat/db
> >>>>>>>>>>>>>> for temporary functions? That means not to check it, so
> >> e.g.
> >>>>>>>> it's
> >>>>>>>>>>>> possible
> >>>>>>>>>>>>>> to create a temporary function in a database that does not
> >>>>>>>> exist.
> >>>>>>>>> I
> >>>>>>>>>>>> think
> >>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
> >>>>>>>>>> experiments
> >>>>>>>>>>>> but
> >>>>>>>>>>>>>> does not have access to the db yet or temporarily does not
> >>> have
> >>>>>>>>>>>> connection
> >>>>>>>>>>>>>> to a catalog.
> >>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
> >>>> loosen
> >>>>>>>>> the
> >>>>>>>>>>>>>> requirements for all catalog objects such as tables, views,
> >>>>>>>> types
> >>>>>>>>>> just
> >>>>>>>>>>>> for
> >>>>>>>>>>>>>> the functions? It's really important later on from e.g the
> >>>>>>>>>>>> serializability
> >>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier
> >> is
> >>>>>>>> that
> >>>>>>>>> we
> >>>>>>>>>>>> know
> >>>>>>>>>>>>>> that it has been resolved. The suggested changes break that
> >>>>>>>>>>> assumption.
> >>>>>>>>>>>>>> What do you think about adding an interface
> >>> FunctionIdentifier
> >>>> {
> >>>>>>>>>>>>>> String getName();
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> /**
> >>>>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> >>>>>>>> function.
> >>>>>>>>>>>>>> */
> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> >>>>>>>>>>>>>>   return Optional.of(this);
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> class SystemFunctionIdentifier implements
> >> FunctionIdentifier
> >>>>>>>> {...}
> >>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
> >>> wrote:
> >>>>>>>>>>>>>>> +1. LGTM
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> >>>>>>>> [hidden email]>
> >>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>> Terry Wang
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> >>>>>>>>> [hidden email]
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
> >> which
> >>>>>>>>>> we've
> >>>>>>>>>>>>>> reached
> >>>>>>>>>>>>>>>>>> consensus in [2].
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> >>>>>>>>> UTC,
> >>>>>>>>>>> Sep
> >>>>>>>>>>>>>> 26.
> >>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >>>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>> Xuefu Zhang
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards!
> >>>>>>>>> Rui Li
> >>>>>>>>>
> >>>>>>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
I'm glad to announce that the community has accepted the design of FLIP-57,
and we are moving forward to implementing it.

Thanks everyone!

On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote:

> Introducing a new term "path" to APIs like
> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
> users, thus I feel "getSimpleName/getIdentifier" is fine.
>
> To summarize the discussion result.
>
>    - categorize functions into 2 dimensions - system v.s. catalog,
>    non-temp v.s. temp - and that give us 4 combinations
>    - definition of FunctionIdentifier
>
>                      @PublicEvolving
>
> Class FunctionIdentifier {
>
>     String name;
>
>     ObjectIdentifier oi;
>
>         // for temporary/non-temporary system function
>     public FunctionIdentifier of(String name) {  }
>     // for temporary/non-temporary catalog function
>     public FunctionIdentifier of(ObjectIdentifier identifier) {  }
>
>
>     Optional<ObjectIdentifier> getIdentifier() {}
>
>     Optional<String> getSimpleName() {}
>
> }
>
>
> I've updated them to FLIP wiki. Please take a final look. I'll close the
> voting if there's no other concern raised within 24 hours.
>
> Cheers
>
> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]>
> wrote:
>
>> Hi,
>>
>> I very much agree with Xuefu's summary of the two points, especially on
>> the "functionIdentifier doesn't need to reflect the categories".
>>
>> For the factory methods I think methods of should be enough:
>>
>>   // for temporary/non-temporary system function
>>     public FunctionIdentifier of(String name) {  }
>>   // for temporary/non-temporary catalog function
>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>
>> In case of the getters I did not like the method name `getName` in the
>> original proposal, as in my opinion it could imply that it can return
>> also just the name part of an ObjectIdentifier, which should not be the
>> case.
>>
>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
>> other suggestions:
>>
>> * getShortPath(Identifier)/getLongPath(Identifier),
>>
>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>>
>> +1 to any of the 3 options.
>>
>> One additional thing the FunctionIdentifier should be a PublicEvolving
>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
>> user might need to access e.g. in a filter pushdown.
>>
>> I also support the Xuefu's suggestion not to support the "ALL" keyword
>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
>> of it  is not part of the FLIP-57, we do not need to agree on that in
>> this thread.
>>
>> Overall I think after updating the FLIP with the outcome of the
>> discussion I vote +1 for it.
>>
>> Best,
>>
>> Dawid
>>
>>
>> On 02/10/2019 00:21, Xuefu Z wrote:
>> > Here are some of my thoughts on the minor debates above:
>> >
>> > 1. +1 for 4 categories of functions. They are categorized along two
>> > dimensions of binary values: X: *temporary* vs non-temporary
>> (persistent);
>> > Y: *system* vs non-system (so said catalog).
>> > 2. In my opinion, class functionIdentifier doesn't really need to
>> reflect
>> > the categories of the functions. Instead, we should decouple them to
>> make
>> > the API more stable. Thus, my suggestion is:
>> >
>> > @Internal
>> > class FunctionIdentifier {
>> >   // for temporary/non-temporary system function
>> >     public FunctionIdentifier ofSimpleName(String name) {  }
>> >   // for temporary/non-temporary catalog function
>> >     public FunctionIdentifier ofIdentifier(ObjectIdentifier
>> > identifier){  }
>> >     public Optional<String> getSimpleName() {  }
>> >     public Optional<ObjectIdentifier> getIdentifier() {  }
>> > }
>> > 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
>> >
>> > SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
>> >
>> > When either keyword is missing, "ALL" is implied along that dimension.
>> We
>> > should always limit the search to the system function catalog and the
>> > current catalog/DB. I don't see a need of listing functions across
>> > different catalogs and databases. (It can be added later if that
>> arises.)
>> >
>> > Thanks,
>> > Xuefu
>> >
>> > On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:
>> >
>> >> Hi Dawid,
>> >>
>> >> Thanks for bringing the suggestions up. I was prototyping yesterday and
>> >> found out those places exactly as what you suggested.
>> >>
>> >> For CallExpression and UnresolvedCallExpression, I've added them to
>> >> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and
>> mark
>> >> that as a breaking change
>> >>
>> >> For FunctionIdentifier, the suggested changes LGTM. Just want to bring
>> up
>> >> an issue on naming. It seems to me how we now name functions
>> categories is
>> >> a bit unclear and confusing, which is reflected on the suggested APIs
>> - in
>> >> FunctionIdentifier you lay out, "builtin function" would include
>> builtin
>> >> functions and temporary system functions as we are kind of using
>> "system"
>> >> and "built-in" interchangeably, and "catalog function" would include
>> >> catalog functions and temporary functions. I currently can see two
>> >> approaches to make it clearer to users.
>> >>
>> >> 1) Simplify FunctionIdentifier to be the following. As it's internal,
>> we
>> >> add comments and explanation to devs on which case the APIs support.
>> >> However, I feel this approach would be somehow a bit conflicting to
>> what
>> >> you want to achieve for the clarity of APIs
>> >>
>> >> @Internal
>> >> class FunctionIdentifier {
>> >>   // for built-in function and temporary system function
>> >>     public FunctionIdentifier of(String name) {  }
>> >>   // for temporary function and catalog function
>> >>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>> >>     public Optional<String> getFunctionName() {  }
>> >>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
>> >> }
>> >>
>> >> 2) We can rename our function categories as following so there'll be
>> mainly
>> >> just two categories of functions, "system functions" and "catalog
>> >> functions", either of which can have temporary ones
>> >>
>> >>   - built-in functions -> officially rename to "system functions" and
>> note
>> >> to users that "system" and "built-in" can be used interchangeably. We
>> >> prefer "system" because that's the keyword we decided to use in DDL
>> that
>> >> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
>> >>   - temporary system functions
>> >>   - catalog functions
>> >>   - temporary functions  -> rename to "temporary catalog function"
>> >>
>> >> @Internal
>> >> class FunctionIdentifier {
>> >>   // for temporary/non-temporary system function
>> >>     public FunctionIdentifier ofSystemFunction(String name) {  }
>> >>   // for temporary/non-temporary catalog function
>> >>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
>> >> identifier){  }
>> >>     public Optional<String> getSystemFunctionName() {  }
>> >>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier()
>> {  }
>> >> }
>> >>
>> >> WDYT?
>> >>
>> >>
>> >> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]>
>> wrote:
>> >>
>> >>> Thanks for the summary Bowen!
>> >>>
>> >>> Looks good to me.
>> >>>
>> >>> Cheers,
>> >>> Fabian
>> >>>
>> >>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
>> >> [hidden email]
>> >>>> :
>> >>>> Hi all,
>> >>>>
>> >>>> I've updated the FLIP wiki with the following changes:
>> >>>>
>> >>>> - Lifespan of temp functions are not tied to those of catalogs and
>> >>>> databases. Users can create temp functions even though catalogs/dbs
>> in
>> >>>> their fully qualified names don't even exist.
>> >>>> - some new SQL commands
>> >>>>     - "SHOW FUNCTIONS" - list names of temp and non-temp
>> >> system/built-in
>> >>>> functions, and names of temp and catalog functions in the current
>> >> catalog
>> >>>> and db
>> >>>>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
>> >> system/built
>> >>>> functions, and fully qualified names of temp and catalog functions in
>> >> all
>> >>>> catalogs and dbs
>> >>>>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
>> >> temp
>> >>>> functions in all catalog and db
>> >>>>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
>> >>> system
>> >>>> functions
>> >>>>
>> >>>> Let me know if you have any questions.
>> >>>>
>> >>>> Seems we have resolved all concerns. If there's no more ones, I'd
>> like
>> >> to
>> >>>> close the vote by this time tomorrow.
>> >>>>
>> >>>> Cheers,
>> >>>> Bowen
>> >>>>
>> >>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]>
>> wrote:
>> >>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I think above are some valid points, and we can adopt the
>> >> suggestions.
>> >>>>> To elaborate a bit on the new SQL syntax, it would imply that,
>> unlike
>> >>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
>> >> [TEMPORARY]
>> >>>>> FUNCTIONS" would return functions' fully qualified names with
>> catalog
>> >>> and
>> >>>>> db names.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
>> >>> wrote:
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> I support Fabian's arguments. In my opinion, temporary objects
>> >> should
>> >>>>>> just be an additional layer on top of the regular catalog/database
>> >>>>>> lookup logic. Thus, a temporary table or function has always
>> highest
>> >>>>>> precedence and should be stable within the local session. Otherwise
>> >> it
>> >>>>>> could magically disappear while someone else is performing
>> >>> modifications
>> >>>>>> in the catalog.
>> >>>>>>
>> >>>>>> Furthermore, this feature is very useful for prototyping as users
>> >> can
>> >>>>>> simply express that a catalog/database is present even through they
>> >>>>>> might not have access to it currently.
>> >>>>>>
>> >>>>>> Regards,
>> >>>>>> Timo
>> >>>>>>
>> >>>>>>
>> >>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
>> >>>>>>> Hi all,
>> >>>>>>>
>> >>>>>>> Sorry for the late reply.
>> >>>>>>>
>> >>>>>>> I think it might lead to confusing situations if temporary
>> >> functions
>> >>>> (or
>> >>>>>>> any temporary db objects for that matter) are bound to the life
>> >>> cycle
>> >>>>>> of an
>> >>>>>>> (external) db/catalog.
>> >>>>>>> Imaging a situation where you create a temp function in a db in an
>> >>>>>> external
>> >>>>>>> catalog and use it but at some point it does not work anymore
>> >>> because
>> >>>>>> some
>> >>>>>>> other dropped the database from the external catalog.
>> >>>>>>> Shouldn't temporary objects be only controlled by the owner of a
>> >>>>>> session?
>> >>>>>>> I agree that creating temp objects in non-existing db/catalogs
>> >>> sounds
>> >>>> a
>> >>>>>> bit
>> >>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
>> >> temp
>> >>>>>>> function to be created/exist) can have significant implications
>> >> like
>> >>>> the
>> >>>>>>> one I described.
>> >>>>>>> I think it would be quite easy for users to understand that
>> >>> temporary
>> >>>>>>> objects are solely owned by them (and their session).
>> >>>>>>> The problem of listing temporary objects could be solved by
>> >> adding a
>> >>>> ALL
>> >>>>>>> [TEMPORARY] clause:
>> >>>>>>>
>> >>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
>> >>>>>>> catalog/database including temporary functions.
>> >>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>> >>>>>> regardless
>> >>>>>>> of the catalog/database.
>> >>>>>>>
>> >>>>>>> Best,
>> >>>>>>> Fabian
>> >>>>>>>
>> >>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>> >>>>>> [hidden email]>:
>> >>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
>> >> close
>> >>>> the
>> >>>>>>>> voting.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
>> >>>> wrote:
>> >>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
>> >> want
>> >>> to
>> >>>>>> try
>> >>>>>>>>> out temporary functions, they can create temporary system
>> >>> functions
>> >>>>>> which
>> >>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
>> >>>>>> temporary
>> >>>>>>>>> catalog function is to let users override permanent catalog
>> >>>> functions.
>> >>>>>>>>> Therefore a temporary function in a non-existing catalog won't
>> >>> serve
>> >>>>>> that
>> >>>>>>>>> purpose. Besides, each session is provided with a default
>> >> catalog
>> >>>> and
>> >>>>>> DB.
>> >>>>>>>>> So even if users simply want to create some catalog functions
>> >> they
>> >>>> can
>> >>>>>>>>> forget about after the session, wouldn't the default catalog/DB
>> >> be
>> >>>>>> enough
>> >>>>>>>>> for such experiments?
>> >>>>>>>>>
>> >>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
>> >>>> wrote:
>> >>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
>> >> first
>> >>>>>> make
>> >>>>>>>>> sure
>> >>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
>> >> triggered
>> >>> to
>> >>>>>>>>> remove
>> >>>>>>>>>> that temp function.
>> >>>>>>>>>>
>> >>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
>> >>> copying
>> >>>>>>>> Hive.
>> >>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
>> >>> catalog/db
>> >>>>>> is
>> >>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
>> >> list
>> >>>>>> system
>> >>>>>>>>>> functions and functions in the current catalog/db, since users
>> >>>> cannot
>> >>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
>> >> they
>> >>>> list
>> >>>>>>>>>> functions in nonexistent catalog/db? They may end up never
>> >>> knowing
>> >>>>>> what
>> >>>>>>>>>> temp functions they've created unless trying out with queries
>> >> or
>> >>> we
>> >>>>>>>>>> introducing some more nonstandard SQL statements. The same
>> >>> applies
>> >>>> to
>> >>>>>>>>> other
>> >>>>>>>>>> temp objects like temp tables.
>> >>>>>>>>>>
>> >>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>> >>>>>>>>>> [hidden email]>
>> >>>>>>>>>> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> Ad. 1
>> >>>>>>>>>>> I wouldn't say it is hacky.
>> >>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
>> >> when
>> >>> a
>> >>>>>>>>>> temporary
>> >>>>>>>>>>> object is used?( in this particular case function). Do you
>> >> want
>> >>> to
>> >>>>>>>>> query
>> >>>>>>>>>>> for the database existence whenever e.g a temporary function
>> >> is
>> >>>>>>>> used? I
>> >>>>>>>>>>> think important aspect here is that the database can be
>> >> dropped
>> >>>> from
>> >>>>>>>>>>> external system, not just flink or a different flink session.
>> >>>>>>>>>>>
>> >>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
>> >>>>>>>> database
>> >>>>>>>>>> that
>> >>>>>>>>>>> does not exist, that's true. But if you create a temporary
>> >> table
>> >>>> in
>> >>>>>> a
>> >>>>>>>>>>> database and drop that database from a different session, you
>> >>> can
>> >>>>>>>> still
>> >>>>>>>>>>> query the previously created temporary table from the original
>> >>>>>>>> session.
>> >>>>>>>>>> It
>> >>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we
>> >>>> make
>> >>>>>>>>> this
>> >>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime
>> >> of
>> >>> a
>> >>>>>>>>>> database
>> >>>>>>>>>>> explicit as part of the temporary objects contract? In the end
>> >>>> they
>> >>>>>>>>> exist
>> >>>>>>>>>>> in different layers. Permanent objects & databases in a
>> >> catalog
>> >>>> (in
>> >>>>>>>>> case
>> >>>>>>>>>> of
>> >>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions.
>> >>>> That's
>> >>>>>>>>> also
>> >>>>>>>>>>> true for the original hive client. The temporary objects live
>> >> in
>> >>>> the
>> >>>>>>>>> hive
>> >>>>>>>>>>> client whereas databases are created in the metastore.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Ad.2
>> >>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
>> >> achieve
>> >>>>>> here
>> >>>>>>>>> is
>> >>>>>>>>>> so
>> >>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
>> >>>>>> important
>> >>>>>>>>>> thing
>> >>>>>>>>>>> to remember here is that we need the function identifier to be
>> >>>> part
>> >>>>>>>> of
>> >>>>>>>>>> the
>> >>>>>>>>>>> FunctionDefinition object and not only as the result of the
>> >>>> function
>> >>>>>>>>>>> lookup. At some point we want to be able to store
>> >>> QueryOperations
>> >>>> in
>> >>>>>>>>> the
>> >>>>>>>>>>> catalogs. They can contain function calls within which we need
>> >>> to
>> >>>>>>>> have
>> >>>>>>>>>> the
>> >>>>>>>>>>> identifier.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I agree my initial suggestion is over complicated. How about
>> >> we
>> >>>> have
>> >>>>>>>>> just
>> >>>>>>>>>>> the FunctionIdentifier as top level class without making the
>> >>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the
>> >>> same
>> >>>>>>>> what
>> >>>>>>>>>> you
>> >>>>>>>>>>> suggested. The only difference is that it would be a top level
>> >>>> class
>> >>>>>>>>>> with a
>> >>>>>>>>>>> more descriptive name.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
>> >>> wrote:
>> >>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
>> >>>>>>>> alternative
>> >>>>>>>>> is
>> >>>>>>>>>>> the
>> >>>>>>>>>>>> following, basically having separate APIs in FunctionLookup
>> >> for
>> >>>>>>>>>> ambiguous
>> >>>>>>>>>>>> and precise function lookup since planner is able to tell
>> >> which
>> >>>> API
>> >>>>>>>>> to
>> >>>>>>>>>>> call
>> >>>>>>>>>>>> with parsed queries, and have a unified result:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> ```
>> >>>>>>>>>>>> class FunctionLookup {
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> class Result {
>> >>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ...
>> >> }
>> >>>>>>>>>>>>      String getName() { ... }
>> >>>>>>>>>>>>      // ...
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> }
>> >>>>>>>>>>>> ```
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>> Bowen
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
>> >> [hidden email]>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>>>> Hi Dawid,
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments
>> >> is
>> >>>>>>>>>>> important.
>> >>>>>>>>>>>>> However, I'm not sure allowing users to register temp
>> >>> functions
>> >>>>>>>> in
>> >>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
>> >>> hacky,
>> >>>>>>>>> and
>> >>>>>>>>>>>> breaks
>> >>>>>>>>>>>>> the current contract between Flink and users that catalog/db
>> >>>> must
>> >>>>>>>>> be
>> >>>>>>>>>>>> valid
>> >>>>>>>>>>>>> in order to operate on.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> How about we instead focus on making it convenient to create
>> >>>>>>>>>> catalogs?
>> >>>>>>>>>>>>> Users actually can already do it with ease via program or
>> >> SQL
>> >>>> CLI
>> >>>>>>>>>> yaml
>> >>>>>>>>>>>> file
>> >>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency
>> >>> nor
>> >>>>>>>>>>> external
>> >>>>>>>>>>>>> connections. What we can further improve is DDL for
>> >> catalogs,
>> >>>>>>>> and I
>> >>>>>>>>>>>> raised
>> >>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
>> >>> driven
>> >>>>>>>> by
>> >>>>>>>>>>> Terry
>> >>>>>>>>>>>>> now.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
>> >> they
>> >>>> can
>> >>>>>>>>>>> easily
>> >>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
>> >> with
>> >>>>>>>> temp
>> >>>>>>>>>>>>> functions.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
>> >>> has
>> >>>>>>>> not
>> >>>>>>>>>>> been
>> >>>>>>>>>>>>> resolved when stack call reaches
>> >>>>>>>> FunctionCatalog#lookupFunction(),
>> >>>>>>>>>> but
>> >>>>>>>>>>>> only
>> >>>>>>>>>>>>> been parsed?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
>> >>>> with
>> >>>>>>>>> the
>> >>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass
>> >>> of
>> >>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Another potentially simpler way is:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> ```
>> >>>>>>>>>>>>> // in class FunctionLookup
>> >>>>>>>>>>>>> class Result {
>> >>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
>> >> ... }
>> >>>>>>>>>>>>>      String getName() { ... }
>> >>>>>>>>>>>>>      ...
>> >>>>>>>>>>>>> }
>> >>>>>>>>>>>>> ```
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> WDYT?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>> >>>>>>>>>>>>> [hidden email]> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Hi,
>> >>>>>>>>>>>>>> I really like the flip and think it clarifies important
>> >>> aspects
>> >>>>>>>> of
>> >>>>>>>>>> the
>> >>>>>>>>>>>>>> system.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take
>> >>> much
>> >>>>>>>>> time
>> >>>>>>>>>> to
>> >>>>>>>>>>>>>> agree on.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>> >>>>>>>> existence
>> >>>>>>>>> of
>> >>>>>>>>>>>>>> cat/db
>> >>>>>>>>>>>>>> for temporary functions? That means not to check it, so
>> >> e.g.
>> >>>>>>>> it's
>> >>>>>>>>>>>> possible
>> >>>>>>>>>>>>>> to create a temporary function in a database that does not
>> >>>>>>>> exist.
>> >>>>>>>>> I
>> >>>>>>>>>>>> think
>> >>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
>> >>>>>>>>>> experiments
>> >>>>>>>>>>>> but
>> >>>>>>>>>>>>>> does not have access to the db yet or temporarily does not
>> >>> have
>> >>>>>>>>>>>> connection
>> >>>>>>>>>>>>>> to a catalog.
>> >>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
>> >>>> loosen
>> >>>>>>>>> the
>> >>>>>>>>>>>>>> requirements for all catalog objects such as tables, views,
>> >>>>>>>> types
>> >>>>>>>>>> just
>> >>>>>>>>>>>> for
>> >>>>>>>>>>>>>> the functions? It's really important later on from e.g the
>> >>>>>>>>>>>> serializability
>> >>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier
>> >> is
>> >>>>>>>> that
>> >>>>>>>>> we
>> >>>>>>>>>>>> know
>> >>>>>>>>>>>>>> that it has been resolved. The suggested changes break that
>> >>>>>>>>>>> assumption.
>> >>>>>>>>>>>>>> What do you think about adding an interface
>> >>> FunctionIdentifier
>> >>>> {
>> >>>>>>>>>>>>>> String getName();
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> /**
>> >>>>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>> >>>>>>>> function.
>> >>>>>>>>>>>>>> */
>> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>> >>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>> >>>>>>>>>>>>>>   return Optional.of(this);
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>> }
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> class SystemFunctionIdentifier implements
>> >> FunctionIdentifier
>> >>>>>>>> {...}
>> >>>>>>>>>>>>>> WDYT?
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
>> >>> wrote:
>> >>>>>>>>>>>>>>> +1. LGTM
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>> >>>>>>>> [hidden email]>
>> >>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>> +1
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>> Terry Wang
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> +1
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Best,
>> >>>>>>>>>>>>>>>>> Kurt
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>> >>>>>>>>> [hidden email]
>> >>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
>> >> which
>> >>>>>>>>>> we've
>> >>>>>>>>>>>>>> reached
>> >>>>>>>>>>>>>>>>>> consensus in [2].
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>> >>>>>>>>> UTC,
>> >>>>>>>>>>> Sep
>> >>>>>>>>>>>>>> 26.
>> >>>>>>>>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>>>>>>>> Bowen
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> [1]
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>> >>>>>>>>>>>>>>>>>> [2]
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>> >>>>>>>>>>>>>>> --
>> >>>>>>>>>>>>>>> Xuefu Zhang
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> "In Honey We Trust!"
>> >>>>>>>>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Best regards!
>> >>>>>>>>> Rui Li
>> >>>>>>>>>
>> >>>>>>
>> >
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Aljoscha Krettek-2
Hi,

I see there was quite some discussion and changes on the FLIP after this VOTE was started. I would suggest to start a new voting thread on the current state of the FLIP (keeping in mind that a FLIP vote needs at least three committer/PMC votes).

For the future, we should probably keep discussion to the [DISCUSS] thread and use the vote thread only for voting.

Best,
Aljoscha

> On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote:
>
> I'm glad to announce that the community has accepted the design of FLIP-57,
> and we are moving forward to implementing it.
>
> Thanks everyone!
>
> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote:
>
>> Introducing a new term "path" to APIs like
>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
>> users, thus I feel "getSimpleName/getIdentifier" is fine.
>>
>> To summarize the discussion result.
>>
>>   - categorize functions into 2 dimensions - system v.s. catalog,
>>   non-temp v.s. temp - and that give us 4 combinations
>>   - definition of FunctionIdentifier
>>
>>                     @PublicEvolving
>>
>> Class FunctionIdentifier {
>>
>>    String name;
>>
>>    ObjectIdentifier oi;
>>
>>        // for temporary/non-temporary system function
>>    public FunctionIdentifier of(String name) {  }
>>    // for temporary/non-temporary catalog function
>>    public FunctionIdentifier of(ObjectIdentifier identifier) {  }
>>
>>
>>    Optional<ObjectIdentifier> getIdentifier() {}
>>
>>    Optional<String> getSimpleName() {}
>>
>> }
>>
>>
>> I've updated them to FLIP wiki. Please take a final look. I'll close the
>> voting if there's no other concern raised within 24 hours.
>>
>> Cheers
>>
>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]>
>> wrote:
>>
>>> Hi,
>>>
>>> I very much agree with Xuefu's summary of the two points, especially on
>>> the "functionIdentifier doesn't need to reflect the categories".
>>>
>>> For the factory methods I think methods of should be enough:
>>>
>>>  // for temporary/non-temporary system function
>>>    public FunctionIdentifier of(String name) {  }
>>>  // for temporary/non-temporary catalog function
>>>    public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>>
>>> In case of the getters I did not like the method name `getName` in the
>>> original proposal, as in my opinion it could imply that it can return
>>> also just the name part of an ObjectIdentifier, which should not be the
>>> case.
>>>
>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
>>> other suggestions:
>>>
>>> * getShortPath(Identifier)/getLongPath(Identifier),
>>>
>>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>>>
>>> +1 to any of the 3 options.
>>>
>>> One additional thing the FunctionIdentifier should be a PublicEvolving
>>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
>>> user might need to access e.g. in a filter pushdown.
>>>
>>> I also support the Xuefu's suggestion not to support the "ALL" keyword
>>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
>>> of it  is not part of the FLIP-57, we do not need to agree on that in
>>> this thread.
>>>
>>> Overall I think after updating the FLIP with the outcome of the
>>> discussion I vote +1 for it.
>>>
>>> Best,
>>>
>>> Dawid
>>>
>>>
>>> On 02/10/2019 00:21, Xuefu Z wrote:
>>>> Here are some of my thoughts on the minor debates above:
>>>>
>>>> 1. +1 for 4 categories of functions. They are categorized along two
>>>> dimensions of binary values: X: *temporary* vs non-temporary
>>> (persistent);
>>>> Y: *system* vs non-system (so said catalog).
>>>> 2. In my opinion, class functionIdentifier doesn't really need to
>>> reflect
>>>> the categories of the functions. Instead, we should decouple them to
>>> make
>>>> the API more stable. Thus, my suggestion is:
>>>>
>>>> @Internal
>>>> class FunctionIdentifier {
>>>>  // for temporary/non-temporary system function
>>>>    public FunctionIdentifier ofSimpleName(String name) {  }
>>>>  // for temporary/non-temporary catalog function
>>>>    public FunctionIdentifier ofIdentifier(ObjectIdentifier
>>>> identifier){  }
>>>>    public Optional<String> getSimpleName() {  }
>>>>    public Optional<ObjectIdentifier> getIdentifier() {  }
>>>> }
>>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
>>>>
>>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
>>>>
>>>> When either keyword is missing, "ALL" is implied along that dimension.
>>> We
>>>> should always limit the search to the system function catalog and the
>>>> current catalog/DB. I don't see a need of listing functions across
>>>> different catalogs and databases. (It can be added later if that
>>> arises.)
>>>>
>>>> Thanks,
>>>> Xuefu
>>>>
>>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:
>>>>
>>>>> Hi Dawid,
>>>>>
>>>>> Thanks for bringing the suggestions up. I was prototyping yesterday and
>>>>> found out those places exactly as what you suggested.
>>>>>
>>>>> For CallExpression and UnresolvedCallExpression, I've added them to
>>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and
>>> mark
>>>>> that as a breaking change
>>>>>
>>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to bring
>>> up
>>>>> an issue on naming. It seems to me how we now name functions
>>> categories is
>>>>> a bit unclear and confusing, which is reflected on the suggested APIs
>>> - in
>>>>> FunctionIdentifier you lay out, "builtin function" would include
>>> builtin
>>>>> functions and temporary system functions as we are kind of using
>>> "system"
>>>>> and "built-in" interchangeably, and "catalog function" would include
>>>>> catalog functions and temporary functions. I currently can see two
>>>>> approaches to make it clearer to users.
>>>>>
>>>>> 1) Simplify FunctionIdentifier to be the following. As it's internal,
>>> we
>>>>> add comments and explanation to devs on which case the APIs support.
>>>>> However, I feel this approach would be somehow a bit conflicting to
>>> what
>>>>> you want to achieve for the clarity of APIs
>>>>>
>>>>> @Internal
>>>>> class FunctionIdentifier {
>>>>>  // for built-in function and temporary system function
>>>>>    public FunctionIdentifier of(String name) {  }
>>>>>  // for temporary function and catalog function
>>>>>    public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>>>>    public Optional<String> getFunctionName() {  }
>>>>>    public Optional<ObjectIdentifier> getObjectIdentifier() {  }
>>>>> }
>>>>>
>>>>> 2) We can rename our function categories as following so there'll be
>>> mainly
>>>>> just two categories of functions, "system functions" and "catalog
>>>>> functions", either of which can have temporary ones
>>>>>
>>>>>  - built-in functions -> officially rename to "system functions" and
>>> note
>>>>> to users that "system" and "built-in" can be used interchangeably. We
>>>>> prefer "system" because that's the keyword we decided to use in DDL
>>> that
>>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
>>>>>  - temporary system functions
>>>>>  - catalog functions
>>>>>  - temporary functions  -> rename to "temporary catalog function"
>>>>>
>>>>> @Internal
>>>>> class FunctionIdentifier {
>>>>>  // for temporary/non-temporary system function
>>>>>    public FunctionIdentifier ofSystemFunction(String name) {  }
>>>>>  // for temporary/non-temporary catalog function
>>>>>    public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
>>>>> identifier){  }
>>>>>    public Optional<String> getSystemFunctionName() {  }
>>>>>    public Optional<ObjectIdentifier> getCatalogFunctionIdentifier()
>>> {  }
>>>>> }
>>>>>
>>>>> WDYT?
>>>>>
>>>>>
>>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]>
>>> wrote:
>>>>>
>>>>>> Thanks for the summary Bowen!
>>>>>>
>>>>>> Looks good to me.
>>>>>>
>>>>>> Cheers,
>>>>>> Fabian
>>>>>>
>>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
>>>>> [hidden email]
>>>>>>> :
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've updated the FLIP wiki with the following changes:
>>>>>>>
>>>>>>> - Lifespan of temp functions are not tied to those of catalogs and
>>>>>>> databases. Users can create temp functions even though catalogs/dbs
>>> in
>>>>>>> their fully qualified names don't even exist.
>>>>>>> - some new SQL commands
>>>>>>>    - "SHOW FUNCTIONS" - list names of temp and non-temp
>>>>> system/built-in
>>>>>>> functions, and names of temp and catalog functions in the current
>>>>> catalog
>>>>>>> and db
>>>>>>>    - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
>>>>> system/built
>>>>>>> functions, and fully qualified names of temp and catalog functions in
>>>>> all
>>>>>>> catalogs and dbs
>>>>>>>    - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
>>>>> temp
>>>>>>> functions in all catalog and db
>>>>>>>    - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
>>>>>> system
>>>>>>> functions
>>>>>>>
>>>>>>> Let me know if you have any questions.
>>>>>>>
>>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd
>>> like
>>>>> to
>>>>>>> close the vote by this time tomorrow.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Bowen
>>>>>>>
>>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]>
>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I think above are some valid points, and we can adopt the
>>>>> suggestions.
>>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that,
>>> unlike
>>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
>>>>> [TEMPORARY]
>>>>>>>> FUNCTIONS" would return functions' fully qualified names with
>>> catalog
>>>>>> and
>>>>>>>> db names.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
>>>>>> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects
>>>>> should
>>>>>>>>> just be an additional layer on top of the regular catalog/database
>>>>>>>>> lookup logic. Thus, a temporary table or function has always
>>> highest
>>>>>>>>> precedence and should be stable within the local session. Otherwise
>>>>> it
>>>>>>>>> could magically disappear while someone else is performing
>>>>>> modifications
>>>>>>>>> in the catalog.
>>>>>>>>>
>>>>>>>>> Furthermore, this feature is very useful for prototyping as users
>>>>> can
>>>>>>>>> simply express that a catalog/database is present even through they
>>>>>>>>> might not have access to it currently.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Timo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Sorry for the late reply.
>>>>>>>>>>
>>>>>>>>>> I think it might lead to confusing situations if temporary
>>>>> functions
>>>>>>> (or
>>>>>>>>>> any temporary db objects for that matter) are bound to the life
>>>>>> cycle
>>>>>>>>> of an
>>>>>>>>>> (external) db/catalog.
>>>>>>>>>> Imaging a situation where you create a temp function in a db in an
>>>>>>>>> external
>>>>>>>>>> catalog and use it but at some point it does not work anymore
>>>>>> because
>>>>>>>>> some
>>>>>>>>>> other dropped the database from the external catalog.
>>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of a
>>>>>>>>> session?
>>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs
>>>>>> sounds
>>>>>>> a
>>>>>>>>> bit
>>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
>>>>> temp
>>>>>>>>>> function to be created/exist) can have significant implications
>>>>> like
>>>>>>> the
>>>>>>>>>> one I described.
>>>>>>>>>> I think it would be quite easy for users to understand that
>>>>>> temporary
>>>>>>>>>> objects are solely owned by them (and their session).
>>>>>>>>>> The problem of listing temporary objects could be solved by
>>>>> adding a
>>>>>>> ALL
>>>>>>>>>> [TEMPORARY] clause:
>>>>>>>>>>
>>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
>>>>>>>>>> catalog/database including temporary functions.
>>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>>>>>>>>> regardless
>>>>>>>>>> of the catalog/database.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Fabian
>>>>>>>>>>
>>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>>>>>>>>> [hidden email]>:
>>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
>>>>> close
>>>>>>> the
>>>>>>>>>>> voting.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
>>>>> want
>>>>>> to
>>>>>>>>> try
>>>>>>>>>>>> out temporary functions, they can create temporary system
>>>>>> functions
>>>>>>>>> which
>>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
>>>>>>>>> temporary
>>>>>>>>>>>> catalog function is to let users override permanent catalog
>>>>>>> functions.
>>>>>>>>>>>> Therefore a temporary function in a non-existing catalog won't
>>>>>> serve
>>>>>>>>> that
>>>>>>>>>>>> purpose. Besides, each session is provided with a default
>>>>> catalog
>>>>>>> and
>>>>>>>>> DB.
>>>>>>>>>>>> So even if users simply want to create some catalog functions
>>>>> they
>>>>>>> can
>>>>>>>>>>>> forget about after the session, wouldn't the default catalog/DB
>>>>> be
>>>>>>>>> enough
>>>>>>>>>>>> for such experiments?
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
>>>>> first
>>>>>>>>> make
>>>>>>>>>>>> sure
>>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
>>>>> triggered
>>>>>> to
>>>>>>>>>>>> remove
>>>>>>>>>>>>> that temp function.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
>>>>>> copying
>>>>>>>>>>> Hive.
>>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
>>>>>> catalog/db
>>>>>>>>> is
>>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
>>>>> list
>>>>>>>>> system
>>>>>>>>>>>>> functions and functions in the current catalog/db, since users
>>>>>>> cannot
>>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
>>>>> they
>>>>>>> list
>>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never
>>>>>> knowing
>>>>>>>>> what
>>>>>>>>>>>>> temp functions they've created unless trying out with queries
>>>>> or
>>>>>> we
>>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same
>>>>>> applies
>>>>>>> to
>>>>>>>>>>>> other
>>>>>>>>>>>>> temp objects like temp tables.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>>>>>>>>>>>>> [hidden email]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ad. 1
>>>>>>>>>>>>>> I wouldn't say it is hacky.
>>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
>>>>> when
>>>>>> a
>>>>>>>>>>>>> temporary
>>>>>>>>>>>>>> object is used?( in this particular case function). Do you
>>>>> want
>>>>>> to
>>>>>>>>>>>> query
>>>>>>>>>>>>>> for the database existence whenever e.g a temporary function
>>>>> is
>>>>>>>>>>> used? I
>>>>>>>>>>>>>> think important aspect here is that the database can be
>>>>> dropped
>>>>>>> from
>>>>>>>>>>>>>> external system, not just flink or a different flink session.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
>>>>>>>>>>> database
>>>>>>>>>>>>> that
>>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary
>>>>> table
>>>>>>> in
>>>>>>>>> a
>>>>>>>>>>>>>> database and drop that database from a different session, you
>>>>>> can
>>>>>>>>>>> still
>>>>>>>>>>>>>> query the previously created temporary table from the original
>>>>>>>>>>> session.
>>>>>>>>>>>>> It
>>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we
>>>>>>> make
>>>>>>>>>>>> this
>>>>>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime
>>>>> of
>>>>>> a
>>>>>>>>>>>>> database
>>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the end
>>>>>>> they
>>>>>>>>>>>> exist
>>>>>>>>>>>>>> in different layers. Permanent objects & databases in a
>>>>> catalog
>>>>>>> (in
>>>>>>>>>>>> case
>>>>>>>>>>>>> of
>>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions.
>>>>>>> That's
>>>>>>>>>>>> also
>>>>>>>>>>>>>> true for the original hive client. The temporary objects live
>>>>> in
>>>>>>> the
>>>>>>>>>>>> hive
>>>>>>>>>>>>>> client whereas databases are created in the metastore.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ad.2
>>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
>>>>> achieve
>>>>>>>>> here
>>>>>>>>>>>> is
>>>>>>>>>>>>> so
>>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
>>>>>>>>> important
>>>>>>>>>>>>> thing
>>>>>>>>>>>>>> to remember here is that we need the function identifier to be
>>>>>>> part
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the
>>>>>>> function
>>>>>>>>>>>>>> lookup. At some point we want to be able to store
>>>>>> QueryOperations
>>>>>>> in
>>>>>>>>>>>> the
>>>>>>>>>>>>>> catalogs. They can contain function calls within which we need
>>>>>> to
>>>>>>>>>>> have
>>>>>>>>>>>>> the
>>>>>>>>>>>>>> identifier.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How about
>>>>> we
>>>>>>> have
>>>>>>>>>>>> just
>>>>>>>>>>>>>> the FunctionIdentifier as top level class without making the
>>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the
>>>>>> same
>>>>>>>>>>> what
>>>>>>>>>>>>> you
>>>>>>>>>>>>>> suggested. The only difference is that it would be a top level
>>>>>>> class
>>>>>>>>>>>>> with a
>>>>>>>>>>>>>> more descriptive name.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
>>>>>> wrote:
>>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
>>>>>>>>>>> alternative
>>>>>>>>>>>> is
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> following, basically having separate APIs in FunctionLookup
>>>>> for
>>>>>>>>>>>>> ambiguous
>>>>>>>>>>>>>>> and precise function lookup since planner is able to tell
>>>>> which
>>>>>>> API
>>>>>>>>>>>> to
>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>> with parsed queries, and have a unified result:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>> class FunctionLookup {
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> class Result {
>>>>>>>>>>>>>>>     Optional<ObjectIdentifier>  getObjectIdentifier() { ...
>>>>> }
>>>>>>>>>>>>>>>     String getName() { ... }
>>>>>>>>>>>>>>>     // ...
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
>>>>> [hidden email]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi Dawid,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments
>>>>> is
>>>>>>>>>>>>>> important.
>>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp
>>>>>> functions
>>>>>>>>>>> in
>>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
>>>>>> hacky,
>>>>>>>>>>>> and
>>>>>>>>>>>>>>> breaks
>>>>>>>>>>>>>>>> the current contract between Flink and users that catalog/db
>>>>>>> must
>>>>>>>>>>>> be
>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>> in order to operate on.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> How about we instead focus on making it convenient to create
>>>>>>>>>>>>> catalogs?
>>>>>>>>>>>>>>>> Users actually can already do it with ease via program or
>>>>> SQL
>>>>>>> CLI
>>>>>>>>>>>>> yaml
>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency
>>>>>> nor
>>>>>>>>>>>>>> external
>>>>>>>>>>>>>>>> connections. What we can further improve is DDL for
>>>>> catalogs,
>>>>>>>>>>> and I
>>>>>>>>>>>>>>> raised
>>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
>>>>>> driven
>>>>>>>>>>> by
>>>>>>>>>>>>>> Terry
>>>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
>>>>> they
>>>>>>> can
>>>>>>>>>>>>>> easily
>>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
>>>>> with
>>>>>>>>>>> temp
>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
>>>>>> has
>>>>>>>>>>> not
>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>> resolved when stack call reaches
>>>>>>>>>>> FunctionCatalog#lookupFunction(),
>>>>>>>>>>>>> but
>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> been parsed?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
>>>>>>> with
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass
>>>>>> of
>>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Another potentially simpler way is:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> // in class FunctionLookup
>>>>>>>>>>>>>>>> class Result {
>>>>>>>>>>>>>>>>     Optional<ObjectIdentifier>  getObjectIdentifier() {
>>>>> ... }
>>>>>>>>>>>>>>>>     String getName() { ... }
>>>>>>>>>>>>>>>>     ...
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>>>>>>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important
>>>>>> aspects
>>>>>>>>>>> of
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take
>>>>>> much
>>>>>>>>>>>> time
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> agree on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>>>>>>>>>>> existence
>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> cat/db
>>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so
>>>>> e.g.
>>>>>>>>>>> it's
>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>>>> to create a temporary function in a database that does not
>>>>>>>>>>> exist.
>>>>>>>>>>>> I
>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does not
>>>>>> have
>>>>>>>>>>>>>>> connection
>>>>>>>>>>>>>>>>> to a catalog.
>>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
>>>>>>> loosen
>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables, views,
>>>>>>>>>>> types
>>>>>>>>>>>>> just
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g the
>>>>>>>>>>>>>>> serializability
>>>>>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier
>>>>> is
>>>>>>>>>>> that
>>>>>>>>>>>> we
>>>>>>>>>>>>>>> know
>>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break that
>>>>>>>>>>>>>> assumption.
>>>>>>>>>>>>>>>>> What do you think about adding an interface
>>>>>> FunctionIdentifier
>>>>>>> {
>>>>>>>>>>>>>>>>> String getName();
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> /**
>>>>>>>>>>>>>>>>>   Return 3-part identifier. Empty in case of a built-in
>>>>>>>>>>> function.
>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>>>>>>>>>>>>>>>>  return Optional.of(this);
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements
>>>>> FunctionIdentifier
>>>>>>>>>>> {...}
>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
>>>>>> wrote:
>>>>>>>>>>>>>>>>>> +1. LGTM
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>>>>>>>>>>> [hidden email]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>> Terry Wang
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>>>>>>>>>>>> [hidden email]
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
>>>>> which
>>>>>>>>>>>>> we've
>>>>>>>>>>>>>>>>> reached
>>>>>>>>>>>>>>>>>>>>> consensus in [2].
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>>>>>>>>>>>> UTC,
>>>>>>>>>>>>>> Sep
>>>>>>>>>>>>>>>>> 26.
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Best regards!
>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>
>>>>>>>>>
>>>>
>>>
>>>

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Timo Walther-2
Hi,

I agree with Aljoscha. It is not transparent to me which votes are
binding to the current status of the FLIP.

Some other minor comments from my side:

- We don't need to deprecate methods in FunctionCatalog. This class is
internal. We can simply change the method signatures.
- `String name` is missing in the FunctionIdentifier code example; can
we call FunctionIdentifier.getSimpleName() just
FunctionIdentifier.getName()?
- Add the methods that we discussed to the example:  `of(String)`,
`of(ObjectIdentifier)`

Other than that, I'm happy to give my +1 to this proposal.

Thanks for the productive discussion,
Timo


On 04.10.19 13:29, Aljoscha Krettek wrote:

> Hi,
>
> I see there was quite some discussion and changes on the FLIP after this VOTE was started. I would suggest to start a new voting thread on the current state of the FLIP (keeping in mind that a FLIP vote needs at least three committer/PMC votes).
>
> For the future, we should probably keep discussion to the [DISCUSS] thread and use the vote thread only for voting.
>
> Best,
> Aljoscha
>
>> On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote:
>>
>> I'm glad to announce that the community has accepted the design of FLIP-57,
>> and we are moving forward to implementing it.
>>
>> Thanks everyone!
>>
>> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote:
>>
>>> Introducing a new term "path" to APIs like
>>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing to
>>> users, thus I feel "getSimpleName/getIdentifier" is fine.
>>>
>>> To summarize the discussion result.
>>>
>>>    - categorize functions into 2 dimensions - system v.s. catalog,
>>>    non-temp v.s. temp - and that give us 4 combinations
>>>    - definition of FunctionIdentifier
>>>
>>>                      @PublicEvolving
>>>
>>> Class FunctionIdentifier {
>>>
>>>     String name;
>>>
>>>     ObjectIdentifier oi;
>>>
>>>         // for temporary/non-temporary system function
>>>     public FunctionIdentifier of(String name) {  }
>>>     // for temporary/non-temporary catalog function
>>>     public FunctionIdentifier of(ObjectIdentifier identifier) {  }
>>>
>>>
>>>     Optional<ObjectIdentifier> getIdentifier() {}
>>>
>>>     Optional<String> getSimpleName() {}
>>>
>>> }
>>>
>>>
>>> I've updated them to FLIP wiki. Please take a final look. I'll close the
>>> voting if there's no other concern raised within 24 hours.
>>>
>>> Cheers
>>>
>>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <[hidden email]>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I very much agree with Xuefu's summary of the two points, especially on
>>>> the "functionIdentifier doesn't need to reflect the categories".
>>>>
>>>> For the factory methods I think methods of should be enough:
>>>>
>>>>   // for temporary/non-temporary system function
>>>>     public FunctionIdentifier of(String name) {  }
>>>>   // for temporary/non-temporary catalog function
>>>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>>>
>>>> In case of the getters I did not like the method name `getName` in the
>>>> original proposal, as in my opinion it could imply that it can return
>>>> also just the name part of an ObjectIdentifier, which should not be the
>>>> case.
>>>>
>>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
>>>> other suggestions:
>>>>
>>>> * getShortPath(Identifier)/getLongPath(Identifier),
>>>>
>>>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
>>>>
>>>> +1 to any of the 3 options.
>>>>
>>>> One additional thing the FunctionIdentifier should be a PublicEvolving
>>>> class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
>>>> user might need to access e.g. in a filter pushdown.
>>>>
>>>> I also support the Xuefu's suggestion not to support the "ALL" keyword
>>>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
>>>> of it  is not part of the FLIP-57, we do not need to agree on that in
>>>> this thread.
>>>>
>>>> Overall I think after updating the FLIP with the outcome of the
>>>> discussion I vote +1 for it.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>>>> On 02/10/2019 00:21, Xuefu Z wrote:
>>>>> Here are some of my thoughts on the minor debates above:
>>>>>
>>>>> 1. +1 for 4 categories of functions. They are categorized along two
>>>>> dimensions of binary values: X: *temporary* vs non-temporary
>>>> (persistent);
>>>>> Y: *system* vs non-system (so said catalog).
>>>>> 2. In my opinion, class functionIdentifier doesn't really need to
>>>> reflect
>>>>> the categories of the functions. Instead, we should decouple them to
>>>> make
>>>>> the API more stable. Thus, my suggestion is:
>>>>>
>>>>> @Internal
>>>>> class FunctionIdentifier {
>>>>>   // for temporary/non-temporary system function
>>>>>     public FunctionIdentifier ofSimpleName(String name) {  }
>>>>>   // for temporary/non-temporary catalog function
>>>>>     public FunctionIdentifier ofIdentifier(ObjectIdentifier
>>>>> identifier){  }
>>>>>     public Optional<String> getSimpleName() {  }
>>>>>     public Optional<ObjectIdentifier> getIdentifier() {  }
>>>>> }
>>>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
>>>>>
>>>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
>>>>>
>>>>> When either keyword is missing, "ALL" is implied along that dimension.
>>>> We
>>>>> should always limit the search to the system function catalog and the
>>>>> current catalog/DB. I don't see a need of listing functions across
>>>>> different catalogs and databases. (It can be added later if that
>>>> arises.)
>>>>> Thanks,
>>>>> Xuefu
>>>>>
>>>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:
>>>>>
>>>>>> Hi Dawid,
>>>>>>
>>>>>> Thanks for bringing the suggestions up. I was prototyping yesterday and
>>>>>> found out those places exactly as what you suggested.
>>>>>>
>>>>>> For CallExpression and UnresolvedCallExpression, I've added them to
>>>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and
>>>> mark
>>>>>> that as a breaking change
>>>>>>
>>>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to bring
>>>> up
>>>>>> an issue on naming. It seems to me how we now name functions
>>>> categories is
>>>>>> a bit unclear and confusing, which is reflected on the suggested APIs
>>>> - in
>>>>>> FunctionIdentifier you lay out, "builtin function" would include
>>>> builtin
>>>>>> functions and temporary system functions as we are kind of using
>>>> "system"
>>>>>> and "built-in" interchangeably, and "catalog function" would include
>>>>>> catalog functions and temporary functions. I currently can see two
>>>>>> approaches to make it clearer to users.
>>>>>>
>>>>>> 1) Simplify FunctionIdentifier to be the following. As it's internal,
>>>> we
>>>>>> add comments and explanation to devs on which case the APIs support.
>>>>>> However, I feel this approach would be somehow a bit conflicting to
>>>> what
>>>>>> you want to achieve for the clarity of APIs
>>>>>>
>>>>>> @Internal
>>>>>> class FunctionIdentifier {
>>>>>>   // for built-in function and temporary system function
>>>>>>     public FunctionIdentifier of(String name) {  }
>>>>>>   // for temporary function and catalog function
>>>>>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>>>>>     public Optional<String> getFunctionName() {  }
>>>>>>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
>>>>>> }
>>>>>>
>>>>>> 2) We can rename our function categories as following so there'll be
>>>> mainly
>>>>>> just two categories of functions, "system functions" and "catalog
>>>>>> functions", either of which can have temporary ones
>>>>>>
>>>>>>   - built-in functions -> officially rename to "system functions" and
>>>> note
>>>>>> to users that "system" and "built-in" can be used interchangeably. We
>>>>>> prefer "system" because that's the keyword we decided to use in DDL
>>>> that
>>>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
>>>>>>   - temporary system functions
>>>>>>   - catalog functions
>>>>>>   - temporary functions  -> rename to "temporary catalog function"
>>>>>>
>>>>>> @Internal
>>>>>> class FunctionIdentifier {
>>>>>>   // for temporary/non-temporary system function
>>>>>>     public FunctionIdentifier ofSystemFunction(String name) {  }
>>>>>>   // for temporary/non-temporary catalog function
>>>>>>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
>>>>>> identifier){  }
>>>>>>     public Optional<String> getSystemFunctionName() {  }
>>>>>>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier()
>>>> {  }
>>>>>> }
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]>
>>>> wrote:
>>>>>>> Thanks for the summary Bowen!
>>>>>>>
>>>>>>> Looks good to me.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Fabian
>>>>>>>
>>>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
>>>>>> [hidden email]
>>>>>>>> :
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I've updated the FLIP wiki with the following changes:
>>>>>>>>
>>>>>>>> - Lifespan of temp functions are not tied to those of catalogs and
>>>>>>>> databases. Users can create temp functions even though catalogs/dbs
>>>> in
>>>>>>>> their fully qualified names don't even exist.
>>>>>>>> - some new SQL commands
>>>>>>>>     - "SHOW FUNCTIONS" - list names of temp and non-temp
>>>>>> system/built-in
>>>>>>>> functions, and names of temp and catalog functions in the current
>>>>>> catalog
>>>>>>>> and db
>>>>>>>>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
>>>>>> system/built
>>>>>>>> functions, and fully qualified names of temp and catalog functions in
>>>>>> all
>>>>>>>> catalogs and dbs
>>>>>>>>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
>>>>>> temp
>>>>>>>> functions in all catalog and db
>>>>>>>>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
>>>>>>> system
>>>>>>>> functions
>>>>>>>>
>>>>>>>> Let me know if you have any questions.
>>>>>>>>
>>>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd
>>>> like
>>>>>> to
>>>>>>>> close the vote by this time tomorrow.
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Bowen
>>>>>>>>
>>>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]>
>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I think above are some valid points, and we can adopt the
>>>>>> suggestions.
>>>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that,
>>>> unlike
>>>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
>>>>>> [TEMPORARY]
>>>>>>>>> FUNCTIONS" would return functions' fully qualified names with
>>>> catalog
>>>>>>> and
>>>>>>>>> db names.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects
>>>>>> should
>>>>>>>>>> just be an additional layer on top of the regular catalog/database
>>>>>>>>>> lookup logic. Thus, a temporary table or function has always
>>>> highest
>>>>>>>>>> precedence and should be stable within the local session. Otherwise
>>>>>> it
>>>>>>>>>> could magically disappear while someone else is performing
>>>>>>> modifications
>>>>>>>>>> in the catalog.
>>>>>>>>>>
>>>>>>>>>> Furthermore, this feature is very useful for prototyping as users
>>>>>> can
>>>>>>>>>> simply express that a catalog/database is present even through they
>>>>>>>>>> might not have access to it currently.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Timo
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the late reply.
>>>>>>>>>>>
>>>>>>>>>>> I think it might lead to confusing situations if temporary
>>>>>> functions
>>>>>>>> (or
>>>>>>>>>>> any temporary db objects for that matter) are bound to the life
>>>>>>> cycle
>>>>>>>>>> of an
>>>>>>>>>>> (external) db/catalog.
>>>>>>>>>>> Imaging a situation where you create a temp function in a db in an
>>>>>>>>>> external
>>>>>>>>>>> catalog and use it but at some point it does not work anymore
>>>>>>> because
>>>>>>>>>> some
>>>>>>>>>>> other dropped the database from the external catalog.
>>>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of a
>>>>>>>>>> session?
>>>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs
>>>>>>> sounds
>>>>>>>> a
>>>>>>>>>> bit
>>>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
>>>>>> temp
>>>>>>>>>>> function to be created/exist) can have significant implications
>>>>>> like
>>>>>>>> the
>>>>>>>>>>> one I described.
>>>>>>>>>>> I think it would be quite easy for users to understand that
>>>>>>> temporary
>>>>>>>>>>> objects are solely owned by them (and their session).
>>>>>>>>>>> The problem of listing temporary objects could be solved by
>>>>>> adding a
>>>>>>>> ALL
>>>>>>>>>>> [TEMPORARY] clause:
>>>>>>>>>>>
>>>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
>>>>>>>>>>> catalog/database including temporary functions.
>>>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>>>>>>>>>> regardless
>>>>>>>>>>> of the catalog/database.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Fabian
>>>>>>>>>>>
>>>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>>>>>>>>>> [hidden email]>:
>>>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
>>>>>> close
>>>>>>>> the
>>>>>>>>>>>> voting.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
>>>>>> want
>>>>>>> to
>>>>>>>>>> try
>>>>>>>>>>>>> out temporary functions, they can create temporary system
>>>>>>> functions
>>>>>>>>>> which
>>>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
>>>>>>>>>> temporary
>>>>>>>>>>>>> catalog function is to let users override permanent catalog
>>>>>>>> functions.
>>>>>>>>>>>>> Therefore a temporary function in a non-existing catalog won't
>>>>>>> serve
>>>>>>>>>> that
>>>>>>>>>>>>> purpose. Besides, each session is provided with a default
>>>>>> catalog
>>>>>>>> and
>>>>>>>>>> DB.
>>>>>>>>>>>>> So even if users simply want to create some catalog functions
>>>>>> they
>>>>>>>> can
>>>>>>>>>>>>> forget about after the session, wouldn't the default catalog/DB
>>>>>> be
>>>>>>>>>> enough
>>>>>>>>>>>>> for such experiments?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
>>>>>> first
>>>>>>>>>> make
>>>>>>>>>>>>> sure
>>>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
>>>>>> triggered
>>>>>>> to
>>>>>>>>>>>>> remove
>>>>>>>>>>>>>> that temp function.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
>>>>>>> copying
>>>>>>>>>>>> Hive.
>>>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
>>>>>>> catalog/db
>>>>>>>>>> is
>>>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
>>>>>> list
>>>>>>>>>> system
>>>>>>>>>>>>>> functions and functions in the current catalog/db, since users
>>>>>>>> cannot
>>>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
>>>>>> they
>>>>>>>> list
>>>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never
>>>>>>> knowing
>>>>>>>>>> what
>>>>>>>>>>>>>> temp functions they've created unless trying out with queries
>>>>>> or
>>>>>>> we
>>>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same
>>>>>>> applies
>>>>>>>> to
>>>>>>>>>>>>> other
>>>>>>>>>>>>>> temp objects like temp tables.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>>>>>>>>>>>>>> [hidden email]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ad. 1
>>>>>>>>>>>>>>> I wouldn't say it is hacky.
>>>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
>>>>>> when
>>>>>>> a
>>>>>>>>>>>>>> temporary
>>>>>>>>>>>>>>> object is used?( in this particular case function). Do you
>>>>>> want
>>>>>>> to
>>>>>>>>>>>>> query
>>>>>>>>>>>>>>> for the database existence whenever e.g a temporary function
>>>>>> is
>>>>>>>>>>>> used? I
>>>>>>>>>>>>>>> think important aspect here is that the database can be
>>>>>> dropped
>>>>>>>> from
>>>>>>>>>>>>>>> external system, not just flink or a different flink session.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
>>>>>>>>>>>> database
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary
>>>>>> table
>>>>>>>> in
>>>>>>>>>> a
>>>>>>>>>>>>>>> database and drop that database from a different session, you
>>>>>>> can
>>>>>>>>>>>> still
>>>>>>>>>>>>>>> query the previously created temporary table from the original
>>>>>>>>>>>> session.
>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we
>>>>>>>> make
>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime
>>>>>> of
>>>>>>> a
>>>>>>>>>>>>>> database
>>>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the end
>>>>>>>> they
>>>>>>>>>>>>> exist
>>>>>>>>>>>>>>> in different layers. Permanent objects & databases in a
>>>>>> catalog
>>>>>>>> (in
>>>>>>>>>>>>> case
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions.
>>>>>>>> That's
>>>>>>>>>>>>> also
>>>>>>>>>>>>>>> true for the original hive client. The temporary objects live
>>>>>> in
>>>>>>>> the
>>>>>>>>>>>>> hive
>>>>>>>>>>>>>>> client whereas databases are created in the metastore.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ad.2
>>>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
>>>>>> achieve
>>>>>>>>>> here
>>>>>>>>>>>>> is
>>>>>>>>>>>>>> so
>>>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
>>>>>>>>>> important
>>>>>>>>>>>>>> thing
>>>>>>>>>>>>>>> to remember here is that we need the function identifier to be
>>>>>>>> part
>>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the
>>>>>>>> function
>>>>>>>>>>>>>>> lookup. At some point we want to be able to store
>>>>>>> QueryOperations
>>>>>>>> in
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> catalogs. They can contain function calls within which we need
>>>>>>> to
>>>>>>>>>>>> have
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> identifier.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How about
>>>>>> we
>>>>>>>> have
>>>>>>>>>>>>> just
>>>>>>>>>>>>>>> the FunctionIdentifier as top level class without making the
>>>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the
>>>>>>> same
>>>>>>>>>>>> what
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> suggested. The only difference is that it would be a top level
>>>>>>>> class
>>>>>>>>>>>>>> with a
>>>>>>>>>>>>>>> more descriptive name.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
>>>>>>>>>>>> alternative
>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> following, basically having separate APIs in FunctionLookup
>>>>>> for
>>>>>>>>>>>>>> ambiguous
>>>>>>>>>>>>>>>> and precise function lookup since planner is able to tell
>>>>>> which
>>>>>>>> API
>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> call
>>>>>>>>>>>>>>>> with parsed queries, and have a unified result:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>> class FunctionLookup {
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> class Result {
>>>>>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ...
>>>>>> }
>>>>>>>>>>>>>>>>      String getName() { ... }
>>>>>>>>>>>>>>>>      // ...
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
>>>>>> [hidden email]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Hi Dawid,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments
>>>>>> is
>>>>>>>>>>>>>>> important.
>>>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp
>>>>>>> functions
>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
>>>>>>> hacky,
>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> breaks
>>>>>>>>>>>>>>>>> the current contract between Flink and users that catalog/db
>>>>>>>> must
>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> valid
>>>>>>>>>>>>>>>>> in order to operate on.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> How about we instead focus on making it convenient to create
>>>>>>>>>>>>>> catalogs?
>>>>>>>>>>>>>>>>> Users actually can already do it with ease via program or
>>>>>> SQL
>>>>>>>> CLI
>>>>>>>>>>>>>> yaml
>>>>>>>>>>>>>>>> file
>>>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency
>>>>>>> nor
>>>>>>>>>>>>>>> external
>>>>>>>>>>>>>>>>> connections. What we can further improve is DDL for
>>>>>> catalogs,
>>>>>>>>>>>> and I
>>>>>>>>>>>>>>>> raised
>>>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
>>>>>>> driven
>>>>>>>>>>>> by
>>>>>>>>>>>>>>> Terry
>>>>>>>>>>>>>>>>> now.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
>>>>>> they
>>>>>>>> can
>>>>>>>>>>>>>>> easily
>>>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
>>>>>> with
>>>>>>>>>>>> temp
>>>>>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
>>>>>>> has
>>>>>>>>>>>> not
>>>>>>>>>>>>>>> been
>>>>>>>>>>>>>>>>> resolved when stack call reaches
>>>>>>>>>>>> FunctionCatalog#lookupFunction(),
>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>> been parsed?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
>>>>>>>> with
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass
>>>>>>> of
>>>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Another potentially simpler way is:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>> // in class FunctionLookup
>>>>>>>>>>>>>>>>> class Result {
>>>>>>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
>>>>>> ... }
>>>>>>>>>>>>>>>>>      String getName() { ... }
>>>>>>>>>>>>>>>>>      ...
>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>>>>>>>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important
>>>>>>> aspects
>>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take
>>>>>>> much
>>>>>>>>>>>>> time
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> agree on.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>>>>>>>>>>>> existence
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>> cat/db
>>>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so
>>>>>> e.g.
>>>>>>>>>>>> it's
>>>>>>>>>>>>>>>> possible
>>>>>>>>>>>>>>>>>> to create a temporary function in a database that does not
>>>>>>>>>>>> exist.
>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
>>>>>>>>>>>>>> experiments
>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does not
>>>>>>> have
>>>>>>>>>>>>>>>> connection
>>>>>>>>>>>>>>>>>> to a catalog.
>>>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
>>>>>>>> loosen
>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables, views,
>>>>>>>>>>>> types
>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g the
>>>>>>>>>>>>>>>> serializability
>>>>>>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier
>>>>>> is
>>>>>>>>>>>> that
>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> know
>>>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break that
>>>>>>>>>>>>>>> assumption.
>>>>>>>>>>>>>>>>>> What do you think about adding an interface
>>>>>>> FunctionIdentifier
>>>>>>>> {
>>>>>>>>>>>>>>>>>> String getName();
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> /**
>>>>>>>>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>>>>>>>>>>>> function.
>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>>>>>>>>>>>>>>>>>   return Optional.of(this);
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements
>>>>>> FunctionIdentifier
>>>>>>>>>>>> {...}
>>>>>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>> +1. LGTM
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>>>>>>>>>>>> [hidden email]>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>> Terry Wang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>>>>>>>>>>>>> [hidden email]
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
>>>>>> which
>>>>>>>>>>>>>> we've
>>>>>>>>>>>>>>>>>> reached
>>>>>>>>>>>>>>>>>>>>>> consensus in [2].
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>>>>>>>>>>>>> UTC,
>>>>>>>>>>>>>>> Sep
>>>>>>>>>>>>>>>>>> 26.
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>>>>>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Best regards!
>>>>>>>>>>>>> Rui Li
>>>>>>>>>>>>>
>>>>

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi Aljoscha, Timo

Thanks for the reminder. I've update the details in FLIP wiki, and will
kick off a voting thread.

On Fri, Oct 4, 2019 at 1:51 PM Timo Walther <[hidden email]> wrote:

> Hi,
>
> I agree with Aljoscha. It is not transparent to me which votes are
> binding to the current status of the FLIP.
>
> Some other minor comments from my side:
>
> - We don't need to deprecate methods in FunctionCatalog. This class is
> internal. We can simply change the method signatures.
> - `String name` is missing in the FunctionIdentifier code example; can
> we call FunctionIdentifier.getSimpleName() just
> FunctionIdentifier.getName()?
> - Add the methods that we discussed to the example:  `of(String)`,
> `of(ObjectIdentifier)`
>
> Other than that, I'm happy to give my +1 to this proposal.
>
> Thanks for the productive discussion,
> Timo
>
>
> On 04.10.19 13:29, Aljoscha Krettek wrote:
> > Hi,
> >
> > I see there was quite some discussion and changes on the FLIP after this
> VOTE was started. I would suggest to start a new voting thread on the
> current state of the FLIP (keeping in mind that a FLIP vote needs at least
> three committer/PMC votes).
> >
> > For the future, we should probably keep discussion to the [DISCUSS]
> thread and use the vote thread only for voting.
> >
> > Best,
> > Aljoscha
> >
> >> On 3. Oct 2019, at 21:17, Bowen Li <[hidden email]> wrote:
> >>
> >> I'm glad to announce that the community has accepted the design of
> FLIP-57,
> >> and we are moving forward to implementing it.
> >>
> >> Thanks everyone!
> >>
> >> On Wed, Oct 2, 2019 at 11:01 AM Bowen Li <[hidden email]> wrote:
> >>
> >>> Introducing a new term "path" to APIs like
> >>> "getShortPath(Identifier)/getLongPath(Identifier)" would be confusing
> to
> >>> users, thus I feel "getSimpleName/getIdentifier" is fine.
> >>>
> >>> To summarize the discussion result.
> >>>
> >>>    - categorize functions into 2 dimensions - system v.s. catalog,
> >>>    non-temp v.s. temp - and that give us 4 combinations
> >>>    - definition of FunctionIdentifier
> >>>
> >>>                      @PublicEvolving
> >>>
> >>> Class FunctionIdentifier {
> >>>
> >>>     String name;
> >>>
> >>>     ObjectIdentifier oi;
> >>>
> >>>         // for temporary/non-temporary system function
> >>>     public FunctionIdentifier of(String name) {  }
> >>>     // for temporary/non-temporary catalog function
> >>>     public FunctionIdentifier of(ObjectIdentifier identifier) {  }
> >>>
> >>>
> >>>     Optional<ObjectIdentifier> getIdentifier() {}
> >>>
> >>>     Optional<String> getSimpleName() {}
> >>>
> >>> }
> >>>
> >>>
> >>> I've updated them to FLIP wiki. Please take a final look. I'll close
> the
> >>> voting if there's no other concern raised within 24 hours.
> >>>
> >>> Cheers
> >>>
> >>> On Wed, Oct 2, 2019 at 4:54 AM Dawid Wysakowicz <
> [hidden email]>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> I very much agree with Xuefu's summary of the two points, especially
> on
> >>>> the "functionIdentifier doesn't need to reflect the categories".
> >>>>
> >>>> For the factory methods I think methods of should be enough:
> >>>>
> >>>>   // for temporary/non-temporary system function
> >>>>     public FunctionIdentifier of(String name) {  }
> >>>>   // for temporary/non-temporary catalog function
> >>>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
> >>>>
> >>>> In case of the getters I did not like the method name `getName` in the
> >>>> original proposal, as in my opinion it could imply that it can return
> >>>> also just the name part of an ObjectIdentifier, which should not be
> the
> >>>> case.
> >>>>
> >>>> I'm fine with getSimpleName/getIdentifier, but want to throw in few
> >>>> other suggestions:
> >>>>
> >>>> * getShortPath(Identifier)/getLongPath(Identifier),
> >>>>
> >>>> * getSystemPath(Identifier)/getCatalogPath(Identifier)
> >>>>
> >>>> +1 to any of the 3 options.
> >>>>
> >>>> One additional thing the FunctionIdentifier should be a PublicEvolving
> >>>> class, as it is part of a PublicEvolving APIs e.g. CallExpression,
> which
> >>>> user might need to access e.g. in a filter pushdown.
> >>>>
> >>>> I also support the Xuefu's suggestion not to support the "ALL" keyword
> >>>> in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
> >>>> of it  is not part of the FLIP-57, we do not need to agree on that in
> >>>> this thread.
> >>>>
> >>>> Overall I think after updating the FLIP with the outcome of the
> >>>> discussion I vote +1 for it.
> >>>>
> >>>> Best,
> >>>>
> >>>> Dawid
> >>>>
> >>>>
> >>>> On 02/10/2019 00:21, Xuefu Z wrote:
> >>>>> Here are some of my thoughts on the minor debates above:
> >>>>>
> >>>>> 1. +1 for 4 categories of functions. They are categorized along two
> >>>>> dimensions of binary values: X: *temporary* vs non-temporary
> >>>> (persistent);
> >>>>> Y: *system* vs non-system (so said catalog).
> >>>>> 2. In my opinion, class functionIdentifier doesn't really need to
> >>>> reflect
> >>>>> the categories of the functions. Instead, we should decouple them to
> >>>> make
> >>>>> the API more stable. Thus, my suggestion is:
> >>>>>
> >>>>> @Internal
> >>>>> class FunctionIdentifier {
> >>>>>   // for temporary/non-temporary system function
> >>>>>     public FunctionIdentifier ofSimpleName(String name) {  }
> >>>>>   // for temporary/non-temporary catalog function
> >>>>>     public FunctionIdentifier ofIdentifier(ObjectIdentifier
> >>>>> identifier){  }
> >>>>>     public Optional<String> getSimpleName() {  }
> >>>>>     public Optional<ObjectIdentifier> getIdentifier() {  }
> >>>>> }
> >>>>> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just
> be:
> >>>>>
> >>>>> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
> >>>>>
> >>>>> When either keyword is missing, "ALL" is implied along that
> dimension.
> >>>> We
> >>>>> should always limit the search to the system function catalog and the
> >>>>> current catalog/DB. I don't see a need of listing functions across
> >>>>> different catalogs and databases. (It can be added later if that
> >>>> arises.)
> >>>>> Thanks,
> >>>>> Xuefu
> >>>>>
> >>>>> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]>
> wrote:
> >>>>>
> >>>>>> Hi Dawid,
> >>>>>>
> >>>>>> Thanks for bringing the suggestions up. I was prototyping yesterday
> and
> >>>>>> found out those places exactly as what you suggested.
> >>>>>>
> >>>>>> For CallExpression and UnresolvedCallExpression, I've added them to
> >>>>>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier
> and
> >>>> mark
> >>>>>> that as a breaking change
> >>>>>>
> >>>>>> For FunctionIdentifier, the suggested changes LGTM. Just want to
> bring
> >>>> up
> >>>>>> an issue on naming. It seems to me how we now name functions
> >>>> categories is
> >>>>>> a bit unclear and confusing, which is reflected on the suggested
> APIs
> >>>> - in
> >>>>>> FunctionIdentifier you lay out, "builtin function" would include
> >>>> builtin
> >>>>>> functions and temporary system functions as we are kind of using
> >>>> "system"
> >>>>>> and "built-in" interchangeably, and "catalog function" would include
> >>>>>> catalog functions and temporary functions. I currently can see two
> >>>>>> approaches to make it clearer to users.
> >>>>>>
> >>>>>> 1) Simplify FunctionIdentifier to be the following. As it's
> internal,
> >>>> we
> >>>>>> add comments and explanation to devs on which case the APIs support.
> >>>>>> However, I feel this approach would be somehow a bit conflicting to
> >>>> what
> >>>>>> you want to achieve for the clarity of APIs
> >>>>>>
> >>>>>> @Internal
> >>>>>> class FunctionIdentifier {
> >>>>>>   // for built-in function and temporary system function
> >>>>>>     public FunctionIdentifier of(String name) {  }
> >>>>>>   // for temporary function and catalog function
> >>>>>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
> >>>>>>     public Optional<String> getFunctionName() {  }
> >>>>>>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
> >>>>>> }
> >>>>>>
> >>>>>> 2) We can rename our function categories as following so there'll be
> >>>> mainly
> >>>>>> just two categories of functions, "system functions" and "catalog
> >>>>>> functions", either of which can have temporary ones
> >>>>>>
> >>>>>>   - built-in functions -> officially rename to "system functions"
> and
> >>>> note
> >>>>>> to users that "system" and "built-in" can be used interchangeably.
> We
> >>>>>> prefer "system" because that's the keyword we decided to use in DDL
> >>>> that
> >>>>>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
> >>>>>>   - temporary system functions
> >>>>>>   - catalog functions
> >>>>>>   - temporary functions  -> rename to "temporary catalog function"
> >>>>>>
> >>>>>> @Internal
> >>>>>> class FunctionIdentifier {
> >>>>>>   // for temporary/non-temporary system function
> >>>>>>     public FunctionIdentifier ofSystemFunction(String name) {  }
> >>>>>>   // for temporary/non-temporary catalog function
> >>>>>>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
> >>>>>> identifier){  }
> >>>>>>     public Optional<String> getSystemFunctionName() {  }
> >>>>>>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier()
> >>>> {  }
> >>>>>> }
> >>>>>>
> >>>>>> WDYT?
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]>
> >>>> wrote:
> >>>>>>> Thanks for the summary Bowen!
> >>>>>>>
> >>>>>>> Looks good to me.
> >>>>>>>
> >>>>>>> Cheers,
> >>>>>>> Fabian
> >>>>>>>
> >>>>>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
> >>>>>> [hidden email]
> >>>>>>>> :
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I've updated the FLIP wiki with the following changes:
> >>>>>>>>
> >>>>>>>> - Lifespan of temp functions are not tied to those of catalogs and
> >>>>>>>> databases. Users can create temp functions even though
> catalogs/dbs
> >>>> in
> >>>>>>>> their fully qualified names don't even exist.
> >>>>>>>> - some new SQL commands
> >>>>>>>>     - "SHOW FUNCTIONS" - list names of temp and non-temp
> >>>>>> system/built-in
> >>>>>>>> functions, and names of temp and catalog functions in the current
> >>>>>> catalog
> >>>>>>>> and db
> >>>>>>>>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
> >>>>>> system/built
> >>>>>>>> functions, and fully qualified names of temp and catalog
> functions in
> >>>>>> all
> >>>>>>>> catalogs and dbs
> >>>>>>>>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names
> of
> >>>>>> temp
> >>>>>>>> functions in all catalog and db
> >>>>>>>>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all
> temp
> >>>>>>> system
> >>>>>>>> functions
> >>>>>>>>
> >>>>>>>> Let me know if you have any questions.
> >>>>>>>>
> >>>>>>>> Seems we have resolved all concerns. If there's no more ones, I'd
> >>>> like
> >>>>>> to
> >>>>>>>> close the vote by this time tomorrow.
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Bowen
> >>>>>>>>
> >>>>>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]>
> >>>> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I think above are some valid points, and we can adopt the
> >>>>>> suggestions.
> >>>>>>>>> To elaborate a bit on the new SQL syntax, it would imply that,
> >>>> unlike
> >>>>>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
> >>>>>> [TEMPORARY]
> >>>>>>>>> FUNCTIONS" would return functions' fully qualified names with
> >>>> catalog
> >>>>>>> and
> >>>>>>>>> db names.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]
> >
> >>>>>>> wrote:
> >>>>>>>>>> Hi all,
> >>>>>>>>>>
> >>>>>>>>>> I support Fabian's arguments. In my opinion, temporary objects
> >>>>>> should
> >>>>>>>>>> just be an additional layer on top of the regular
> catalog/database
> >>>>>>>>>> lookup logic. Thus, a temporary table or function has always
> >>>> highest
> >>>>>>>>>> precedence and should be stable within the local session.
> Otherwise
> >>>>>> it
> >>>>>>>>>> could magically disappear while someone else is performing
> >>>>>>> modifications
> >>>>>>>>>> in the catalog.
> >>>>>>>>>>
> >>>>>>>>>> Furthermore, this feature is very useful for prototyping as
> users
> >>>>>> can
> >>>>>>>>>> simply express that a catalog/database is present even through
> they
> >>>>>>>>>> might not have access to it currently.
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Timo
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Sorry for the late reply.
> >>>>>>>>>>>
> >>>>>>>>>>> I think it might lead to confusing situations if temporary
> >>>>>> functions
> >>>>>>>> (or
> >>>>>>>>>>> any temporary db objects for that matter) are bound to the life
> >>>>>>> cycle
> >>>>>>>>>> of an
> >>>>>>>>>>> (external) db/catalog.
> >>>>>>>>>>> Imaging a situation where you create a temp function in a db
> in an
> >>>>>>>>>> external
> >>>>>>>>>>> catalog and use it but at some point it does not work anymore
> >>>>>>> because
> >>>>>>>>>> some
> >>>>>>>>>>> other dropped the database from the external catalog.
> >>>>>>>>>>> Shouldn't temporary objects be only controlled by the owner of
> a
> >>>>>>>>>> session?
> >>>>>>>>>>> I agree that creating temp objects in non-existing db/catalogs
> >>>>>>> sounds
> >>>>>>>> a
> >>>>>>>>>> bit
> >>>>>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
> >>>>>> temp
> >>>>>>>>>>> function to be created/exist) can have significant implications
> >>>>>> like
> >>>>>>>> the
> >>>>>>>>>>> one I described.
> >>>>>>>>>>> I think it would be quite easy for users to understand that
> >>>>>>> temporary
> >>>>>>>>>>> objects are solely owned by them (and their session).
> >>>>>>>>>>> The problem of listing temporary objects could be solved by
> >>>>>> adding a
> >>>>>>>> ALL
> >>>>>>>>>>> [TEMPORARY] clause:
> >>>>>>>>>>>
> >>>>>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
> >>>>>>>>>>> catalog/database including temporary functions.
> >>>>>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary
> functions
> >>>>>>>>>> regardless
> >>>>>>>>>>> of the catalog/database.
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Fabian
> >>>>>>>>>>>
> >>>>>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> >>>>>>>>>> [hidden email]>:
> >>>>>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
> >>>>>> close
> >>>>>>>> the
> >>>>>>>>>>>> voting.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]
> >
> >>>>>>>> wrote:
> >>>>>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
> >>>>>> want
> >>>>>>> to
> >>>>>>>>>> try
> >>>>>>>>>>>>> out temporary functions, they can create temporary system
> >>>>>>> functions
> >>>>>>>>>> which
> >>>>>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we
> allow
> >>>>>>>>>> temporary
> >>>>>>>>>>>>> catalog function is to let users override permanent catalog
> >>>>>>>> functions.
> >>>>>>>>>>>>> Therefore a temporary function in a non-existing catalog
> won't
> >>>>>>> serve
> >>>>>>>>>> that
> >>>>>>>>>>>>> purpose. Besides, each session is provided with a default
> >>>>>> catalog
> >>>>>>>> and
> >>>>>>>>>> DB.
> >>>>>>>>>>>>> So even if users simply want to create some catalog functions
> >>>>>> they
> >>>>>>>> can
> >>>>>>>>>>>>> forget about after the session, wouldn't the default
> catalog/DB
> >>>>>> be
> >>>>>>>>>> enough
> >>>>>>>>>>>>> for such experiments?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <
> [hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
> >>>>>> first
> >>>>>>>>>> make
> >>>>>>>>>>>>> sure
> >>>>>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
> >>>>>> triggered
> >>>>>>> to
> >>>>>>>>>>>>> remove
> >>>>>>>>>>>>>> that temp function.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
> >>>>>>> copying
> >>>>>>>>>>>> Hive.
> >>>>>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
> >>>>>>> catalog/db
> >>>>>>>>>> is
> >>>>>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
> >>>>>> list
> >>>>>>>>>> system
> >>>>>>>>>>>>>> functions and functions in the current catalog/db, since
> users
> >>>>>>>> cannot
> >>>>>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
> >>>>>> they
> >>>>>>>> list
> >>>>>>>>>>>>>> functions in nonexistent catalog/db? They may end up never
> >>>>>>> knowing
> >>>>>>>>>> what
> >>>>>>>>>>>>>> temp functions they've created unless trying out with
> queries
> >>>>>> or
> >>>>>>> we
> >>>>>>>>>>>>>> introducing some more nonstandard SQL statements. The same
> >>>>>>> applies
> >>>>>>>> to
> >>>>>>>>>>>>> other
> >>>>>>>>>>>>>> temp objects like temp tables.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> >>>>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ad. 1
> >>>>>>>>>>>>>>> I wouldn't say it is hacky.
> >>>>>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
> >>>>>> when
> >>>>>>> a
> >>>>>>>>>>>>>> temporary
> >>>>>>>>>>>>>>> object is used?( in this particular case function). Do you
> >>>>>> want
> >>>>>>> to
> >>>>>>>>>>>>> query
> >>>>>>>>>>>>>>> for the database existence whenever e.g a temporary
> function
> >>>>>> is
> >>>>>>>>>>>> used? I
> >>>>>>>>>>>>>>> think important aspect here is that the database can be
> >>>>>> dropped
> >>>>>>>> from
> >>>>>>>>>>>>>>> external system, not just flink or a different flink
> session.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table
> in a
> >>>>>>>>>>>> database
> >>>>>>>>>>>>>> that
> >>>>>>>>>>>>>>> does not exist, that's true. But if you create a temporary
> >>>>>> table
> >>>>>>>> in
> >>>>>>>>>> a
> >>>>>>>>>>>>>>> database and drop that database from a different session,
> you
> >>>>>>> can
> >>>>>>>>>>>> still
> >>>>>>>>>>>>>>> query the previously created temporary table from the
> original
> >>>>>>>>>>>> session.
> >>>>>>>>>>>>>> It
> >>>>>>>>>>>>>>> does not sound like a consistent behaviour to me. Why
> don't we
> >>>>>>>> make
> >>>>>>>>>>>>> this
> >>>>>>>>>>>>>>> behaviour of not binding a temporary objects to the
> lifetime
> >>>>>> of
> >>>>>>> a
> >>>>>>>>>>>>>> database
> >>>>>>>>>>>>>>> explicit as part of the temporary objects contract? In the
> end
> >>>>>>>> they
> >>>>>>>>>>>>> exist
> >>>>>>>>>>>>>>> in different layers. Permanent objects & databases in a
> >>>>>> catalog
> >>>>>>>> (in
> >>>>>>>>>>>>> case
> >>>>>>>>>>>>>> of
> >>>>>>>>>>>>>>> hive megastore) whereas temporary objects in flink
> sessions.
> >>>>>>>> That's
> >>>>>>>>>>>>> also
> >>>>>>>>>>>>>>> true for the original hive client. The temporary objects
> live
> >>>>>> in
> >>>>>>>> the
> >>>>>>>>>>>>> hive
> >>>>>>>>>>>>>>> client whereas databases are created in the metastore.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Ad.2
> >>>>>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
> >>>>>> achieve
> >>>>>>>>>> here
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>> so
> >>>>>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
> >>>>>>>>>> important
> >>>>>>>>>>>>>> thing
> >>>>>>>>>>>>>>> to remember here is that we need the function identifier
> to be
> >>>>>>>> part
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> FunctionDefinition object and not only as the result of the
> >>>>>>>> function
> >>>>>>>>>>>>>>> lookup. At some point we want to be able to store
> >>>>>>> QueryOperations
> >>>>>>>> in
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> catalogs. They can contain function calls within which we
> need
> >>>>>>> to
> >>>>>>>>>>>> have
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>> identifier.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I agree my initial suggestion is over complicated. How
> about
> >>>>>> we
> >>>>>>>> have
> >>>>>>>>>>>>> just
> >>>>>>>>>>>>>>> the FunctionIdentifier as top level class without making
> the
> >>>>>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much
> the
> >>>>>>> same
> >>>>>>>>>>>> what
> >>>>>>>>>>>>>> you
> >>>>>>>>>>>>>>> suggested. The only difference is that it would be a top
> level
> >>>>>>>> class
> >>>>>>>>>>>>>> with a
> >>>>>>>>>>>>>>> more descriptive name.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
> >>>>>>>>>>>> alternative
> >>>>>>>>>>>>> is
> >>>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>> following, basically having separate APIs in
> FunctionLookup
> >>>>>> for
> >>>>>>>>>>>>>> ambiguous
> >>>>>>>>>>>>>>>> and precise function lookup since planner is able to tell
> >>>>>> which
> >>>>>>>> API
> >>>>>>>>>>>>> to
> >>>>>>>>>>>>>>> call
> >>>>>>>>>>>>>>>> with parsed queries, and have a unified result:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>> class FunctionLookup {
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier
> oi);
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> class Result {
> >>>>>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
> ...
> >>>>>> }
> >>>>>>>>>>>>>>>>      String getName() { ... }
> >>>>>>>>>>>>>>>>      // ...
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
> >>>>>> [hidden email]>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>> Hi Dawid,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Re 1): I agree making it easy for users to run
> experiments
> >>>>>> is
> >>>>>>>>>>>>>>> important.
> >>>>>>>>>>>>>>>>> However, I'm not sure allowing users to register temp
> >>>>>>> functions
> >>>>>>>>>>>> in
> >>>>>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
> >>>>>>> hacky,
> >>>>>>>>>>>>> and
> >>>>>>>>>>>>>>>> breaks
> >>>>>>>>>>>>>>>>> the current contract between Flink and users that
> catalog/db
> >>>>>>>> must
> >>>>>>>>>>>>> be
> >>>>>>>>>>>>>>>> valid
> >>>>>>>>>>>>>>>>> in order to operate on.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> How about we instead focus on making it convenient to
> create
> >>>>>>>>>>>>>> catalogs?
> >>>>>>>>>>>>>>>>> Users actually can already do it with ease via program or
> >>>>>> SQL
> >>>>>>>> CLI
> >>>>>>>>>>>>>> yaml
> >>>>>>>>>>>>>>>> file
> >>>>>>>>>>>>>>>>> for an in-memory catalog which has neither extra
> dependency
> >>>>>>> nor
> >>>>>>>>>>>>>>> external
> >>>>>>>>>>>>>>>>> connections. What we can further improve is DDL for
> >>>>>> catalogs,
> >>>>>>>>>>>> and I
> >>>>>>>>>>>>>>>> raised
> >>>>>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
> >>>>>>> driven
> >>>>>>>>>>>> by
> >>>>>>>>>>>>>>> Terry
> >>>>>>>>>>>>>>>>> now.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
> >>>>>> they
> >>>>>>>> can
> >>>>>>>>>>>>>>> easily
> >>>>>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
> >>>>>> with
> >>>>>>>>>>>> temp
> >>>>>>>>>>>>>>>>> functions.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function
> ObjectIdentifier
> >>>>>>> has
> >>>>>>>>>>>> not
> >>>>>>>>>>>>>>> been
> >>>>>>>>>>>>>>>>> resolved when stack call reaches
> >>>>>>>>>>>> FunctionCatalog#lookupFunction(),
> >>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>> only
> >>>>>>>>>>>>>>>>> been parsed?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good.
> I'm ok
> >>>>>>>> with
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a
> subclass
> >>>>>>> of
> >>>>>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Another potentially simpler way is:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>> // in class FunctionLookup
> >>>>>>>>>>>>>>>>> class Result {
> >>>>>>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
> >>>>>> ... }
> >>>>>>>>>>>>>>>>>      String getName() { ... }
> >>>>>>>>>>>>>>>>>      ...
> >>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>> ```
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> >>>>>>>>>>>>>>>>> [hidden email]> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>> I really like the flip and think it clarifies important
> >>>>>>> aspects
> >>>>>>>>>>>> of
> >>>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> system.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not
> take
> >>>>>>> much
> >>>>>>>>>>>>> time
> >>>>>>>>>>>>>> to
> >>>>>>>>>>>>>>>>>> agree on.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
> >>>>>>>>>>>> existence
> >>>>>>>>>>>>> of
> >>>>>>>>>>>>>>>>>> cat/db
> >>>>>>>>>>>>>>>>>> for temporary functions? That means not to check it, so
> >>>>>> e.g.
> >>>>>>>>>>>> it's
> >>>>>>>>>>>>>>>> possible
> >>>>>>>>>>>>>>>>>> to create a temporary function in a database that does
> not
> >>>>>>>>>>>> exist.
> >>>>>>>>>>>>> I
> >>>>>>>>>>>>>>>> think
> >>>>>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to
> perform
> >>>>>>>>>>>>>> experiments
> >>>>>>>>>>>>>>>> but
> >>>>>>>>>>>>>>>>>> does not have access to the db yet or temporarily does
> not
> >>>>>>> have
> >>>>>>>>>>>>>>>> connection
> >>>>>>>>>>>>>>>>>> to a catalog.
> >>>>>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we
> not
> >>>>>>>> loosen
> >>>>>>>>>>>>> the
> >>>>>>>>>>>>>>>>>> requirements for all catalog objects such as tables,
> views,
> >>>>>>>>>>>> types
> >>>>>>>>>>>>>> just
> >>>>>>>>>>>>>>>> for
> >>>>>>>>>>>>>>>>>> the functions? It's really important later on from e.g
> the
> >>>>>>>>>>>>>>>> serializability
> >>>>>>>>>>>>>>>>>> perspective. The important aspect of the
> ObjectIdentifier
> >>>>>> is
> >>>>>>>>>>>> that
> >>>>>>>>>>>>> we
> >>>>>>>>>>>>>>>> know
> >>>>>>>>>>>>>>>>>> that it has been resolved. The suggested changes break
> that
> >>>>>>>>>>>>>>> assumption.
> >>>>>>>>>>>>>>>>>> What do you think about adding an interface
> >>>>>>> FunctionIdentifier
> >>>>>>>> {
> >>>>>>>>>>>>>>>>>> String getName();
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> /**
> >>>>>>>>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> >>>>>>>>>>>> function.
> >>>>>>>>>>>>>>>>>> */
> >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> >>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> >>>>>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> >>>>>>>>>>>>>>>>>>   return Optional.of(this);
> >>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>> }
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> class SystemFunctionIdentifier implements
> >>>>>> FunctionIdentifier
> >>>>>>>>>>>> {...}
> >>>>>>>>>>>>>>>>>> WDYT?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
> >>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>> +1. LGTM
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> >>>>>>>>>>>> [hidden email]>
> >>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>> Terry Wang
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]>
> 写道:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> +1
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>>>>> Kurt
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> >>>>>>>>>>>>> [hidden email]
> >>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
> >>>>>> which
> >>>>>>>>>>>>>> we've
> >>>>>>>>>>>>>>>>>> reached
> >>>>>>>>>>>>>>>>>>>>>> consensus in [2].
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till
> 6:30pm
> >>>>>>>>>>>>> UTC,
> >>>>>>>>>>>>>>> Sep
> >>>>>>>>>>>>>>>>>> 26.
> >>>>>>>>>>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>>>>>>>>>> Bowen
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >>>>>>>>>>>>>>>>>>>>>> [2]
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Xuefu Zhang
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>> --
> >>>>>>>>>>>>> Best regards!
> >>>>>>>>>>>>> Rui Li
> >>>>>>>>>>>>>
> >>>>
>
>
12