[VOTE] FLIP-57: Rework FunctionCatalog

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

[VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi all,

I'd like to start a voting thread for FLIP-57 [1], which we've reached
consensus in [2].

This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.

Thanks,
Bowen

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Kurt Young
+1

Best,
Kurt


On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote:

> Hi all,
>
> I'd like to start a voting thread for FLIP-57 [1], which we've reached
> consensus in [2].
>
> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
>
> Thanks,
> Bowen
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> [2]
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Terry Wang
+1

Best,
Terry Wang



> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>
> +1
>
> Best,
> Kurt
>
>
> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote:
>
>> Hi all,
>>
>> I'd like to start a voting thread for FLIP-57 [1], which we've reached
>> consensus in [2].
>>
>> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
>>
>> Thanks,
>> Bowen
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>> [2]
>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Xuefu Z
+1. LGTM

On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote:

> +1
>
> Best,
> Terry Wang
>
>
>
> > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> >
> > +1
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote:
> >
> >> Hi all,
> >>
> >> I'd like to start a voting thread for FLIP-57 [1], which we've reached
> >> consensus in [2].
> >>
> >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
> >>
> >> Thanks,
> >> Bowen
> >>
> >> [1]
> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >> [2]
> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >>
>
>

--
Xuefu Zhang

"In Honey We Trust!"
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Dawid Wysakowicz
Hi,
I really like the flip and think it clarifies important aspects of the
system.

I have two, I hope small suggestions, which will not take much time to
agree on.

1. Could we follow the MySQL approach in regards to the existence of cat/db
for temporary functions? That means not to check it, so e.g. it's possible
to create a temporary function in a database that does not exist. I think
it's really useful e.g in cases when user wants to perform experiments but
does not have access to the db yet or temporarily does not have connection
to a catalog.
2. Could we not change the ObjectIdentifier? Could we not loosen the
requirements for all catalog objects such as tables, views, types just for
the functions? It's really important later on from e.g the serializability
perspective. The important aspect of the ObjectIdentifier is that we know
that it has been resolved. The suggested changes break that assumption.

What do you think about adding an interface FunctionIdentifier {

String getName();

/**
  Return 3-part identifier. Empty in case of a built-in function.
*/
Optional<ObjectIdentifier> getObjectIdentifier()
}

class ObjectIdentifier implements FunctionIdentifier {
Optional<ObjectIdentifier> getObjectIdentifier() {
 return Optional.of(this);
}
}

class SystemFunctionIdentifier implements FunctionIdentifier {...}

WDYT?

On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:

> +1. LGTM
>
> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote:
>
> > +1
> >
> > Best,
> > Terry Wang
> >
> >
> >
> > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > >
> > > +1
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I'd like to start a voting thread for FLIP-57 [1], which we've reached
> > >> consensus in [2].
> > >>
> > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
> > >>
> > >> Thanks,
> > >> Bowen
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > >> [2]
> > >>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > >>
> >
> >
>
> --
> Xuefu Zhang
>
> "In Honey We Trust!"
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi Dawid,

Re 1): I agree making it easy for users to run experiments is important.
However, I'm not sure allowing users to register temp functions in
nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks
the current contract between Flink and users that catalog/db must be valid
in order to operate on.

How about we instead focus on making it convenient to create catalogs?
Users actually can already do it with ease via program or SQL CLI yaml file
for an in-memory catalog which has neither extra dependency nor external
connections. What we can further improve is DDL for catalogs, and I raised
it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry
now.

In that case, if users would like to experiment via SQL, they can easily
create an in memory catalog/database using DDL, then play with temp
functions.

Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been
resolved when stack call reaches FunctionCatalog#lookupFunction(), but only
been parsed?

I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
suggested classes, though making ObjectIdentifier a subclass of
FunctionIdentifier seem a bit counter intuitive.

Another potentially simpler way is:

```
// in class FunctionLookup
class Result {
    Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
    String getName() { ... }
    ...
}
```

WDYT?



On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi,
> I really like the flip and think it clarifies important aspects of the
> system.
>
> I have two, I hope small suggestions, which will not take much time to
> agree on.
>
> 1. Could we follow the MySQL approach in regards to the existence of cat/db
> for temporary functions? That means not to check it, so e.g. it's possible
> to create a temporary function in a database that does not exist. I think
> it's really useful e.g in cases when user wants to perform experiments but
> does not have access to the db yet or temporarily does not have connection
> to a catalog.
> 2. Could we not change the ObjectIdentifier? Could we not loosen the
> requirements for all catalog objects such as tables, views, types just for
> the functions? It's really important later on from e.g the serializability
> perspective. The important aspect of the ObjectIdentifier is that we know
> that it has been resolved. The suggested changes break that assumption.
>
> What do you think about adding an interface FunctionIdentifier {
>
> String getName();
>
> /**
>   Return 3-part identifier. Empty in case of a built-in function.
> */
> Optional<ObjectIdentifier> getObjectIdentifier()
> }
>
> class ObjectIdentifier implements FunctionIdentifier {
> Optional<ObjectIdentifier> getObjectIdentifier() {
>  return Optional.of(this);
> }
> }
>
> class SystemFunctionIdentifier implements FunctionIdentifier {...}
>
> WDYT?
>
> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
>
> > +1. LGTM
> >
> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote:
> >
> > > +1
> > >
> > > Best,
> > > Terry Wang
> > >
> > >
> > >
> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > > >
> > > > +1
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]>
> wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
> reached
> > > >> consensus in [2].
> > > >>
> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep 26.
> > > >>
> > > >> Thanks,
> > > >> Bowen
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > >> [2]
> > > >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > >>
> > >
> > >
> >
> > --
> > Xuefu Zhang
> >
> > "In Honey We Trust!"
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Sorry, I missed some parts of the solution. The complete alternative is the
following, basically having separate APIs in FunctionLookup for ambiguous
and precise function lookup since planner is able to tell which API to call
with parsed queries, and have a unified result:

```
class FunctionLookup {

Optional<Result> lookupAmbiguousFunction(String name);


Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);


class Result {
    Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
    String getName() { ... }
    // ...
}

}
```

Thanks,
Bowen


On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote:

> Hi Dawid,
>
> Re 1): I agree making it easy for users to run experiments is important.
> However, I'm not sure allowing users to register temp functions in
> nonexistent catalog/db is the optimal way. It seems a bit hacky, and breaks
> the current contract between Flink and users that catalog/db must be valid
> in order to operate on.
>
> How about we instead focus on making it convenient to create catalogs?
> Users actually can already do it with ease via program or SQL CLI yaml file
> for an in-memory catalog which has neither extra dependency nor external
> connections. What we can further improve is DDL for catalogs, and I raised
> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry
> now.
>
> In that case, if users would like to experiment via SQL, they can easily
> create an in memory catalog/database using DDL, then play with temp
> functions.
>
> Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been
> resolved when stack call reaches FunctionCatalog#lookupFunction(), but only
> been parsed?
>
> I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
> suggested classes, though making ObjectIdentifier a subclass of
> FunctionIdentifier seem a bit counter intuitive.
>
> Another potentially simpler way is:
>
> ```
> // in class FunctionLookup
> class Result {
>     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>     String getName() { ... }
>     ...
> }
> ```
>
> WDYT?
>
>
>
> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> [hidden email]> wrote:
>
>> Hi,
>> I really like the flip and think it clarifies important aspects of the
>> system.
>>
>> I have two, I hope small suggestions, which will not take much time to
>> agree on.
>>
>> 1. Could we follow the MySQL approach in regards to the existence of
>> cat/db
>> for temporary functions? That means not to check it, so e.g. it's possible
>> to create a temporary function in a database that does not exist. I think
>> it's really useful e.g in cases when user wants to perform experiments but
>> does not have access to the db yet or temporarily does not have connection
>> to a catalog.
>> 2. Could we not change the ObjectIdentifier? Could we not loosen the
>> requirements for all catalog objects such as tables, views, types just for
>> the functions? It's really important later on from e.g the serializability
>> perspective. The important aspect of the ObjectIdentifier is that we know
>> that it has been resolved. The suggested changes break that assumption.
>>
>> What do you think about adding an interface FunctionIdentifier {
>>
>> String getName();
>>
>> /**
>>   Return 3-part identifier. Empty in case of a built-in function.
>> */
>> Optional<ObjectIdentifier> getObjectIdentifier()
>> }
>>
>> class ObjectIdentifier implements FunctionIdentifier {
>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>  return Optional.of(this);
>> }
>> }
>>
>> class SystemFunctionIdentifier implements FunctionIdentifier {...}
>>
>> WDYT?
>>
>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
>>
>> > +1. LGTM
>> >
>> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]> wrote:
>> >
>> > > +1
>> > >
>> > > Best,
>> > > Terry Wang
>> > >
>> > >
>> > >
>> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>> > > >
>> > > > +1
>> > > >
>> > > > Best,
>> > > > Kurt
>> > > >
>> > > >
>> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]>
>> wrote:
>> > > >
>> > > >> Hi all,
>> > > >>
>> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
>> reached
>> > > >> consensus in [2].
>> > > >>
>> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep
>> 26.
>> > > >>
>> > > >> Thanks,
>> > > >> Bowen
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>> > > >> [2]
>> > > >>
>> > > >>
>> > >
>> >
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>> > > >>
>> > >
>> > >
>> >
>> > --
>> > Xuefu Zhang
>> >
>> > "In Honey We Trust!"
>> >
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Dawid Wysakowicz
Ad. 1
I wouldn't say it is hacky.
Moreover, how do you want ensure that the dB always exists when a temporary
object is used?( in this particular case function). Do you want to query
for the database existence whenever e.g a temporary function is used? I
think important aspect here is that the database can be dropped from
external system, not just flink or a different flink session.

E.g in case of hive, you cannot create a temporary table in a database that
does not exist, that's true. But if you create a temporary table in a
database and drop that database from a different session, you can still
query the previously created temporary table from the original session. It
does not sound like a consistent behaviour to me. Why don't we make this
behaviour of not binding a temporary objects to the lifetime of a database
explicit as part of the temporary objects contract? In the end they exist
in different layers. Permanent objects & databases in a catalog (in case of
hive megastore) whereas temporary objects in flink sessions. That's also
true for the original hive client. The temporary objects live in the hive
client whereas databases are created in the metastore.

Ad.2
I'm open for suggestions here. The one thing I wanted to achieve here is so
that we do not change the contract of ObjectIdentifier. One important thing
to remember here is that we need the function identifier to be part of the
FunctionDefinition object and not only as the result of the function
lookup. At some point we want to be able to store QueryOperations in the
catalogs. They can contain function calls within which we need to have the
identifier.

I agree my initial suggestion is over complicated. How about we have just
the FunctionIdentifier as top level class without making the
ObjectIdentifier extend from it? I think it's pretty much the same what you
suggested. The only difference is that it would be a top level class with a
more descriptive name.


On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:

> Sorry, I missed some parts of the solution. The complete alternative is the
> following, basically having separate APIs in FunctionLookup for ambiguous
> and precise function lookup since planner is able to tell which API to call
> with parsed queries, and have a unified result:
>
> ```
> class FunctionLookup {
>
> Optional<Result> lookupAmbiguousFunction(String name);
>
>
> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>
>
> class Result {
>     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>     String getName() { ... }
>     // ...
> }
>
> }
> ```
>
> Thanks,
> Bowen
>
>
> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote:
>
> > Hi Dawid,
> >
> > Re 1): I agree making it easy for users to run experiments is important.
> > However, I'm not sure allowing users to register temp functions in
> > nonexistent catalog/db is the optimal way. It seems a bit hacky, and
> breaks
> > the current contract between Flink and users that catalog/db must be
> valid
> > in order to operate on.
> >
> > How about we instead focus on making it convenient to create catalogs?
> > Users actually can already do it with ease via program or SQL CLI yaml
> file
> > for an in-memory catalog which has neither extra dependency nor external
> > connections. What we can further improve is DDL for catalogs, and I
> raised
> > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by Terry
> > now.
> >
> > In that case, if users would like to experiment via SQL, they can easily
> > create an in memory catalog/database using DDL, then play with temp
> > functions.
> >
> > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not been
> > resolved when stack call reaches FunctionCatalog#lookupFunction(), but
> only
> > been parsed?
> >
> > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
> > suggested classes, though making ObjectIdentifier a subclass of
> > FunctionIdentifier seem a bit counter intuitive.
> >
> > Another potentially simpler way is:
> >
> > ```
> > // in class FunctionLookup
> > class Result {
> >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >     String getName() { ... }
> >     ...
> > }
> > ```
> >
> > WDYT?
> >
> >
> >
> > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > [hidden email]> wrote:
> >
> >> Hi,
> >> I really like the flip and think it clarifies important aspects of the
> >> system.
> >>
> >> I have two, I hope small suggestions, which will not take much time to
> >> agree on.
> >>
> >> 1. Could we follow the MySQL approach in regards to the existence of
> >> cat/db
> >> for temporary functions? That means not to check it, so e.g. it's
> possible
> >> to create a temporary function in a database that does not exist. I
> think
> >> it's really useful e.g in cases when user wants to perform experiments
> but
> >> does not have access to the db yet or temporarily does not have
> connection
> >> to a catalog.
> >> 2. Could we not change the ObjectIdentifier? Could we not loosen the
> >> requirements for all catalog objects such as tables, views, types just
> for
> >> the functions? It's really important later on from e.g the
> serializability
> >> perspective. The important aspect of the ObjectIdentifier is that we
> know
> >> that it has been resolved. The suggested changes break that assumption.
> >>
> >> What do you think about adding an interface FunctionIdentifier {
> >>
> >> String getName();
> >>
> >> /**
> >>   Return 3-part identifier. Empty in case of a built-in function.
> >> */
> >> Optional<ObjectIdentifier> getObjectIdentifier()
> >> }
> >>
> >> class ObjectIdentifier implements FunctionIdentifier {
> >> Optional<ObjectIdentifier> getObjectIdentifier() {
> >>  return Optional.of(this);
> >> }
> >> }
> >>
> >> class SystemFunctionIdentifier implements FunctionIdentifier {...}
> >>
> >> WDYT?
> >>
> >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> >>
> >> > +1. LGTM
> >> >
> >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]>
> wrote:
> >> >
> >> > > +1
> >> > >
> >> > > Best,
> >> > > Terry Wang
> >> > >
> >> > >
> >> > >
> >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> >> > > >
> >> > > > +1
> >> > > >
> >> > > > Best,
> >> > > > Kurt
> >> > > >
> >> > > >
> >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]>
> >> wrote:
> >> > > >
> >> > > >> Hi all,
> >> > > >>
> >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
> >> reached
> >> > > >> consensus in [2].
> >> > > >>
> >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC, Sep
> >> 26.
> >> > > >>
> >> > > >> Thanks,
> >> > > >> Bowen
> >> > > >>
> >> > > >> [1]
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >> > > >> [2]
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >> > > >>
> >> > >
> >> > >
> >> >
> >> > --
> >> > Xuefu Zhang
> >> >
> >> > "In Honey We Trust!"
> >> >
> >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Re 1) As described in the FLIP, a temp function lookup will first make sure
the db exists. If the db doesn't exist, a lazy drop is triggered to remove
that temp function.

I agree Hive doesn't handle it consistently, and we are not copying Hive.

IMHO, allowing registering temp functions in nonexistent catalog/db is
hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
functions and functions in the current catalog/db, since users cannot
designate a nonexistent catalog/db as current ones, how can they list
functions in nonexistent catalog/db? They may end up never knowing what
temp functions they've created unless trying out with queries or we
introducing some more nonstandard SQL statements. The same applies to other
temp objects like temp tables.

Re 2) A standalone FunctionIdentifier sounds good to me

On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <[hidden email]>
wrote:

> Ad. 1
> I wouldn't say it is hacky.
> Moreover, how do you want ensure that the dB always exists when a temporary
> object is used?( in this particular case function). Do you want to query
> for the database existence whenever e.g a temporary function is used? I
> think important aspect here is that the database can be dropped from
> external system, not just flink or a different flink session.
>
> E.g in case of hive, you cannot create a temporary table in a database that
> does not exist, that's true. But if you create a temporary table in a
> database and drop that database from a different session, you can still
> query the previously created temporary table from the original session. It
> does not sound like a consistent behaviour to me. Why don't we make this
> behaviour of not binding a temporary objects to the lifetime of a database
> explicit as part of the temporary objects contract? In the end they exist
> in different layers. Permanent objects & databases in a catalog (in case of
> hive megastore) whereas temporary objects in flink sessions. That's also
> true for the original hive client. The temporary objects live in the hive
> client whereas databases are created in the metastore.
>
> Ad.2
> I'm open for suggestions here. The one thing I wanted to achieve here is so
> that we do not change the contract of ObjectIdentifier. One important thing
> to remember here is that we need the function identifier to be part of the
> FunctionDefinition object and not only as the result of the function
> lookup. At some point we want to be able to store QueryOperations in the
> catalogs. They can contain function calls within which we need to have the
> identifier.
>
> I agree my initial suggestion is over complicated. How about we have just
> the FunctionIdentifier as top level class without making the
> ObjectIdentifier extend from it? I think it's pretty much the same what you
> suggested. The only difference is that it would be a top level class with a
> more descriptive name.
>
>
> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
>
> > Sorry, I missed some parts of the solution. The complete alternative is
> the
> > following, basically having separate APIs in FunctionLookup for ambiguous
> > and precise function lookup since planner is able to tell which API to
> call
> > with parsed queries, and have a unified result:
> >
> > ```
> > class FunctionLookup {
> >
> > Optional<Result> lookupAmbiguousFunction(String name);
> >
> >
> > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> >
> >
> > class Result {
> >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >     String getName() { ... }
> >     // ...
> > }
> >
> > }
> > ```
> >
> > Thanks,
> > Bowen
> >
> >
> > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote:
> >
> > > Hi Dawid,
> > >
> > > Re 1): I agree making it easy for users to run experiments is
> important.
> > > However, I'm not sure allowing users to register temp functions in
> > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and
> > breaks
> > > the current contract between Flink and users that catalog/db must be
> > valid
> > > in order to operate on.
> > >
> > > How about we instead focus on making it convenient to create catalogs?
> > > Users actually can already do it with ease via program or SQL CLI yaml
> > file
> > > for an in-memory catalog which has neither extra dependency nor
> external
> > > connections. What we can further improve is DDL for catalogs, and I
> > raised
> > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by
> Terry
> > > now.
> > >
> > > In that case, if users would like to experiment via SQL, they can
> easily
> > > create an in memory catalog/database using DDL, then play with temp
> > > functions.
> > >
> > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not
> been
> > > resolved when stack call reaches FunctionCatalog#lookupFunction(), but
> > only
> > > been parsed?
> > >
> > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
> > > suggested classes, though making ObjectIdentifier a subclass of
> > > FunctionIdentifier seem a bit counter intuitive.
> > >
> > > Another potentially simpler way is:
> > >
> > > ```
> > > // in class FunctionLookup
> > > class Result {
> > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > >     String getName() { ... }
> > >     ...
> > > }
> > > ```
> > >
> > > WDYT?
> > >
> > >
> > >
> > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > > [hidden email]> wrote:
> > >
> > >> Hi,
> > >> I really like the flip and think it clarifies important aspects of the
> > >> system.
> > >>
> > >> I have two, I hope small suggestions, which will not take much time to
> > >> agree on.
> > >>
> > >> 1. Could we follow the MySQL approach in regards to the existence of
> > >> cat/db
> > >> for temporary functions? That means not to check it, so e.g. it's
> > possible
> > >> to create a temporary function in a database that does not exist. I
> > think
> > >> it's really useful e.g in cases when user wants to perform experiments
> > but
> > >> does not have access to the db yet or temporarily does not have
> > connection
> > >> to a catalog.
> > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the
> > >> requirements for all catalog objects such as tables, views, types just
> > for
> > >> the functions? It's really important later on from e.g the
> > serializability
> > >> perspective. The important aspect of the ObjectIdentifier is that we
> > know
> > >> that it has been resolved. The suggested changes break that
> assumption.
> > >>
> > >> What do you think about adding an interface FunctionIdentifier {
> > >>
> > >> String getName();
> > >>
> > >> /**
> > >>   Return 3-part identifier. Empty in case of a built-in function.
> > >> */
> > >> Optional<ObjectIdentifier> getObjectIdentifier()
> > >> }
> > >>
> > >> class ObjectIdentifier implements FunctionIdentifier {
> > >> Optional<ObjectIdentifier> getObjectIdentifier() {
> > >>  return Optional.of(this);
> > >> }
> > >> }
> > >>
> > >> class SystemFunctionIdentifier implements FunctionIdentifier {...}
> > >>
> > >> WDYT?
> > >>
> > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> > >>
> > >> > +1. LGTM
> > >> >
> > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]>
> > wrote:
> > >> >
> > >> > > +1
> > >> > >
> > >> > > Best,
> > >> > > Terry Wang
> > >> > >
> > >> > >
> > >> > >
> > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > >> > > >
> > >> > > > +1
> > >> > > >
> > >> > > > Best,
> > >> > > > Kurt
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]>
> > >> wrote:
> > >> > > >
> > >> > > >> Hi all,
> > >> > > >>
> > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which we've
> > >> reached
> > >> > > >> consensus in [2].
> > >> > > >>
> > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC,
> Sep
> > >> 26.
> > >> > > >>
> > >> > > >> Thanks,
> > >> > > >> Bowen
> > >> > > >>
> > >> > > >> [1]
> > >> > > >>
> > >> > > >>
> > >> > >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > >> > > >> [2]
> > >> > > >>
> > >> > > >>
> > >> > >
> > >> >
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > >> > > >>
> > >> > >
> > >> > >
> > >> >
> > >> > --
> > >> > Xuefu Zhang
> > >> >
> > >> > "In Honey We Trust!"
> > >> >
> > >>
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Rui Li
I'm not sure how much benefit #1 can bring us. If users just want to try
out temporary functions, they can create temporary system functions which
don't require a catalog/DB. IIUC, the main reason why we allow temporary
catalog function is to let users override permanent catalog functions.
Therefore a temporary function in a non-existing catalog won't serve that
purpose. Besides, each session is provided with a default catalog and DB.
So even if users simply want to create some catalog functions they can
forget about after the session, wouldn't the default catalog/DB be enough
for such experiments?

On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:

> Re 1) As described in the FLIP, a temp function lookup will first make sure
> the db exists. If the db doesn't exist, a lazy drop is triggered to remove
> that temp function.
>
> I agree Hive doesn't handle it consistently, and we are not copying Hive.
>
> IMHO, allowing registering temp functions in nonexistent catalog/db is
> hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
> functions and functions in the current catalog/db, since users cannot
> designate a nonexistent catalog/db as current ones, how can they list
> functions in nonexistent catalog/db? They may end up never knowing what
> temp functions they've created unless trying out with queries or we
> introducing some more nonstandard SQL statements. The same applies to other
> temp objects like temp tables.
>
> Re 2) A standalone FunctionIdentifier sounds good to me
>
> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> [hidden email]>
> wrote:
>
> > Ad. 1
> > I wouldn't say it is hacky.
> > Moreover, how do you want ensure that the dB always exists when a
> temporary
> > object is used?( in this particular case function). Do you want to query
> > for the database existence whenever e.g a temporary function is used? I
> > think important aspect here is that the database can be dropped from
> > external system, not just flink or a different flink session.
> >
> > E.g in case of hive, you cannot create a temporary table in a database
> that
> > does not exist, that's true. But if you create a temporary table in a
> > database and drop that database from a different session, you can still
> > query the previously created temporary table from the original session.
> It
> > does not sound like a consistent behaviour to me. Why don't we make this
> > behaviour of not binding a temporary objects to the lifetime of a
> database
> > explicit as part of the temporary objects contract? In the end they exist
> > in different layers. Permanent objects & databases in a catalog (in case
> of
> > hive megastore) whereas temporary objects in flink sessions. That's also
> > true for the original hive client. The temporary objects live in the hive
> > client whereas databases are created in the metastore.
> >
> > Ad.2
> > I'm open for suggestions here. The one thing I wanted to achieve here is
> so
> > that we do not change the contract of ObjectIdentifier. One important
> thing
> > to remember here is that we need the function identifier to be part of
> the
> > FunctionDefinition object and not only as the result of the function
> > lookup. At some point we want to be able to store QueryOperations in the
> > catalogs. They can contain function calls within which we need to have
> the
> > identifier.
> >
> > I agree my initial suggestion is over complicated. How about we have just
> > the FunctionIdentifier as top level class without making the
> > ObjectIdentifier extend from it? I think it's pretty much the same what
> you
> > suggested. The only difference is that it would be a top level class
> with a
> > more descriptive name.
> >
> >
> > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
> >
> > > Sorry, I missed some parts of the solution. The complete alternative is
> > the
> > > following, basically having separate APIs in FunctionLookup for
> ambiguous
> > > and precise function lookup since planner is able to tell which API to
> > call
> > > with parsed queries, and have a unified result:
> > >
> > > ```
> > > class FunctionLookup {
> > >
> > > Optional<Result> lookupAmbiguousFunction(String name);
> > >
> > >
> > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> > >
> > >
> > > class Result {
> > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > >     String getName() { ... }
> > >     // ...
> > > }
> > >
> > > }
> > > ```
> > >
> > > Thanks,
> > > Bowen
> > >
> > >
> > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]> wrote:
> > >
> > > > Hi Dawid,
> > > >
> > > > Re 1): I agree making it easy for users to run experiments is
> > important.
> > > > However, I'm not sure allowing users to register temp functions in
> > > > nonexistent catalog/db is the optimal way. It seems a bit hacky, and
> > > breaks
> > > > the current contract between Flink and users that catalog/db must be
> > > valid
> > > > in order to operate on.
> > > >
> > > > How about we instead focus on making it convenient to create
> catalogs?
> > > > Users actually can already do it with ease via program or SQL CLI
> yaml
> > > file
> > > > for an in-memory catalog which has neither extra dependency nor
> > external
> > > > connections. What we can further improve is DDL for catalogs, and I
> > > raised
> > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by
> > Terry
> > > > now.
> > > >
> > > > In that case, if users would like to experiment via SQL, they can
> > easily
> > > > create an in memory catalog/database using DDL, then play with temp
> > > > functions.
> > > >
> > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not
> > been
> > > > resolved when stack call reaches FunctionCatalog#lookupFunction(),
> but
> > > only
> > > > been parsed?
> > > >
> > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with the
> > > > suggested classes, though making ObjectIdentifier a subclass of
> > > > FunctionIdentifier seem a bit counter intuitive.
> > > >
> > > > Another potentially simpler way is:
> > > >
> > > > ```
> > > > // in class FunctionLookup
> > > > class Result {
> > > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > > >     String getName() { ... }
> > > >     ...
> > > > }
> > > > ```
> > > >
> > > > WDYT?
> > > >
> > > >
> > > >
> > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > > > [hidden email]> wrote:
> > > >
> > > >> Hi,
> > > >> I really like the flip and think it clarifies important aspects of
> the
> > > >> system.
> > > >>
> > > >> I have two, I hope small suggestions, which will not take much time
> to
> > > >> agree on.
> > > >>
> > > >> 1. Could we follow the MySQL approach in regards to the existence of
> > > >> cat/db
> > > >> for temporary functions? That means not to check it, so e.g. it's
> > > possible
> > > >> to create a temporary function in a database that does not exist. I
> > > think
> > > >> it's really useful e.g in cases when user wants to perform
> experiments
> > > but
> > > >> does not have access to the db yet or temporarily does not have
> > > connection
> > > >> to a catalog.
> > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen the
> > > >> requirements for all catalog objects such as tables, views, types
> just
> > > for
> > > >> the functions? It's really important later on from e.g the
> > > serializability
> > > >> perspective. The important aspect of the ObjectIdentifier is that we
> > > know
> > > >> that it has been resolved. The suggested changes break that
> > assumption.
> > > >>
> > > >> What do you think about adding an interface FunctionIdentifier {
> > > >>
> > > >> String getName();
> > > >>
> > > >> /**
> > > >>   Return 3-part identifier. Empty in case of a built-in function.
> > > >> */
> > > >> Optional<ObjectIdentifier> getObjectIdentifier()
> > > >> }
> > > >>
> > > >> class ObjectIdentifier implements FunctionIdentifier {
> > > >> Optional<ObjectIdentifier> getObjectIdentifier() {
> > > >>  return Optional.of(this);
> > > >> }
> > > >> }
> > > >>
> > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...}
> > > >>
> > > >> WDYT?
> > > >>
> > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> > > >>
> > > >> > +1. LGTM
> > > >> >
> > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]>
> > > wrote:
> > > >> >
> > > >> > > +1
> > > >> > >
> > > >> > > Best,
> > > >> > > Terry Wang
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > > >> > > >
> > > >> > > > +1
> > > >> > > >
> > > >> > > > Best,
> > > >> > > > Kurt
> > > >> > > >
> > > >> > > >
> > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <[hidden email]
> >
> > > >> wrote:
> > > >> > > >
> > > >> > > >> Hi all,
> > > >> > > >>
> > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which
> we've
> > > >> reached
> > > >> > > >> consensus in [2].
> > > >> > > >>
> > > >> > > >> This voting will be open for minimum 3 days till 6:30pm UTC,
> > Sep
> > > >> 26.
> > > >> > > >>
> > > >> > > >> Thanks,
> > > >> > > >> Bowen
> > > >> > > >>
> > > >> > > >> [1]
> > > >> > > >>
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > >> > > >> [2]
> > > >> > > >>
> > > >> > > >>
> > > >> > >
> > > >> >
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > >> > > >>
> > > >> > >
> > > >> > >
> > > >> >
> > > >> > --
> > > >> > Xuefu Zhang
> > > >> >
> > > >> > "In Honey We Trust!"
> > > >> >
> > > >>
> > > >
> > >
> >
>


--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
@Dawid, do you have any other concerns? If not, I hope we can close the
voting.


On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote:

> I'm not sure how much benefit #1 can bring us. If users just want to try
> out temporary functions, they can create temporary system functions which
> don't require a catalog/DB. IIUC, the main reason why we allow temporary
> catalog function is to let users override permanent catalog functions.
> Therefore a temporary function in a non-existing catalog won't serve that
> purpose. Besides, each session is provided with a default catalog and DB.
> So even if users simply want to create some catalog functions they can
> forget about after the session, wouldn't the default catalog/DB be enough
> for such experiments?
>
> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:
>
> > Re 1) As described in the FLIP, a temp function lookup will first make
> sure
> > the db exists. If the db doesn't exist, a lazy drop is triggered to
> remove
> > that temp function.
> >
> > I agree Hive doesn't handle it consistently, and we are not copying Hive.
> >
> > IMHO, allowing registering temp functions in nonexistent catalog/db is
> > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
> > functions and functions in the current catalog/db, since users cannot
> > designate a nonexistent catalog/db as current ones, how can they list
> > functions in nonexistent catalog/db? They may end up never knowing what
> > temp functions they've created unless trying out with queries or we
> > introducing some more nonstandard SQL statements. The same applies to
> other
> > temp objects like temp tables.
> >
> > Re 2) A standalone FunctionIdentifier sounds good to me
> >
> > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> > [hidden email]>
> > wrote:
> >
> > > Ad. 1
> > > I wouldn't say it is hacky.
> > > Moreover, how do you want ensure that the dB always exists when a
> > temporary
> > > object is used?( in this particular case function). Do you want to
> query
> > > for the database existence whenever e.g a temporary function is used? I
> > > think important aspect here is that the database can be dropped from
> > > external system, not just flink or a different flink session.
> > >
> > > E.g in case of hive, you cannot create a temporary table in a database
> > that
> > > does not exist, that's true. But if you create a temporary table in a
> > > database and drop that database from a different session, you can still
> > > query the previously created temporary table from the original session.
> > It
> > > does not sound like a consistent behaviour to me. Why don't we make
> this
> > > behaviour of not binding a temporary objects to the lifetime of a
> > database
> > > explicit as part of the temporary objects contract? In the end they
> exist
> > > in different layers. Permanent objects & databases in a catalog (in
> case
> > of
> > > hive megastore) whereas temporary objects in flink sessions. That's
> also
> > > true for the original hive client. The temporary objects live in the
> hive
> > > client whereas databases are created in the metastore.
> > >
> > > Ad.2
> > > I'm open for suggestions here. The one thing I wanted to achieve here
> is
> > so
> > > that we do not change the contract of ObjectIdentifier. One important
> > thing
> > > to remember here is that we need the function identifier to be part of
> > the
> > > FunctionDefinition object and not only as the result of the function
> > > lookup. At some point we want to be able to store QueryOperations in
> the
> > > catalogs. They can contain function calls within which we need to have
> > the
> > > identifier.
> > >
> > > I agree my initial suggestion is over complicated. How about we have
> just
> > > the FunctionIdentifier as top level class without making the
> > > ObjectIdentifier extend from it? I think it's pretty much the same what
> > you
> > > suggested. The only difference is that it would be a top level class
> > with a
> > > more descriptive name.
> > >
> > >
> > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
> > >
> > > > Sorry, I missed some parts of the solution. The complete alternative
> is
> > > the
> > > > following, basically having separate APIs in FunctionLookup for
> > ambiguous
> > > > and precise function lookup since planner is able to tell which API
> to
> > > call
> > > > with parsed queries, and have a unified result:
> > > >
> > > > ```
> > > > class FunctionLookup {
> > > >
> > > > Optional<Result> lookupAmbiguousFunction(String name);
> > > >
> > > >
> > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> > > >
> > > >
> > > > class Result {
> > > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > > >     String getName() { ... }
> > > >     // ...
> > > > }
> > > >
> > > > }
> > > > ```
> > > >
> > > > Thanks,
> > > > Bowen
> > > >
> > > >
> > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
> wrote:
> > > >
> > > > > Hi Dawid,
> > > > >
> > > > > Re 1): I agree making it easy for users to run experiments is
> > > important.
> > > > > However, I'm not sure allowing users to register temp functions in
> > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky,
> and
> > > > breaks
> > > > > the current contract between Flink and users that catalog/db must
> be
> > > > valid
> > > > > in order to operate on.
> > > > >
> > > > > How about we instead focus on making it convenient to create
> > catalogs?
> > > > > Users actually can already do it with ease via program or SQL CLI
> > yaml
> > > > file
> > > > > for an in-memory catalog which has neither extra dependency nor
> > > external
> > > > > connections. What we can further improve is DDL for catalogs, and I
> > > > raised
> > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven by
> > > Terry
> > > > > now.
> > > > >
> > > > > In that case, if users would like to experiment via SQL, they can
> > > easily
> > > > > create an in memory catalog/database using DDL, then play with temp
> > > > > functions.
> > > > >
> > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has not
> > > been
> > > > > resolved when stack call reaches FunctionCatalog#lookupFunction(),
> > but
> > > > only
> > > > > been parsed?
> > > > >
> > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with
> the
> > > > > suggested classes, though making ObjectIdentifier a subclass of
> > > > > FunctionIdentifier seem a bit counter intuitive.
> > > > >
> > > > > Another potentially simpler way is:
> > > > >
> > > > > ```
> > > > > // in class FunctionLookup
> > > > > class Result {
> > > > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > > > >     String getName() { ... }
> > > > >     ...
> > > > > }
> > > > > ```
> > > > >
> > > > > WDYT?
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > > > > [hidden email]> wrote:
> > > > >
> > > > >> Hi,
> > > > >> I really like the flip and think it clarifies important aspects of
> > the
> > > > >> system.
> > > > >>
> > > > >> I have two, I hope small suggestions, which will not take much
> time
> > to
> > > > >> agree on.
> > > > >>
> > > > >> 1. Could we follow the MySQL approach in regards to the existence
> of
> > > > >> cat/db
> > > > >> for temporary functions? That means not to check it, so e.g. it's
> > > > possible
> > > > >> to create a temporary function in a database that does not exist.
> I
> > > > think
> > > > >> it's really useful e.g in cases when user wants to perform
> > experiments
> > > > but
> > > > >> does not have access to the db yet or temporarily does not have
> > > > connection
> > > > >> to a catalog.
> > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen
> the
> > > > >> requirements for all catalog objects such as tables, views, types
> > just
> > > > for
> > > > >> the functions? It's really important later on from e.g the
> > > > serializability
> > > > >> perspective. The important aspect of the ObjectIdentifier is that
> we
> > > > know
> > > > >> that it has been resolved. The suggested changes break that
> > > assumption.
> > > > >>
> > > > >> What do you think about adding an interface FunctionIdentifier {
> > > > >>
> > > > >> String getName();
> > > > >>
> > > > >> /**
> > > > >>   Return 3-part identifier. Empty in case of a built-in function.
> > > > >> */
> > > > >> Optional<ObjectIdentifier> getObjectIdentifier()
> > > > >> }
> > > > >>
> > > > >> class ObjectIdentifier implements FunctionIdentifier {
> > > > >> Optional<ObjectIdentifier> getObjectIdentifier() {
> > > > >>  return Optional.of(this);
> > > > >> }
> > > > >> }
> > > > >>
> > > > >> class SystemFunctionIdentifier implements FunctionIdentifier {...}
> > > > >>
> > > > >> WDYT?
> > > > >>
> > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> > > > >>
> > > > >> > +1. LGTM
> > > > >> >
> > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <[hidden email]>
> > > > wrote:
> > > > >> >
> > > > >> > > +1
> > > > >> > >
> > > > >> > > Best,
> > > > >> > > Terry Wang
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > > > >> > > >
> > > > >> > > > +1
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Kurt
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> [hidden email]
> > >
> > > > >> wrote:
> > > > >> > > >
> > > > >> > > >> Hi all,
> > > > >> > > >>
> > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which
> > we've
> > > > >> reached
> > > > >> > > >> consensus in [2].
> > > > >> > > >>
> > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm
> UTC,
> > > Sep
> > > > >> 26.
> > > > >> > > >>
> > > > >> > > >> Thanks,
> > > > >> > > >> Bowen
> > > > >> > > >>
> > > > >> > > >> [1]
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > > >> > > >> [2]
> > > > >> > > >>
> > > > >> > > >>
> > > > >> > >
> > > > >> >
> > > > >>
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > > >> > > >>
> > > > >> > >
> > > > >> > >
> > > > >> >
> > > > >> > --
> > > > >> > Xuefu Zhang
> > > > >> >
> > > > >> > "In Honey We Trust!"
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Fabian Hueske-2
Hi all,

Sorry for the late reply.

I think it might lead to confusing situations if temporary functions (or
any temporary db objects for that matter) are bound to the life cycle of an
(external) db/catalog.
Imaging a situation where you create a temp function in a db in an external
catalog and use it but at some point it does not work anymore because some
other dropped the database from the external catalog.
Shouldn't temporary objects be only controlled by the owner of a session?

I agree that creating temp objects in non-existing db/catalogs sounds a bit
strange, but IMO the opposite (the db/catalog must exist for a temp
function to be created/exist) can have significant implications like the
one I described.
I think it would be quite easy for users to understand that temporary
objects are solely owned by them (and their session).
The problem of listing temporary objects could be solved by adding a ALL
[TEMPORARY] clause:

SHOW ALL FUNCTIONS; could show all functions regardless of the
catalog/database including temporary functions.
SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions regardless
of the catalog/database.

Best,
Fabian

Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[hidden email]>:

> @Dawid, do you have any other concerns? If not, I hope we can close the
> voting.
>
>
> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote:
>
> > I'm not sure how much benefit #1 can bring us. If users just want to try
> > out temporary functions, they can create temporary system functions which
> > don't require a catalog/DB. IIUC, the main reason why we allow temporary
> > catalog function is to let users override permanent catalog functions.
> > Therefore a temporary function in a non-existing catalog won't serve that
> > purpose. Besides, each session is provided with a default catalog and DB.
> > So even if users simply want to create some catalog functions they can
> > forget about after the session, wouldn't the default catalog/DB be enough
> > for such experiments?
> >
> > On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:
> >
> > > Re 1) As described in the FLIP, a temp function lookup will first make
> > sure
> > > the db exists. If the db doesn't exist, a lazy drop is triggered to
> > remove
> > > that temp function.
> > >
> > > I agree Hive doesn't handle it consistently, and we are not copying
> Hive.
> > >
> > > IMHO, allowing registering temp functions in nonexistent catalog/db is
> > > hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
> > > functions and functions in the current catalog/db, since users cannot
> > > designate a nonexistent catalog/db as current ones, how can they list
> > > functions in nonexistent catalog/db? They may end up never knowing what
> > > temp functions they've created unless trying out with queries or we
> > > introducing some more nonstandard SQL statements. The same applies to
> > other
> > > temp objects like temp tables.
> > >
> > > Re 2) A standalone FunctionIdentifier sounds good to me
> > >
> > > On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> > > [hidden email]>
> > > wrote:
> > >
> > > > Ad. 1
> > > > I wouldn't say it is hacky.
> > > > Moreover, how do you want ensure that the dB always exists when a
> > > temporary
> > > > object is used?( in this particular case function). Do you want to
> > query
> > > > for the database existence whenever e.g a temporary function is
> used? I
> > > > think important aspect here is that the database can be dropped from
> > > > external system, not just flink or a different flink session.
> > > >
> > > > E.g in case of hive, you cannot create a temporary table in a
> database
> > > that
> > > > does not exist, that's true. But if you create a temporary table in a
> > > > database and drop that database from a different session, you can
> still
> > > > query the previously created temporary table from the original
> session.
> > > It
> > > > does not sound like a consistent behaviour to me. Why don't we make
> > this
> > > > behaviour of not binding a temporary objects to the lifetime of a
> > > database
> > > > explicit as part of the temporary objects contract? In the end they
> > exist
> > > > in different layers. Permanent objects & databases in a catalog (in
> > case
> > > of
> > > > hive megastore) whereas temporary objects in flink sessions. That's
> > also
> > > > true for the original hive client. The temporary objects live in the
> > hive
> > > > client whereas databases are created in the metastore.
> > > >
> > > > Ad.2
> > > > I'm open for suggestions here. The one thing I wanted to achieve here
> > is
> > > so
> > > > that we do not change the contract of ObjectIdentifier. One important
> > > thing
> > > > to remember here is that we need the function identifier to be part
> of
> > > the
> > > > FunctionDefinition object and not only as the result of the function
> > > > lookup. At some point we want to be able to store QueryOperations in
> > the
> > > > catalogs. They can contain function calls within which we need to
> have
> > > the
> > > > identifier.
> > > >
> > > > I agree my initial suggestion is over complicated. How about we have
> > just
> > > > the FunctionIdentifier as top level class without making the
> > > > ObjectIdentifier extend from it? I think it's pretty much the same
> what
> > > you
> > > > suggested. The only difference is that it would be a top level class
> > > with a
> > > > more descriptive name.
> > > >
> > > >
> > > > On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
> > > >
> > > > > Sorry, I missed some parts of the solution. The complete
> alternative
> > is
> > > > the
> > > > > following, basically having separate APIs in FunctionLookup for
> > > ambiguous
> > > > > and precise function lookup since planner is able to tell which API
> > to
> > > > call
> > > > > with parsed queries, and have a unified result:
> > > > >
> > > > > ```
> > > > > class FunctionLookup {
> > > > >
> > > > > Optional<Result> lookupAmbiguousFunction(String name);
> > > > >
> > > > >
> > > > > Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> > > > >
> > > > >
> > > > > class Result {
> > > > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > > > >     String getName() { ... }
> > > > >     // ...
> > > > > }
> > > > >
> > > > > }
> > > > > ```
> > > > >
> > > > > Thanks,
> > > > > Bowen
> > > > >
> > > > >
> > > > > On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
> > wrote:
> > > > >
> > > > > > Hi Dawid,
> > > > > >
> > > > > > Re 1): I agree making it easy for users to run experiments is
> > > > important.
> > > > > > However, I'm not sure allowing users to register temp functions
> in
> > > > > > nonexistent catalog/db is the optimal way. It seems a bit hacky,
> > and
> > > > > breaks
> > > > > > the current contract between Flink and users that catalog/db must
> > be
> > > > > valid
> > > > > > in order to operate on.
> > > > > >
> > > > > > How about we instead focus on making it convenient to create
> > > catalogs?
> > > > > > Users actually can already do it with ease via program or SQL CLI
> > > yaml
> > > > > file
> > > > > > for an in-memory catalog which has neither extra dependency nor
> > > > external
> > > > > > connections. What we can further improve is DDL for catalogs,
> and I
> > > > > raised
> > > > > > it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
> by
> > > > Terry
> > > > > > now.
> > > > > >
> > > > > > In that case, if users would like to experiment via SQL, they can
> > > > easily
> > > > > > create an in memory catalog/database using DDL, then play with
> temp
> > > > > > functions.
> > > > > >
> > > > > > Re 2): For the assumption, IIUIC, function ObjectIdentifier has
> not
> > > > been
> > > > > > resolved when stack call reaches
> FunctionCatalog#lookupFunction(),
> > > but
> > > > > only
> > > > > > been parsed?
> > > > > >
> > > > > > I agree keeping ObjectIdentifier as-is would be good. I'm ok with
> > the
> > > > > > suggested classes, though making ObjectIdentifier a subclass of
> > > > > > FunctionIdentifier seem a bit counter intuitive.
> > > > > >
> > > > > > Another potentially simpler way is:
> > > > > >
> > > > > > ```
> > > > > > // in class FunctionLookup
> > > > > > class Result {
> > > > > >     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > > > > >     String getName() { ... }
> > > > > >     ...
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > > > > > [hidden email]> wrote:
> > > > > >
> > > > > >> Hi,
> > > > > >> I really like the flip and think it clarifies important aspects
> of
> > > the
> > > > > >> system.
> > > > > >>
> > > > > >> I have two, I hope small suggestions, which will not take much
> > time
> > > to
> > > > > >> agree on.
> > > > > >>
> > > > > >> 1. Could we follow the MySQL approach in regards to the
> existence
> > of
> > > > > >> cat/db
> > > > > >> for temporary functions? That means not to check it, so e.g.
> it's
> > > > > possible
> > > > > >> to create a temporary function in a database that does not
> exist.
> > I
> > > > > think
> > > > > >> it's really useful e.g in cases when user wants to perform
> > > experiments
> > > > > but
> > > > > >> does not have access to the db yet or temporarily does not have
> > > > > connection
> > > > > >> to a catalog.
> > > > > >> 2. Could we not change the ObjectIdentifier? Could we not loosen
> > the
> > > > > >> requirements for all catalog objects such as tables, views,
> types
> > > just
> > > > > for
> > > > > >> the functions? It's really important later on from e.g the
> > > > > serializability
> > > > > >> perspective. The important aspect of the ObjectIdentifier is
> that
> > we
> > > > > know
> > > > > >> that it has been resolved. The suggested changes break that
> > > > assumption.
> > > > > >>
> > > > > >> What do you think about adding an interface FunctionIdentifier {
> > > > > >>
> > > > > >> String getName();
> > > > > >>
> > > > > >> /**
> > > > > >>   Return 3-part identifier. Empty in case of a built-in
> function.
> > > > > >> */
> > > > > >> Optional<ObjectIdentifier> getObjectIdentifier()
> > > > > >> }
> > > > > >>
> > > > > >> class ObjectIdentifier implements FunctionIdentifier {
> > > > > >> Optional<ObjectIdentifier> getObjectIdentifier() {
> > > > > >>  return Optional.of(this);
> > > > > >> }
> > > > > >> }
> > > > > >>
> > > > > >> class SystemFunctionIdentifier implements FunctionIdentifier
> {...}
> > > > > >>
> > > > > >> WDYT?
> > > > > >>
> > > > > >> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> > > > > >>
> > > > > >> > +1. LGTM
> > > > > >> >
> > > > > >> > On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> [hidden email]>
> > > > > wrote:
> > > > > >> >
> > > > > >> > > +1
> > > > > >> > >
> > > > > >> > > Best,
> > > > > >> > > Terry Wang
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > > 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > > > > >> > > >
> > > > > >> > > > +1
> > > > > >> > > >
> > > > > >> > > > Best,
> > > > > >> > > > Kurt
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> > [hidden email]
> > > >
> > > > > >> wrote:
> > > > > >> > > >
> > > > > >> > > >> Hi all,
> > > > > >> > > >>
> > > > > >> > > >> I'd like to start a voting thread for FLIP-57 [1], which
> > > we've
> > > > > >> reached
> > > > > >> > > >> consensus in [2].
> > > > > >> > > >>
> > > > > >> > > >> This voting will be open for minimum 3 days till 6:30pm
> > UTC,
> > > > Sep
> > > > > >> 26.
> > > > > >> > > >>
> > > > > >> > > >> Thanks,
> > > > > >> > > >> Bowen
> > > > > >> > > >>
> > > > > >> > > >> [1]
> > > > > >> > > >>
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > > > >> > > >> [2]
> > > > > >> > > >>
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > > > >> > > >>
> > > > > >> > >
> > > > > >> > >
> > > > > >> >
> > > > > >> > --
> > > > > >> > Xuefu Zhang
> > > > > >> >
> > > > > >> > "In Honey We Trust!"
> > > > > >> >
> > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > Best regards!
> > Rui Li
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Timo Walther-2
Hi all,

I support Fabian's arguments. In my opinion, temporary objects should
just be an additional layer on top of the regular catalog/database
lookup logic. Thus, a temporary table or function has always highest
precedence and should be stable within the local session. Otherwise it
could magically disappear while someone else is performing modifications
in the catalog.

Furthermore, this feature is very useful for prototyping as users can
simply express that a catalog/database is present even through they
might not have access to it currently.

Regards,
Timo


On 30.09.19 14:57, Fabian Hueske wrote:

> Hi all,
>
> Sorry for the late reply.
>
> I think it might lead to confusing situations if temporary functions (or
> any temporary db objects for that matter) are bound to the life cycle of an
> (external) db/catalog.
> Imaging a situation where you create a temp function in a db in an external
> catalog and use it but at some point it does not work anymore because some
> other dropped the database from the external catalog.
> Shouldn't temporary objects be only controlled by the owner of a session?
>
> I agree that creating temp objects in non-existing db/catalogs sounds a bit
> strange, but IMO the opposite (the db/catalog must exist for a temp
> function to be created/exist) can have significant implications like the
> one I described.
> I think it would be quite easy for users to understand that temporary
> objects are solely owned by them (and their session).
> The problem of listing temporary objects could be solved by adding a ALL
> [TEMPORARY] clause:
>
> SHOW ALL FUNCTIONS; could show all functions regardless of the
> catalog/database including temporary functions.
> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions regardless
> of the catalog/database.
>
> Best,
> Fabian
>
> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <[hidden email]>:
>
>> @Dawid, do you have any other concerns? If not, I hope we can close the
>> voting.
>>
>>
>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote:
>>
>>> I'm not sure how much benefit #1 can bring us. If users just want to try
>>> out temporary functions, they can create temporary system functions which
>>> don't require a catalog/DB. IIUC, the main reason why we allow temporary
>>> catalog function is to let users override permanent catalog functions.
>>> Therefore a temporary function in a non-existing catalog won't serve that
>>> purpose. Besides, each session is provided with a default catalog and DB.
>>> So even if users simply want to create some catalog functions they can
>>> forget about after the session, wouldn't the default catalog/DB be enough
>>> for such experiments?
>>>
>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:
>>>
>>>> Re 1) As described in the FLIP, a temp function lookup will first make
>>> sure
>>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
>>> remove
>>>> that temp function.
>>>>
>>>> I agree Hive doesn't handle it consistently, and we are not copying
>> Hive.
>>>> IMHO, allowing registering temp functions in nonexistent catalog/db is
>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list system
>>>> functions and functions in the current catalog/db, since users cannot
>>>> designate a nonexistent catalog/db as current ones, how can they list
>>>> functions in nonexistent catalog/db? They may end up never knowing what
>>>> temp functions they've created unless trying out with queries or we
>>>> introducing some more nonstandard SQL statements. The same applies to
>>> other
>>>> temp objects like temp tables.
>>>>
>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>>>>
>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>>>> [hidden email]>
>>>> wrote:
>>>>
>>>>> Ad. 1
>>>>> I wouldn't say it is hacky.
>>>>> Moreover, how do you want ensure that the dB always exists when a
>>>> temporary
>>>>> object is used?( in this particular case function). Do you want to
>>> query
>>>>> for the database existence whenever e.g a temporary function is
>> used? I
>>>>> think important aspect here is that the database can be dropped from
>>>>> external system, not just flink or a different flink session.
>>>>>
>>>>> E.g in case of hive, you cannot create a temporary table in a
>> database
>>>> that
>>>>> does not exist, that's true. But if you create a temporary table in a
>>>>> database and drop that database from a different session, you can
>> still
>>>>> query the previously created temporary table from the original
>> session.
>>>> It
>>>>> does not sound like a consistent behaviour to me. Why don't we make
>>> this
>>>>> behaviour of not binding a temporary objects to the lifetime of a
>>>> database
>>>>> explicit as part of the temporary objects contract? In the end they
>>> exist
>>>>> in different layers. Permanent objects & databases in a catalog (in
>>> case
>>>> of
>>>>> hive megastore) whereas temporary objects in flink sessions. That's
>>> also
>>>>> true for the original hive client. The temporary objects live in the
>>> hive
>>>>> client whereas databases are created in the metastore.
>>>>>
>>>>> Ad.2
>>>>> I'm open for suggestions here. The one thing I wanted to achieve here
>>> is
>>>> so
>>>>> that we do not change the contract of ObjectIdentifier. One important
>>>> thing
>>>>> to remember here is that we need the function identifier to be part
>> of
>>>> the
>>>>> FunctionDefinition object and not only as the result of the function
>>>>> lookup. At some point we want to be able to store QueryOperations in
>>> the
>>>>> catalogs. They can contain function calls within which we need to
>> have
>>>> the
>>>>> identifier.
>>>>>
>>>>> I agree my initial suggestion is over complicated. How about we have
>>> just
>>>>> the FunctionIdentifier as top level class without making the
>>>>> ObjectIdentifier extend from it? I think it's pretty much the same
>> what
>>>> you
>>>>> suggested. The only difference is that it would be a top level class
>>>> with a
>>>>> more descriptive name.
>>>>>
>>>>>
>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
>>>>>
>>>>>> Sorry, I missed some parts of the solution. The complete
>> alternative
>>> is
>>>>> the
>>>>>> following, basically having separate APIs in FunctionLookup for
>>>> ambiguous
>>>>>> and precise function lookup since planner is able to tell which API
>>> to
>>>>> call
>>>>>> with parsed queries, and have a unified result:
>>>>>>
>>>>>> ```
>>>>>> class FunctionLookup {
>>>>>>
>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>>>>>>
>>>>>>
>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>>>>>>
>>>>>>
>>>>>> class Result {
>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>>>>>>      String getName() { ... }
>>>>>>      // ...
>>>>>> }
>>>>>>
>>>>>> }
>>>>>> ```
>>>>>>
>>>>>> Thanks,
>>>>>> Bowen
>>>>>>
>>>>>>
>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
>>> wrote:
>>>>>>> Hi Dawid,
>>>>>>>
>>>>>>> Re 1): I agree making it easy for users to run experiments is
>>>>> important.
>>>>>>> However, I'm not sure allowing users to register temp functions
>> in
>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
>>> and
>>>>>> breaks
>>>>>>> the current contract between Flink and users that catalog/db must
>>> be
>>>>>> valid
>>>>>>> in order to operate on.
>>>>>>>
>>>>>>> How about we instead focus on making it convenient to create
>>>> catalogs?
>>>>>>> Users actually can already do it with ease via program or SQL CLI
>>>> yaml
>>>>>> file
>>>>>>> for an in-memory catalog which has neither extra dependency nor
>>>>> external
>>>>>>> connections. What we can further improve is DDL for catalogs,
>> and I
>>>>>> raised
>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
>> by
>>>>> Terry
>>>>>>> now.
>>>>>>>
>>>>>>> In that case, if users would like to experiment via SQL, they can
>>>>> easily
>>>>>>> create an in memory catalog/database using DDL, then play with
>> temp
>>>>>>> functions.
>>>>>>>
>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
>> not
>>>>> been
>>>>>>> resolved when stack call reaches
>> FunctionCatalog#lookupFunction(),
>>>> but
>>>>>> only
>>>>>>> been parsed?
>>>>>>>
>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with
>>> the
>>>>>>> suggested classes, though making ObjectIdentifier a subclass of
>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>>>>>>>
>>>>>>> Another potentially simpler way is:
>>>>>>>
>>>>>>> ```
>>>>>>> // in class FunctionLookup
>>>>>>> class Result {
>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>>>>>>>      String getName() { ... }
>>>>>>>      ...
>>>>>>> }
>>>>>>> ```
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>>>>>>> [hidden email]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>> I really like the flip and think it clarifies important aspects
>> of
>>>> the
>>>>>>>> system.
>>>>>>>>
>>>>>>>> I have two, I hope small suggestions, which will not take much
>>> time
>>>> to
>>>>>>>> agree on.
>>>>>>>>
>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>> existence
>>> of
>>>>>>>> cat/db
>>>>>>>> for temporary functions? That means not to check it, so e.g.
>> it's
>>>>>> possible
>>>>>>>> to create a temporary function in a database that does not
>> exist.
>>> I
>>>>>> think
>>>>>>>> it's really useful e.g in cases when user wants to perform
>>>> experiments
>>>>>> but
>>>>>>>> does not have access to the db yet or temporarily does not have
>>>>>> connection
>>>>>>>> to a catalog.
>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen
>>> the
>>>>>>>> requirements for all catalog objects such as tables, views,
>> types
>>>> just
>>>>>> for
>>>>>>>> the functions? It's really important later on from e.g the
>>>>>> serializability
>>>>>>>> perspective. The important aspect of the ObjectIdentifier is
>> that
>>> we
>>>>>> know
>>>>>>>> that it has been resolved. The suggested changes break that
>>>>> assumption.
>>>>>>>> What do you think about adding an interface FunctionIdentifier {
>>>>>>>>
>>>>>>>> String getName();
>>>>>>>>
>>>>>>>> /**
>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>> function.
>>>>>>>> */
>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>>>>>>>> }
>>>>>>>>
>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>>>>>>>   return Optional.of(this);
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
>> {...}
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
>>>>>>>>
>>>>>>>>> +1. LGTM
>>>>>>>>>
>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>> [hidden email]>
>>>>>> wrote:
>>>>>>>>>> +1
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Terry Wang
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>>>>>>>>>>>
>>>>>>>>>>> +1
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Kurt
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>>> [hidden email]
>>>>>>>> wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
>>>> we've
>>>>>>>> reached
>>>>>>>>>>>> consensus in [2].
>>>>>>>>>>>>
>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>>> UTC,
>>>>> Sep
>>>>>>>> 26.
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Bowen
>>>>>>>>>>>>
>>>>>>>>>>>> [1]
>>>>>>>>>>>>
>>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>>>>>>>>>>>> [2]
>>>>>>>>>>>>
>>>>>>>>>>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Xuefu Zhang
>>>>>>>>>
>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>
>>>
>>> --
>>> Best regards!
>>> Rui Li
>>>

Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi,

I think above are some valid points, and we can adopt the suggestions.

To elaborate a bit on the new SQL syntax, it would imply that, unlike "SHOW
FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
FUNCTIONS" would return functions' fully qualified names with catalog and
db names.



On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote:

> Hi all,
>
> I support Fabian's arguments. In my opinion, temporary objects should
> just be an additional layer on top of the regular catalog/database
> lookup logic. Thus, a temporary table or function has always highest
> precedence and should be stable within the local session. Otherwise it
> could magically disappear while someone else is performing modifications
> in the catalog.
>
> Furthermore, this feature is very useful for prototyping as users can
> simply express that a catalog/database is present even through they
> might not have access to it currently.
>
> Regards,
> Timo
>
>
> On 30.09.19 14:57, Fabian Hueske wrote:
> > Hi all,
> >
> > Sorry for the late reply.
> >
> > I think it might lead to confusing situations if temporary functions (or
> > any temporary db objects for that matter) are bound to the life cycle of
> an
> > (external) db/catalog.
> > Imaging a situation where you create a temp function in a db in an
> external
> > catalog and use it but at some point it does not work anymore because
> some
> > other dropped the database from the external catalog.
> > Shouldn't temporary objects be only controlled by the owner of a session?
> >
> > I agree that creating temp objects in non-existing db/catalogs sounds a
> bit
> > strange, but IMO the opposite (the db/catalog must exist for a temp
> > function to be created/exist) can have significant implications like the
> > one I described.
> > I think it would be quite easy for users to understand that temporary
> > objects are solely owned by them (and their session).
> > The problem of listing temporary objects could be solved by adding a ALL
> > [TEMPORARY] clause:
> >
> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> > catalog/database including temporary functions.
> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> regardless
> > of the catalog/database.
> >
> > Best,
> > Fabian
> >
> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> [hidden email]>:
> >
> >> @Dawid, do you have any other concerns? If not, I hope we can close the
> >> voting.
> >>
> >>
> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote:
> >>
> >>> I'm not sure how much benefit #1 can bring us. If users just want to
> try
> >>> out temporary functions, they can create temporary system functions
> which
> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> temporary
> >>> catalog function is to let users override permanent catalog functions.
> >>> Therefore a temporary function in a non-existing catalog won't serve
> that
> >>> purpose. Besides, each session is provided with a default catalog and
> DB.
> >>> So even if users simply want to create some catalog functions they can
> >>> forget about after the session, wouldn't the default catalog/DB be
> enough
> >>> for such experiments?
> >>>
> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:
> >>>
> >>>> Re 1) As described in the FLIP, a temp function lookup will first make
> >>> sure
> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
> >>> remove
> >>>> that temp function.
> >>>>
> >>>> I agree Hive doesn't handle it consistently, and we are not copying
> >> Hive.
> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db is
> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
> system
> >>>> functions and functions in the current catalog/db, since users cannot
> >>>> designate a nonexistent catalog/db as current ones, how can they list
> >>>> functions in nonexistent catalog/db? They may end up never knowing
> what
> >>>> temp functions they've created unless trying out with queries or we
> >>>> introducing some more nonstandard SQL statements. The same applies to
> >>> other
> >>>> temp objects like temp tables.
> >>>>
> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
> >>>>
> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> >>>> [hidden email]>
> >>>> wrote:
> >>>>
> >>>>> Ad. 1
> >>>>> I wouldn't say it is hacky.
> >>>>> Moreover, how do you want ensure that the dB always exists when a
> >>>> temporary
> >>>>> object is used?( in this particular case function). Do you want to
> >>> query
> >>>>> for the database existence whenever e.g a temporary function is
> >> used? I
> >>>>> think important aspect here is that the database can be dropped from
> >>>>> external system, not just flink or a different flink session.
> >>>>>
> >>>>> E.g in case of hive, you cannot create a temporary table in a
> >> database
> >>>> that
> >>>>> does not exist, that's true. But if you create a temporary table in a
> >>>>> database and drop that database from a different session, you can
> >> still
> >>>>> query the previously created temporary table from the original
> >> session.
> >>>> It
> >>>>> does not sound like a consistent behaviour to me. Why don't we make
> >>> this
> >>>>> behaviour of not binding a temporary objects to the lifetime of a
> >>>> database
> >>>>> explicit as part of the temporary objects contract? In the end they
> >>> exist
> >>>>> in different layers. Permanent objects & databases in a catalog (in
> >>> case
> >>>> of
> >>>>> hive megastore) whereas temporary objects in flink sessions. That's
> >>> also
> >>>>> true for the original hive client. The temporary objects live in the
> >>> hive
> >>>>> client whereas databases are created in the metastore.
> >>>>>
> >>>>> Ad.2
> >>>>> I'm open for suggestions here. The one thing I wanted to achieve here
> >>> is
> >>>> so
> >>>>> that we do not change the contract of ObjectIdentifier. One important
> >>>> thing
> >>>>> to remember here is that we need the function identifier to be part
> >> of
> >>>> the
> >>>>> FunctionDefinition object and not only as the result of the function
> >>>>> lookup. At some point we want to be able to store QueryOperations in
> >>> the
> >>>>> catalogs. They can contain function calls within which we need to
> >> have
> >>>> the
> >>>>> identifier.
> >>>>>
> >>>>> I agree my initial suggestion is over complicated. How about we have
> >>> just
> >>>>> the FunctionIdentifier as top level class without making the
> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same
> >> what
> >>>> you
> >>>>> suggested. The only difference is that it would be a top level class
> >>>> with a
> >>>>> more descriptive name.
> >>>>>
> >>>>>
> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
> >>>>>
> >>>>>> Sorry, I missed some parts of the solution. The complete
> >> alternative
> >>> is
> >>>>> the
> >>>>>> following, basically having separate APIs in FunctionLookup for
> >>>> ambiguous
> >>>>>> and precise function lookup since planner is able to tell which API
> >>> to
> >>>>> call
> >>>>>> with parsed queries, and have a unified result:
> >>>>>>
> >>>>>> ```
> >>>>>> class FunctionLookup {
> >>>>>>
> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> >>>>>>
> >>>>>>
> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> >>>>>>
> >>>>>>
> >>>>>> class Result {
> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >>>>>>      String getName() { ... }
> >>>>>>      // ...
> >>>>>> }
> >>>>>>
> >>>>>> }
> >>>>>> ```
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bowen
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
> >>> wrote:
> >>>>>>> Hi Dawid,
> >>>>>>>
> >>>>>>> Re 1): I agree making it easy for users to run experiments is
> >>>>> important.
> >>>>>>> However, I'm not sure allowing users to register temp functions
> >> in
> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
> >>> and
> >>>>>> breaks
> >>>>>>> the current contract between Flink and users that catalog/db must
> >>> be
> >>>>>> valid
> >>>>>>> in order to operate on.
> >>>>>>>
> >>>>>>> How about we instead focus on making it convenient to create
> >>>> catalogs?
> >>>>>>> Users actually can already do it with ease via program or SQL CLI
> >>>> yaml
> >>>>>> file
> >>>>>>> for an in-memory catalog which has neither extra dependency nor
> >>>>> external
> >>>>>>> connections. What we can further improve is DDL for catalogs,
> >> and I
> >>>>>> raised
> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
> >> by
> >>>>> Terry
> >>>>>>> now.
> >>>>>>>
> >>>>>>> In that case, if users would like to experiment via SQL, they can
> >>>>> easily
> >>>>>>> create an in memory catalog/database using DDL, then play with
> >> temp
> >>>>>>> functions.
> >>>>>>>
> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
> >> not
> >>>>> been
> >>>>>>> resolved when stack call reaches
> >> FunctionCatalog#lookupFunction(),
> >>>> but
> >>>>>> only
> >>>>>>> been parsed?
> >>>>>>>
> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with
> >>> the
> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of
> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
> >>>>>>>
> >>>>>>> Another potentially simpler way is:
> >>>>>>>
> >>>>>>> ```
> >>>>>>> // in class FunctionLookup
> >>>>>>> class Result {
> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >>>>>>>      String getName() { ... }
> >>>>>>>      ...
> >>>>>>> }
> >>>>>>> ```
> >>>>>>>
> >>>>>>> WDYT?
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> >>>>>>> [hidden email]> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>> I really like the flip and think it clarifies important aspects
> >> of
> >>>> the
> >>>>>>>> system.
> >>>>>>>>
> >>>>>>>> I have two, I hope small suggestions, which will not take much
> >>> time
> >>>> to
> >>>>>>>> agree on.
> >>>>>>>>
> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
> >> existence
> >>> of
> >>>>>>>> cat/db
> >>>>>>>> for temporary functions? That means not to check it, so e.g.
> >> it's
> >>>>>> possible
> >>>>>>>> to create a temporary function in a database that does not
> >> exist.
> >>> I
> >>>>>> think
> >>>>>>>> it's really useful e.g in cases when user wants to perform
> >>>> experiments
> >>>>>> but
> >>>>>>>> does not have access to the db yet or temporarily does not have
> >>>>>> connection
> >>>>>>>> to a catalog.
> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen
> >>> the
> >>>>>>>> requirements for all catalog objects such as tables, views,
> >> types
> >>>> just
> >>>>>> for
> >>>>>>>> the functions? It's really important later on from e.g the
> >>>>>> serializability
> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is
> >> that
> >>> we
> >>>>>> know
> >>>>>>>> that it has been resolved. The suggested changes break that
> >>>>> assumption.
> >>>>>>>> What do you think about adding an interface FunctionIdentifier {
> >>>>>>>>
> >>>>>>>> String getName();
> >>>>>>>>
> >>>>>>>> /**
> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> >> function.
> >>>>>>>> */
> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> >>>>>>>>   return Optional.of(this);
> >>>>>>>> }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
> >> {...}
> >>>>>>>> WDYT?
> >>>>>>>>
> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> >>>>>>>>
> >>>>>>>>> +1. LGTM
> >>>>>>>>>
> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> >> [hidden email]>
> >>>>>> wrote:
> >>>>>>>>>> +1
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Terry Wang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> >>>>>>>>>>>
> >>>>>>>>>>> +1
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Kurt
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> >>> [hidden email]
> >>>>>>>> wrote:
> >>>>>>>>>>>> Hi all,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
> >>>> we've
> >>>>>>>> reached
> >>>>>>>>>>>> consensus in [2].
> >>>>>>>>>>>>
> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> >>> UTC,
> >>>>> Sep
> >>>>>>>> 26.
> >>>>>>>>>>>> Thanks,
> >>>>>>>>>>>> Bowen
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >>>>>>>>>>>> [2]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >>>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Xuefu Zhang
> >>>>>>>>>
> >>>>>>>>> "In Honey We Trust!"
> >>>>>>>>>
> >>>
> >>> --
> >>> Best regards!
> >>> Rui Li
> >>>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi all,

I've updated the FLIP wiki with the following changes:

- Lifespan of temp functions are not tied to those of catalogs and
databases. Users can create temp functions even though catalogs/dbs in
their fully qualified names don't even exist.
- some new SQL commands
    - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
functions, and names of temp and catalog functions in the current catalog
and db
    - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
functions, and fully qualified names of temp and catalog functions in all
catalogs and dbs
    - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
functions in all catalog and db
    - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
functions

Let me know if you have any questions.

Seems we have resolved all concerns. If there's no more ones, I'd like to
close the vote by this time tomorrow.

Cheers,
Bowen

On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote:

> Hi,
>
> I think above are some valid points, and we can adopt the suggestions.
>
> To elaborate a bit on the new SQL syntax, it would imply that, unlike
> "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> FUNCTIONS" would return functions' fully qualified names with catalog and
> db names.
>
>
>
> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote:
>
>> Hi all,
>>
>> I support Fabian's arguments. In my opinion, temporary objects should
>> just be an additional layer on top of the regular catalog/database
>> lookup logic. Thus, a temporary table or function has always highest
>> precedence and should be stable within the local session. Otherwise it
>> could magically disappear while someone else is performing modifications
>> in the catalog.
>>
>> Furthermore, this feature is very useful for prototyping as users can
>> simply express that a catalog/database is present even through they
>> might not have access to it currently.
>>
>> Regards,
>> Timo
>>
>>
>> On 30.09.19 14:57, Fabian Hueske wrote:
>> > Hi all,
>> >
>> > Sorry for the late reply.
>> >
>> > I think it might lead to confusing situations if temporary functions (or
>> > any temporary db objects for that matter) are bound to the life cycle
>> of an
>> > (external) db/catalog.
>> > Imaging a situation where you create a temp function in a db in an
>> external
>> > catalog and use it but at some point it does not work anymore because
>> some
>> > other dropped the database from the external catalog.
>> > Shouldn't temporary objects be only controlled by the owner of a
>> session?
>> >
>> > I agree that creating temp objects in non-existing db/catalogs sounds a
>> bit
>> > strange, but IMO the opposite (the db/catalog must exist for a temp
>> > function to be created/exist) can have significant implications like the
>> > one I described.
>> > I think it would be quite easy for users to understand that temporary
>> > objects are solely owned by them (and their session).
>> > The problem of listing temporary objects could be solved by adding a ALL
>> > [TEMPORARY] clause:
>> >
>> > SHOW ALL FUNCTIONS; could show all functions regardless of the
>> > catalog/database including temporary functions.
>> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>> regardless
>> > of the catalog/database.
>> >
>> > Best,
>> > Fabian
>> >
>> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>> [hidden email]>:
>> >
>> >> @Dawid, do you have any other concerns? If not, I hope we can close the
>> >> voting.
>> >>
>> >>
>> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]> wrote:
>> >>
>> >>> I'm not sure how much benefit #1 can bring us. If users just want to
>> try
>> >>> out temporary functions, they can create temporary system functions
>> which
>> >>> don't require a catalog/DB. IIUC, the main reason why we allow
>> temporary
>> >>> catalog function is to let users override permanent catalog functions.
>> >>> Therefore a temporary function in a non-existing catalog won't serve
>> that
>> >>> purpose. Besides, each session is provided with a default catalog and
>> DB.
>> >>> So even if users simply want to create some catalog functions they can
>> >>> forget about after the session, wouldn't the default catalog/DB be
>> enough
>> >>> for such experiments?
>> >>>
>> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]> wrote:
>> >>>
>> >>>> Re 1) As described in the FLIP, a temp function lookup will first
>> make
>> >>> sure
>> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
>> >>> remove
>> >>>> that temp function.
>> >>>>
>> >>>> I agree Hive doesn't handle it consistently, and we are not copying
>> >> Hive.
>> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db
>> is
>> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
>> system
>> >>>> functions and functions in the current catalog/db, since users cannot
>> >>>> designate a nonexistent catalog/db as current ones, how can they list
>> >>>> functions in nonexistent catalog/db? They may end up never knowing
>> what
>> >>>> temp functions they've created unless trying out with queries or we
>> >>>> introducing some more nonstandard SQL statements. The same applies to
>> >>> other
>> >>>> temp objects like temp tables.
>> >>>>
>> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
>> >>>>
>> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>> >>>> [hidden email]>
>> >>>> wrote:
>> >>>>
>> >>>>> Ad. 1
>> >>>>> I wouldn't say it is hacky.
>> >>>>> Moreover, how do you want ensure that the dB always exists when a
>> >>>> temporary
>> >>>>> object is used?( in this particular case function). Do you want to
>> >>> query
>> >>>>> for the database existence whenever e.g a temporary function is
>> >> used? I
>> >>>>> think important aspect here is that the database can be dropped from
>> >>>>> external system, not just flink or a different flink session.
>> >>>>>
>> >>>>> E.g in case of hive, you cannot create a temporary table in a
>> >> database
>> >>>> that
>> >>>>> does not exist, that's true. But if you create a temporary table in
>> a
>> >>>>> database and drop that database from a different session, you can
>> >> still
>> >>>>> query the previously created temporary table from the original
>> >> session.
>> >>>> It
>> >>>>> does not sound like a consistent behaviour to me. Why don't we make
>> >>> this
>> >>>>> behaviour of not binding a temporary objects to the lifetime of a
>> >>>> database
>> >>>>> explicit as part of the temporary objects contract? In the end they
>> >>> exist
>> >>>>> in different layers. Permanent objects & databases in a catalog (in
>> >>> case
>> >>>> of
>> >>>>> hive megastore) whereas temporary objects in flink sessions. That's
>> >>> also
>> >>>>> true for the original hive client. The temporary objects live in the
>> >>> hive
>> >>>>> client whereas databases are created in the metastore.
>> >>>>>
>> >>>>> Ad.2
>> >>>>> I'm open for suggestions here. The one thing I wanted to achieve
>> here
>> >>> is
>> >>>> so
>> >>>>> that we do not change the contract of ObjectIdentifier. One
>> important
>> >>>> thing
>> >>>>> to remember here is that we need the function identifier to be part
>> >> of
>> >>>> the
>> >>>>> FunctionDefinition object and not only as the result of the function
>> >>>>> lookup. At some point we want to be able to store QueryOperations in
>> >>> the
>> >>>>> catalogs. They can contain function calls within which we need to
>> >> have
>> >>>> the
>> >>>>> identifier.
>> >>>>>
>> >>>>> I agree my initial suggestion is over complicated. How about we have
>> >>> just
>> >>>>> the FunctionIdentifier as top level class without making the
>> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same
>> >> what
>> >>>> you
>> >>>>> suggested. The only difference is that it would be a top level class
>> >>>> with a
>> >>>>> more descriptive name.
>> >>>>>
>> >>>>>
>> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
>> >>>>>
>> >>>>>> Sorry, I missed some parts of the solution. The complete
>> >> alternative
>> >>> is
>> >>>>> the
>> >>>>>> following, basically having separate APIs in FunctionLookup for
>> >>>> ambiguous
>> >>>>>> and precise function lookup since planner is able to tell which API
>> >>> to
>> >>>>> call
>> >>>>>> with parsed queries, and have a unified result:
>> >>>>>>
>> >>>>>> ```
>> >>>>>> class FunctionLookup {
>> >>>>>>
>> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>> >>>>>>
>> >>>>>>
>> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>> >>>>>>
>> >>>>>>
>> >>>>>> class Result {
>> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>> >>>>>>      String getName() { ... }
>> >>>>>>      // ...
>> >>>>>> }
>> >>>>>>
>> >>>>>> }
>> >>>>>> ```
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Bowen
>> >>>>>>
>> >>>>>>
>> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
>> >>> wrote:
>> >>>>>>> Hi Dawid,
>> >>>>>>>
>> >>>>>>> Re 1): I agree making it easy for users to run experiments is
>> >>>>> important.
>> >>>>>>> However, I'm not sure allowing users to register temp functions
>> >> in
>> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
>> >>> and
>> >>>>>> breaks
>> >>>>>>> the current contract between Flink and users that catalog/db must
>> >>> be
>> >>>>>> valid
>> >>>>>>> in order to operate on.
>> >>>>>>>
>> >>>>>>> How about we instead focus on making it convenient to create
>> >>>> catalogs?
>> >>>>>>> Users actually can already do it with ease via program or SQL CLI
>> >>>> yaml
>> >>>>>> file
>> >>>>>>> for an in-memory catalog which has neither extra dependency nor
>> >>>>> external
>> >>>>>>> connections. What we can further improve is DDL for catalogs,
>> >> and I
>> >>>>>> raised
>> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
>> >> by
>> >>>>> Terry
>> >>>>>>> now.
>> >>>>>>>
>> >>>>>>> In that case, if users would like to experiment via SQL, they can
>> >>>>> easily
>> >>>>>>> create an in memory catalog/database using DDL, then play with
>> >> temp
>> >>>>>>> functions.
>> >>>>>>>
>> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
>> >> not
>> >>>>> been
>> >>>>>>> resolved when stack call reaches
>> >> FunctionCatalog#lookupFunction(),
>> >>>> but
>> >>>>>> only
>> >>>>>>> been parsed?
>> >>>>>>>
>> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok with
>> >>> the
>> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of
>> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
>> >>>>>>>
>> >>>>>>> Another potentially simpler way is:
>> >>>>>>>
>> >>>>>>> ```
>> >>>>>>> // in class FunctionLookup
>> >>>>>>> class Result {
>> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
>> >>>>>>>      String getName() { ... }
>> >>>>>>>      ...
>> >>>>>>> }
>> >>>>>>> ```
>> >>>>>>>
>> >>>>>>> WDYT?
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>> >>>>>>> [hidden email]> wrote:
>> >>>>>>>
>> >>>>>>>> Hi,
>> >>>>>>>> I really like the flip and think it clarifies important aspects
>> >> of
>> >>>> the
>> >>>>>>>> system.
>> >>>>>>>>
>> >>>>>>>> I have two, I hope small suggestions, which will not take much
>> >>> time
>> >>>> to
>> >>>>>>>> agree on.
>> >>>>>>>>
>> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
>> >> existence
>> >>> of
>> >>>>>>>> cat/db
>> >>>>>>>> for temporary functions? That means not to check it, so e.g.
>> >> it's
>> >>>>>> possible
>> >>>>>>>> to create a temporary function in a database that does not
>> >> exist.
>> >>> I
>> >>>>>> think
>> >>>>>>>> it's really useful e.g in cases when user wants to perform
>> >>>> experiments
>> >>>>>> but
>> >>>>>>>> does not have access to the db yet or temporarily does not have
>> >>>>>> connection
>> >>>>>>>> to a catalog.
>> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not loosen
>> >>> the
>> >>>>>>>> requirements for all catalog objects such as tables, views,
>> >> types
>> >>>> just
>> >>>>>> for
>> >>>>>>>> the functions? It's really important later on from e.g the
>> >>>>>> serializability
>> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is
>> >> that
>> >>> we
>> >>>>>> know
>> >>>>>>>> that it has been resolved. The suggested changes break that
>> >>>>> assumption.
>> >>>>>>>> What do you think about adding an interface FunctionIdentifier {
>> >>>>>>>>
>> >>>>>>>> String getName();
>> >>>>>>>>
>> >>>>>>>> /**
>> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>> >> function.
>> >>>>>>>> */
>> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>> >>>>>>>> }
>> >>>>>>>>
>> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>> >>>>>>>>   return Optional.of(this);
>> >>>>>>>> }
>> >>>>>>>> }
>> >>>>>>>>
>> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
>> >> {...}
>> >>>>>>>> WDYT?
>> >>>>>>>>
>> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
>> >>>>>>>>
>> >>>>>>>>> +1. LGTM
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>> >> [hidden email]>
>> >>>>>> wrote:
>> >>>>>>>>>> +1
>> >>>>>>>>>>
>> >>>>>>>>>> Best,
>> >>>>>>>>>> Terry Wang
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>> >>>>>>>>>>>
>> >>>>>>>>>>> +1
>> >>>>>>>>>>>
>> >>>>>>>>>>> Best,
>> >>>>>>>>>>> Kurt
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>> >>> [hidden email]
>> >>>>>>>> wrote:
>> >>>>>>>>>>>> Hi all,
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
>> >>>> we've
>> >>>>>>>> reached
>> >>>>>>>>>>>> consensus in [2].
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>> >>> UTC,
>> >>>>> Sep
>> >>>>>>>> 26.
>> >>>>>>>>>>>> Thanks,
>> >>>>>>>>>>>> Bowen
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>> >>>>>>>>>>>> [2]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>> >>>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Xuefu Zhang
>> >>>>>>>>>
>> >>>>>>>>> "In Honey We Trust!"
>> >>>>>>>>>
>> >>>
>> >>> --
>> >>> Best regards!
>> >>> Rui Li
>> >>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

dwysakowicz

Hi,

I think we're really close to a very good design for the functions. Thank you to all involved in the discussion and especially Bowen for driving the FLIP.

The last part I'm missing in the FLIP is a bit more elaborate description of the FunctionIdentifier. I gave it a bit more thought and discussed it with Timo and I think the purpose of the FunctionIdentifier is to unambigiously reference a function. That said it should be effectively either identifier of a built-in function or a catalog function. In order to do that I would suggest a slight change to the class:

class FunctionIdentifier {
  public FunctionIdentifier ofBuiltinFunction(String name) {
        return new FunctionIdentifier(name, null);
  }
  public FunctionIdentifier ofCatalogFunction(ObjectIdentifier identifier){
        return new FunctionIdentifier(null, identifier);
  }
  public Optional<String> getBuiltinFunctionIdentifier() {
       return Optional.ofNullable(name);
  }
  public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {
       return Optional.ofNullable(identifier);

  }
  private FunctionIdentifier(String name, ObjectIdentifier identifier) {
     this.name = name;
     this.identifier = identifier
  }
}

Moreover as we are changing the way functions are referenced we need to update classes like UnresolvedCallExpression and CallExpression. As those are PublicEvolving classes we should cover those changes in the FLIP. Lastly, as those classes need a serializable representation, we need a serializable representation for FunctionIdentifier as well. Therefore we need to add a method to the FunctionIdentifier class:

    public String asSerializableString() {...}

similar to the method in the ObjectIdentifier.

Let me know what do you think.

Best,

Dawid

On 30/09/2019 23:23, Bowen Li wrote:
Hi all,

I've updated the FLIP wiki with the following changes:

- Lifespan of temp functions are not tied to those of catalogs and
databases. Users can create temp functions even though catalogs/dbs in
their fully qualified names don't even exist.
- some new SQL commands
    - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
functions, and names of temp and catalog functions in the current catalog
and db
    - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
functions, and fully qualified names of temp and catalog functions in all
catalogs and dbs
    - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
functions in all catalog and db
    - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
functions

Let me know if you have any questions.

Seems we have resolved all concerns. If there's no more ones, I'd like to
close the vote by this time tomorrow.

Cheers,
Bowen

On Mon, Sep 30, 2019 at 11:59 AM Bowen Li [hidden email] wrote:

Hi,

I think above are some valid points, and we can adopt the suggestions.

To elaborate a bit on the new SQL syntax, it would imply that, unlike
"SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
FUNCTIONS" would return functions' fully qualified names with catalog and
db names.



On Mon, Sep 30, 2019 at 6:38 AM Timo Walther [hidden email] wrote:

Hi all,

I support Fabian's arguments. In my opinion, temporary objects should
just be an additional layer on top of the regular catalog/database
lookup logic. Thus, a temporary table or function has always highest
precedence and should be stable within the local session. Otherwise it
could magically disappear while someone else is performing modifications
in the catalog.

Furthermore, this feature is very useful for prototyping as users can
simply express that a catalog/database is present even through they
might not have access to it currently.

Regards,
Timo


On 30.09.19 14:57, Fabian Hueske wrote:
Hi all,

Sorry for the late reply.

I think it might lead to confusing situations if temporary functions (or
any temporary db objects for that matter) are bound to the life cycle
of an
(external) db/catalog.
Imaging a situation where you create a temp function in a db in an
external
catalog and use it but at some point it does not work anymore because
some
other dropped the database from the external catalog.
Shouldn't temporary objects be only controlled by the owner of a
session?
I agree that creating temp objects in non-existing db/catalogs sounds a
bit
strange, but IMO the opposite (the db/catalog must exist for a temp
function to be created/exist) can have significant implications like the
one I described.
I think it would be quite easy for users to understand that temporary
objects are solely owned by them (and their session).
The problem of listing temporary objects could be solved by adding a ALL
[TEMPORARY] clause:

SHOW ALL FUNCTIONS; could show all functions regardless of the
catalog/database including temporary functions.
SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
regardless
of the catalog/database.

Best,
Fabian

Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
[hidden email]>:

            
@Dawid, do you have any other concerns? If not, I hope we can close the
voting.


On Thu, Sep 26, 2019 at 8:14 PM Rui Li [hidden email] wrote:

I'm not sure how much benefit #1 can bring us. If users just want to
try
out temporary functions, they can create temporary system functions
which
don't require a catalog/DB. IIUC, the main reason why we allow
temporary
catalog function is to let users override permanent catalog functions.
Therefore a temporary function in a non-existing catalog won't serve
that
purpose. Besides, each session is provided with a default catalog and
DB.
So even if users simply want to create some catalog functions they can
forget about after the session, wouldn't the default catalog/DB be
enough
for such experiments?

On Thu, Sep 26, 2019 at 4:38 AM Bowen Li [hidden email] wrote:

Re 1) As described in the FLIP, a temp function lookup will first
make
sure
the db exists. If the db doesn't exist, a lazy drop is triggered to
remove
that temp function.

I agree Hive doesn't handle it consistently, and we are not copying
Hive.
IMHO, allowing registering temp functions in nonexistent catalog/db
is
hacky and problematic. For instance, "SHOW FUNCTIONS" would list
system
functions and functions in the current catalog/db, since users cannot
designate a nonexistent catalog/db as current ones, how can they list
functions in nonexistent catalog/db? They may end up never knowing
what
temp functions they've created unless trying out with queries or we
introducing some more nonstandard SQL statements. The same applies to
other
temp objects like temp tables.

Re 2) A standalone FunctionIdentifier sounds good to me

On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
[hidden email]>
wrote:

Ad. 1
I wouldn't say it is hacky.
Moreover, how do you want ensure that the dB always exists when a
temporary
object is used?( in this particular case function). Do you want to
query
for the database existence whenever e.g a temporary function is
used? I
think important aspect here is that the database can be dropped from
external system, not just flink or a different flink session.

E.g in case of hive, you cannot create a temporary table in a
database
that
does not exist, that's true. But if you create a temporary table in
a
database and drop that database from a different session, you can
still
query the previously created temporary table from the original
session.
It
does not sound like a consistent behaviour to me. Why don't we make
this
behaviour of not binding a temporary objects to the lifetime of a
database
explicit as part of the temporary objects contract? In the end they
exist
in different layers. Permanent objects & databases in a catalog (in
case
of
hive megastore) whereas temporary objects in flink sessions. That's
also
true for the original hive client. The temporary objects live in the
hive
client whereas databases are created in the metastore.

Ad.2
I'm open for suggestions here. The one thing I wanted to achieve
here
is
so
that we do not change the contract of ObjectIdentifier. One
important
thing
to remember here is that we need the function identifier to be part
of
the
FunctionDefinition object and not only as the result of the function
lookup. At some point we want to be able to store QueryOperations in
the
catalogs. They can contain function calls within which we need to
have
the
identifier.

I agree my initial suggestion is over complicated. How about we have
just
the FunctionIdentifier as top level class without making the
ObjectIdentifier extend from it? I think it's pretty much the same
what
you
suggested. The only difference is that it would be a top level class
with a
more descriptive name.


On Wed, 25 Sep 2019, 13:57 Bowen Li, [hidden email] wrote:

Sorry, I missed some parts of the solution. The complete
alternative
is
the
following, basically having separate APIs in FunctionLookup for
ambiguous
and precise function lookup since planner is able to tell which API
to
call
with parsed queries, and have a unified result:

```
class FunctionLookup {

Optional<Result> lookupAmbiguousFunction(String name);


Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);


class Result {
     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
     String getName() { ... }
     // ...
}

}
```

Thanks,
Bowen


On Tue, Sep 24, 2019 at 9:42 PM Bowen Li [hidden email]
wrote:
Hi Dawid,

Re 1): I agree making it easy for users to run experiments is
important.
However, I'm not sure allowing users to register temp functions
in
nonexistent catalog/db is the optimal way. It seems a bit hacky,
and
breaks
the current contract between Flink and users that catalog/db must
be
valid
in order to operate on.

How about we instead focus on making it convenient to create
catalogs?
Users actually can already do it with ease via program or SQL CLI
yaml
file
for an in-memory catalog which has neither extra dependency nor
external
connections. What we can further improve is DDL for catalogs,
and I
raised
it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
by
Terry
now.

In that case, if users would like to experiment via SQL, they can
easily
create an in memory catalog/database using DDL, then play with
temp
functions.

Re 2): For the assumption, IIUIC, function ObjectIdentifier has
not
been
resolved when stack call reaches
FunctionCatalog#lookupFunction(),
but
only
been parsed?

I agree keeping ObjectIdentifier as-is would be good. I'm ok with
the
suggested classes, though making ObjectIdentifier a subclass of
FunctionIdentifier seem a bit counter intuitive.

Another potentially simpler way is:

```
// in class FunctionLookup
class Result {
     Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
     String getName() { ... }
     ...
}
```

WDYT?



On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
[hidden email]> wrote:

Hi,
I really like the flip and think it clarifies important aspects
of
the
system.

I have two, I hope small suggestions, which will not take much
time
to
agree on.

1. Could we follow the MySQL approach in regards to the
existence
of
cat/db
for temporary functions? That means not to check it, so e.g.
it's
possible
to create a temporary function in a database that does not
exist.
I
think
it's really useful e.g in cases when user wants to perform
experiments
but
does not have access to the db yet or temporarily does not have
connection
to a catalog.
2. Could we not change the ObjectIdentifier? Could we not loosen
the
requirements for all catalog objects such as tables, views,
types
just
for
the functions? It's really important later on from e.g the
serializability
perspective. The important aspect of the ObjectIdentifier is
that
we
know
that it has been resolved. The suggested changes break that
assumption.
What do you think about adding an interface FunctionIdentifier {

String getName();

/**
   Return 3-part identifier. Empty in case of a built-in
function.
*/
Optional<ObjectIdentifier> getObjectIdentifier()
}

class ObjectIdentifier implements FunctionIdentifier {
Optional<ObjectIdentifier> getObjectIdentifier() {
  return Optional.of(this);
}
}

class SystemFunctionIdentifier implements FunctionIdentifier
{...}
WDYT?

On Wed, 25 Sep 2019, 04:50 Xuefu Z, [hidden email] wrote:

+1. LGTM

On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
[hidden email]>
wrote:
+1

Best,
Terry Wang



在 2019年9月24日,上午10:42,Kurt Young [hidden email] 写道:

+1

Best,
Kurt


On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
[hidden email]
wrote:
Hi all,

I'd like to start a voting thread for FLIP-57 [1], which
we've
reached
consensus in [2].

This voting will be open for minimum 3 days till 6:30pm
UTC,
Sep
26.
Thanks,
Bowen

[1]



            
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2]



            
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613

                            
--
Xuefu Zhang

"In Honey We Trust!"

--
Best regards!
Rui Li



    

signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Fabian Hueske-2
In reply to this post by bowen.li
Thanks for the summary Bowen!

Looks good to me.

Cheers,
Fabian

Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <[hidden email]>:

> Hi all,
>
> I've updated the FLIP wiki with the following changes:
>
> - Lifespan of temp functions are not tied to those of catalogs and
> databases. Users can create temp functions even though catalogs/dbs in
> their fully qualified names don't even exist.
> - some new SQL commands
>     - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
> functions, and names of temp and catalog functions in the current catalog
> and db
>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
> functions, and fully qualified names of temp and catalog functions in all
> catalogs and dbs
>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
> functions in all catalog and db
>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp system
> functions
>
> Let me know if you have any questions.
>
> Seems we have resolved all concerns. If there's no more ones, I'd like to
> close the vote by this time tomorrow.
>
> Cheers,
> Bowen
>
> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote:
>
> > Hi,
> >
> > I think above are some valid points, and we can adopt the suggestions.
> >
> > To elaborate a bit on the new SQL syntax, it would imply that, unlike
> > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> > FUNCTIONS" would return functions' fully qualified names with catalog and
> > db names.
> >
> >
> >
> > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]> wrote:
> >
> >> Hi all,
> >>
> >> I support Fabian's arguments. In my opinion, temporary objects should
> >> just be an additional layer on top of the regular catalog/database
> >> lookup logic. Thus, a temporary table or function has always highest
> >> precedence and should be stable within the local session. Otherwise it
> >> could magically disappear while someone else is performing modifications
> >> in the catalog.
> >>
> >> Furthermore, this feature is very useful for prototyping as users can
> >> simply express that a catalog/database is present even through they
> >> might not have access to it currently.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 30.09.19 14:57, Fabian Hueske wrote:
> >> > Hi all,
> >> >
> >> > Sorry for the late reply.
> >> >
> >> > I think it might lead to confusing situations if temporary functions
> (or
> >> > any temporary db objects for that matter) are bound to the life cycle
> >> of an
> >> > (external) db/catalog.
> >> > Imaging a situation where you create a temp function in a db in an
> >> external
> >> > catalog and use it but at some point it does not work anymore because
> >> some
> >> > other dropped the database from the external catalog.
> >> > Shouldn't temporary objects be only controlled by the owner of a
> >> session?
> >> >
> >> > I agree that creating temp objects in non-existing db/catalogs sounds
> a
> >> bit
> >> > strange, but IMO the opposite (the db/catalog must exist for a temp
> >> > function to be created/exist) can have significant implications like
> the
> >> > one I described.
> >> > I think it would be quite easy for users to understand that temporary
> >> > objects are solely owned by them (and their session).
> >> > The problem of listing temporary objects could be solved by adding a
> ALL
> >> > [TEMPORARY] clause:
> >> >
> >> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> >> > catalog/database including temporary functions.
> >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> >> regardless
> >> > of the catalog/database.
> >> >
> >> > Best,
> >> > Fabian
> >> >
> >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> >> [hidden email]>:
> >> >
> >> >> @Dawid, do you have any other concerns? If not, I hope we can close
> the
> >> >> voting.
> >> >>
> >> >>
> >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
> wrote:
> >> >>
> >> >>> I'm not sure how much benefit #1 can bring us. If users just want to
> >> try
> >> >>> out temporary functions, they can create temporary system functions
> >> which
> >> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> >> temporary
> >> >>> catalog function is to let users override permanent catalog
> functions.
> >> >>> Therefore a temporary function in a non-existing catalog won't serve
> >> that
> >> >>> purpose. Besides, each session is provided with a default catalog
> and
> >> DB.
> >> >>> So even if users simply want to create some catalog functions they
> can
> >> >>> forget about after the session, wouldn't the default catalog/DB be
> >> enough
> >> >>> for such experiments?
> >> >>>
> >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
> wrote:
> >> >>>
> >> >>>> Re 1) As described in the FLIP, a temp function lookup will first
> >> make
> >> >>> sure
> >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered to
> >> >>> remove
> >> >>>> that temp function.
> >> >>>>
> >> >>>> I agree Hive doesn't handle it consistently, and we are not copying
> >> >> Hive.
> >> >>>> IMHO, allowing registering temp functions in nonexistent catalog/db
> >> is
> >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
> >> system
> >> >>>> functions and functions in the current catalog/db, since users
> cannot
> >> >>>> designate a nonexistent catalog/db as current ones, how can they
> list
> >> >>>> functions in nonexistent catalog/db? They may end up never knowing
> >> what
> >> >>>> temp functions they've created unless trying out with queries or we
> >> >>>> introducing some more nonstandard SQL statements. The same applies
> to
> >> >>> other
> >> >>>> temp objects like temp tables.
> >> >>>>
> >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
> >> >>>>
> >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> >> >>>> [hidden email]>
> >> >>>> wrote:
> >> >>>>
> >> >>>>> Ad. 1
> >> >>>>> I wouldn't say it is hacky.
> >> >>>>> Moreover, how do you want ensure that the dB always exists when a
> >> >>>> temporary
> >> >>>>> object is used?( in this particular case function). Do you want to
> >> >>> query
> >> >>>>> for the database existence whenever e.g a temporary function is
> >> >> used? I
> >> >>>>> think important aspect here is that the database can be dropped
> from
> >> >>>>> external system, not just flink or a different flink session.
> >> >>>>>
> >> >>>>> E.g in case of hive, you cannot create a temporary table in a
> >> >> database
> >> >>>> that
> >> >>>>> does not exist, that's true. But if you create a temporary table
> in
> >> a
> >> >>>>> database and drop that database from a different session, you can
> >> >> still
> >> >>>>> query the previously created temporary table from the original
> >> >> session.
> >> >>>> It
> >> >>>>> does not sound like a consistent behaviour to me. Why don't we
> make
> >> >>> this
> >> >>>>> behaviour of not binding a temporary objects to the lifetime of a
> >> >>>> database
> >> >>>>> explicit as part of the temporary objects contract? In the end
> they
> >> >>> exist
> >> >>>>> in different layers. Permanent objects & databases in a catalog
> (in
> >> >>> case
> >> >>>> of
> >> >>>>> hive megastore) whereas temporary objects in flink sessions.
> That's
> >> >>> also
> >> >>>>> true for the original hive client. The temporary objects live in
> the
> >> >>> hive
> >> >>>>> client whereas databases are created in the metastore.
> >> >>>>>
> >> >>>>> Ad.2
> >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve
> >> here
> >> >>> is
> >> >>>> so
> >> >>>>> that we do not change the contract of ObjectIdentifier. One
> >> important
> >> >>>> thing
> >> >>>>> to remember here is that we need the function identifier to be
> part
> >> >> of
> >> >>>> the
> >> >>>>> FunctionDefinition object and not only as the result of the
> function
> >> >>>>> lookup. At some point we want to be able to store QueryOperations
> in
> >> >>> the
> >> >>>>> catalogs. They can contain function calls within which we need to
> >> >> have
> >> >>>> the
> >> >>>>> identifier.
> >> >>>>>
> >> >>>>> I agree my initial suggestion is over complicated. How about we
> have
> >> >>> just
> >> >>>>> the FunctionIdentifier as top level class without making the
> >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the same
> >> >> what
> >> >>>> you
> >> >>>>> suggested. The only difference is that it would be a top level
> class
> >> >>>> with a
> >> >>>>> more descriptive name.
> >> >>>>>
> >> >>>>>
> >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]> wrote:
> >> >>>>>
> >> >>>>>> Sorry, I missed some parts of the solution. The complete
> >> >> alternative
> >> >>> is
> >> >>>>> the
> >> >>>>>> following, basically having separate APIs in FunctionLookup for
> >> >>>> ambiguous
> >> >>>>>> and precise function lookup since planner is able to tell which
> API
> >> >>> to
> >> >>>>> call
> >> >>>>>> with parsed queries, and have a unified result:
> >> >>>>>>
> >> >>>>>> ```
> >> >>>>>> class FunctionLookup {
> >> >>>>>>
> >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> class Result {
> >> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >> >>>>>>      String getName() { ... }
> >> >>>>>>      // ...
> >> >>>>>> }
> >> >>>>>>
> >> >>>>>> }
> >> >>>>>> ```
> >> >>>>>>
> >> >>>>>> Thanks,
> >> >>>>>> Bowen
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
> >> >>> wrote:
> >> >>>>>>> Hi Dawid,
> >> >>>>>>>
> >> >>>>>>> Re 1): I agree making it easy for users to run experiments is
> >> >>>>> important.
> >> >>>>>>> However, I'm not sure allowing users to register temp functions
> >> >> in
> >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit hacky,
> >> >>> and
> >> >>>>>> breaks
> >> >>>>>>> the current contract between Flink and users that catalog/db
> must
> >> >>> be
> >> >>>>>> valid
> >> >>>>>>> in order to operate on.
> >> >>>>>>>
> >> >>>>>>> How about we instead focus on making it convenient to create
> >> >>>> catalogs?
> >> >>>>>>> Users actually can already do it with ease via program or SQL
> CLI
> >> >>>> yaml
> >> >>>>>> file
> >> >>>>>>> for an in-memory catalog which has neither extra dependency nor
> >> >>>>> external
> >> >>>>>>> connections. What we can further improve is DDL for catalogs,
> >> >> and I
> >> >>>>>> raised
> >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement] driven
> >> >> by
> >> >>>>> Terry
> >> >>>>>>> now.
> >> >>>>>>>
> >> >>>>>>> In that case, if users would like to experiment via SQL, they
> can
> >> >>>>> easily
> >> >>>>>>> create an in memory catalog/database using DDL, then play with
> >> >> temp
> >> >>>>>>> functions.
> >> >>>>>>>
> >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier has
> >> >> not
> >> >>>>> been
> >> >>>>>>> resolved when stack call reaches
> >> >> FunctionCatalog#lookupFunction(),
> >> >>>> but
> >> >>>>>> only
> >> >>>>>>> been parsed?
> >> >>>>>>>
> >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
> with
> >> >>> the
> >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass of
> >> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
> >> >>>>>>>
> >> >>>>>>> Another potentially simpler way is:
> >> >>>>>>>
> >> >>>>>>> ```
> >> >>>>>>> // in class FunctionLookup
> >> >>>>>>> class Result {
> >> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> >> >>>>>>>      String getName() { ... }
> >> >>>>>>>      ...
> >> >>>>>>> }
> >> >>>>>>> ```
> >> >>>>>>>
> >> >>>>>>> WDYT?
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> >> >>>>>>> [hidden email]> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi,
> >> >>>>>>>> I really like the flip and think it clarifies important aspects
> >> >> of
> >> >>>> the
> >> >>>>>>>> system.
> >> >>>>>>>>
> >> >>>>>>>> I have two, I hope small suggestions, which will not take much
> >> >>> time
> >> >>>> to
> >> >>>>>>>> agree on.
> >> >>>>>>>>
> >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
> >> >> existence
> >> >>> of
> >> >>>>>>>> cat/db
> >> >>>>>>>> for temporary functions? That means not to check it, so e.g.
> >> >> it's
> >> >>>>>> possible
> >> >>>>>>>> to create a temporary function in a database that does not
> >> >> exist.
> >> >>> I
> >> >>>>>> think
> >> >>>>>>>> it's really useful e.g in cases when user wants to perform
> >> >>>> experiments
> >> >>>>>> but
> >> >>>>>>>> does not have access to the db yet or temporarily does not have
> >> >>>>>> connection
> >> >>>>>>>> to a catalog.
> >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
> loosen
> >> >>> the
> >> >>>>>>>> requirements for all catalog objects such as tables, views,
> >> >> types
> >> >>>> just
> >> >>>>>> for
> >> >>>>>>>> the functions? It's really important later on from e.g the
> >> >>>>>> serializability
> >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is
> >> >> that
> >> >>> we
> >> >>>>>> know
> >> >>>>>>>> that it has been resolved. The suggested changes break that
> >> >>>>> assumption.
> >> >>>>>>>> What do you think about adding an interface FunctionIdentifier
> {
> >> >>>>>>>>
> >> >>>>>>>> String getName();
> >> >>>>>>>>
> >> >>>>>>>> /**
> >> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> >> >> function.
> >> >>>>>>>> */
> >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> >> >>>>>>>> }
> >> >>>>>>>>
> >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> >> >>>>>>>>   return Optional.of(this);
> >> >>>>>>>> }
> >> >>>>>>>> }
> >> >>>>>>>>
> >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
> >> >> {...}
> >> >>>>>>>> WDYT?
> >> >>>>>>>>
> >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> +1. LGTM
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> >> >> [hidden email]>
> >> >>>>>> wrote:
> >> >>>>>>>>>> +1
> >> >>>>>>>>>>
> >> >>>>>>>>>> Best,
> >> >>>>>>>>>> Terry Wang
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>
> >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> +1
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> Best,
> >> >>>>>>>>>>> Kurt
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> >> >>> [hidden email]
> >> >>>>>>>> wrote:
> >> >>>>>>>>>>>> Hi all,
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
> >> >>>> we've
> >> >>>>>>>> reached
> >> >>>>>>>>>>>> consensus in [2].
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> >> >>> UTC,
> >> >>>>> Sep
> >> >>>>>>>> 26.
> >> >>>>>>>>>>>> Thanks,
> >> >>>>>>>>>>>> Bowen
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> [1]
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> >> >>>>>>>>>>>> [2]
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>>
> >> >>
> >>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> >> >>>>>>>>>>
> >> >>>>>>>>> --
> >> >>>>>>>>> Xuefu Zhang
> >> >>>>>>>>>
> >> >>>>>>>>> "In Honey We Trust!"
> >> >>>>>>>>>
> >> >>>
> >> >>> --
> >> >>> Best regards!
> >> >>> Rui Li
> >> >>>
> >>
> >>
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

bowen.li
Hi Dawid,

Thanks for bringing the suggestions up. I was prototyping yesterday and
found out those places exactly as what you suggested.

For CallExpression and UnresolvedCallExpression, I've added them to
FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark
that as a breaking change

For FunctionIdentifier, the suggested changes LGTM. Just want to bring up
an issue on naming. It seems to me how we now name functions categories is
a bit unclear and confusing, which is reflected on the suggested APIs - in
FunctionIdentifier you lay out, "builtin function" would include builtin
functions and temporary system functions as we are kind of using "system"
and "built-in" interchangeably, and "catalog function" would include
catalog functions and temporary functions. I currently can see two
approaches to make it clearer to users.

1) Simplify FunctionIdentifier to be the following. As it's internal, we
add comments and explanation to devs on which case the APIs support.
However, I feel this approach would be somehow a bit conflicting to what
you want to achieve for the clarity of APIs

@Internal
class FunctionIdentifier {
  // for built-in function and temporary system function
    public FunctionIdentifier of(String name) {  }
  // for temporary function and catalog function
    public FunctionIdentifier of(ObjectIdentifier identifier){  }
    public Optional<String> getFunctionName() {  }
    public Optional<ObjectIdentifier> getObjectIdentifier() {  }
}

2) We can rename our function categories as following so there'll be mainly
just two categories of functions, "system functions" and "catalog
functions", either of which can have temporary ones

  - built-in functions -> officially rename to "system functions" and note
to users that "system" and "built-in" can be used interchangeably. We
prefer "system" because that's the keyword we decided to use in DDL that
creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
  - temporary system functions
  - catalog functions
  - temporary functions  -> rename to "temporary catalog function"

@Internal
class FunctionIdentifier {
  // for temporary/non-temporary system function
    public FunctionIdentifier ofSystemFunction(String name) {  }
  // for temporary/non-temporary catalog function
    public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
identifier){  }
    public Optional<String> getSystemFunctionName() {  }
    public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {  }
}

WDYT?


On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote:

> Thanks for the summary Bowen!
>
> Looks good to me.
>
> Cheers,
> Fabian
>
> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <[hidden email]
> >:
>
> > Hi all,
> >
> > I've updated the FLIP wiki with the following changes:
> >
> > - Lifespan of temp functions are not tied to those of catalogs and
> > databases. Users can create temp functions even though catalogs/dbs in
> > their fully qualified names don't even exist.
> > - some new SQL commands
> >     - "SHOW FUNCTIONS" - list names of temp and non-temp system/built-in
> > functions, and names of temp and catalog functions in the current catalog
> > and db
> >     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp system/built
> > functions, and fully qualified names of temp and catalog functions in all
> > catalogs and dbs
> >     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of temp
> > functions in all catalog and db
> >     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
> system
> > functions
> >
> > Let me know if you have any questions.
> >
> > Seems we have resolved all concerns. If there's no more ones, I'd like to
> > close the vote by this time tomorrow.
> >
> > Cheers,
> > Bowen
> >
> > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote:
> >
> > > Hi,
> > >
> > > I think above are some valid points, and we can adopt the suggestions.
> > >
> > > To elaborate a bit on the new SQL syntax, it would imply that, unlike
> > > "SHOW FUNCTION" which only return function names, "SHOW ALL [TEMPORARY]
> > > FUNCTIONS" would return functions' fully qualified names with catalog
> and
> > > db names.
> > >
> > >
> > >
> > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
> wrote:
> > >
> > >> Hi all,
> > >>
> > >> I support Fabian's arguments. In my opinion, temporary objects should
> > >> just be an additional layer on top of the regular catalog/database
> > >> lookup logic. Thus, a temporary table or function has always highest
> > >> precedence and should be stable within the local session. Otherwise it
> > >> could magically disappear while someone else is performing
> modifications
> > >> in the catalog.
> > >>
> > >> Furthermore, this feature is very useful for prototyping as users can
> > >> simply express that a catalog/database is present even through they
> > >> might not have access to it currently.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 30.09.19 14:57, Fabian Hueske wrote:
> > >> > Hi all,
> > >> >
> > >> > Sorry for the late reply.
> > >> >
> > >> > I think it might lead to confusing situations if temporary functions
> > (or
> > >> > any temporary db objects for that matter) are bound to the life
> cycle
> > >> of an
> > >> > (external) db/catalog.
> > >> > Imaging a situation where you create a temp function in a db in an
> > >> external
> > >> > catalog and use it but at some point it does not work anymore
> because
> > >> some
> > >> > other dropped the database from the external catalog.
> > >> > Shouldn't temporary objects be only controlled by the owner of a
> > >> session?
> > >> >
> > >> > I agree that creating temp objects in non-existing db/catalogs
> sounds
> > a
> > >> bit
> > >> > strange, but IMO the opposite (the db/catalog must exist for a temp
> > >> > function to be created/exist) can have significant implications like
> > the
> > >> > one I described.
> > >> > I think it would be quite easy for users to understand that
> temporary
> > >> > objects are solely owned by them (and their session).
> > >> > The problem of listing temporary objects could be solved by adding a
> > ALL
> > >> > [TEMPORARY] clause:
> > >> >
> > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> > >> > catalog/database including temporary functions.
> > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> > >> regardless
> > >> > of the catalog/database.
> > >> >
> > >> > Best,
> > >> > Fabian
> > >> >
> > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> > >> [hidden email]>:
> > >> >
> > >> >> @Dawid, do you have any other concerns? If not, I hope we can close
> > the
> > >> >> voting.
> > >> >>
> > >> >>
> > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
> > wrote:
> > >> >>
> > >> >>> I'm not sure how much benefit #1 can bring us. If users just want
> to
> > >> try
> > >> >>> out temporary functions, they can create temporary system
> functions
> > >> which
> > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> > >> temporary
> > >> >>> catalog function is to let users override permanent catalog
> > functions.
> > >> >>> Therefore a temporary function in a non-existing catalog won't
> serve
> > >> that
> > >> >>> purpose. Besides, each session is provided with a default catalog
> > and
> > >> DB.
> > >> >>> So even if users simply want to create some catalog functions they
> > can
> > >> >>> forget about after the session, wouldn't the default catalog/DB be
> > >> enough
> > >> >>> for such experiments?
> > >> >>>
> > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
> > wrote:
> > >> >>>
> > >> >>>> Re 1) As described in the FLIP, a temp function lookup will first
> > >> make
> > >> >>> sure
> > >> >>>> the db exists. If the db doesn't exist, a lazy drop is triggered
> to
> > >> >>> remove
> > >> >>>> that temp function.
> > >> >>>>
> > >> >>>> I agree Hive doesn't handle it consistently, and we are not
> copying
> > >> >> Hive.
> > >> >>>> IMHO, allowing registering temp functions in nonexistent
> catalog/db
> > >> is
> > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would list
> > >> system
> > >> >>>> functions and functions in the current catalog/db, since users
> > cannot
> > >> >>>> designate a nonexistent catalog/db as current ones, how can they
> > list
> > >> >>>> functions in nonexistent catalog/db? They may end up never
> knowing
> > >> what
> > >> >>>> temp functions they've created unless trying out with queries or
> we
> > >> >>>> introducing some more nonstandard SQL statements. The same
> applies
> > to
> > >> >>> other
> > >> >>>> temp objects like temp tables.
> > >> >>>>
> > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
> > >> >>>>
> > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> > >> >>>> [hidden email]>
> > >> >>>> wrote:
> > >> >>>>
> > >> >>>>> Ad. 1
> > >> >>>>> I wouldn't say it is hacky.
> > >> >>>>> Moreover, how do you want ensure that the dB always exists when
> a
> > >> >>>> temporary
> > >> >>>>> object is used?( in this particular case function). Do you want
> to
> > >> >>> query
> > >> >>>>> for the database existence whenever e.g a temporary function is
> > >> >> used? I
> > >> >>>>> think important aspect here is that the database can be dropped
> > from
> > >> >>>>> external system, not just flink or a different flink session.
> > >> >>>>>
> > >> >>>>> E.g in case of hive, you cannot create a temporary table in a
> > >> >> database
> > >> >>>> that
> > >> >>>>> does not exist, that's true. But if you create a temporary table
> > in
> > >> a
> > >> >>>>> database and drop that database from a different session, you
> can
> > >> >> still
> > >> >>>>> query the previously created temporary table from the original
> > >> >> session.
> > >> >>>> It
> > >> >>>>> does not sound like a consistent behaviour to me. Why don't we
> > make
> > >> >>> this
> > >> >>>>> behaviour of not binding a temporary objects to the lifetime of
> a
> > >> >>>> database
> > >> >>>>> explicit as part of the temporary objects contract? In the end
> > they
> > >> >>> exist
> > >> >>>>> in different layers. Permanent objects & databases in a catalog
> > (in
> > >> >>> case
> > >> >>>> of
> > >> >>>>> hive megastore) whereas temporary objects in flink sessions.
> > That's
> > >> >>> also
> > >> >>>>> true for the original hive client. The temporary objects live in
> > the
> > >> >>> hive
> > >> >>>>> client whereas databases are created in the metastore.
> > >> >>>>>
> > >> >>>>> Ad.2
> > >> >>>>> I'm open for suggestions here. The one thing I wanted to achieve
> > >> here
> > >> >>> is
> > >> >>>> so
> > >> >>>>> that we do not change the contract of ObjectIdentifier. One
> > >> important
> > >> >>>> thing
> > >> >>>>> to remember here is that we need the function identifier to be
> > part
> > >> >> of
> > >> >>>> the
> > >> >>>>> FunctionDefinition object and not only as the result of the
> > function
> > >> >>>>> lookup. At some point we want to be able to store
> QueryOperations
> > in
> > >> >>> the
> > >> >>>>> catalogs. They can contain function calls within which we need
> to
> > >> >> have
> > >> >>>> the
> > >> >>>>> identifier.
> > >> >>>>>
> > >> >>>>> I agree my initial suggestion is over complicated. How about we
> > have
> > >> >>> just
> > >> >>>>> the FunctionIdentifier as top level class without making the
> > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the
> same
> > >> >> what
> > >> >>>> you
> > >> >>>>> suggested. The only difference is that it would be a top level
> > class
> > >> >>>> with a
> > >> >>>>> more descriptive name.
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
> wrote:
> > >> >>>>>
> > >> >>>>>> Sorry, I missed some parts of the solution. The complete
> > >> >> alternative
> > >> >>> is
> > >> >>>>> the
> > >> >>>>>> following, basically having separate APIs in FunctionLookup for
> > >> >>>> ambiguous
> > >> >>>>>> and precise function lookup since planner is able to tell which
> > API
> > >> >>> to
> > >> >>>>> call
> > >> >>>>>> with parsed queries, and have a unified result:
> > >> >>>>>>
> > >> >>>>>> ```
> > >> >>>>>> class FunctionLookup {
> > >> >>>>>>
> > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> class Result {
> > >> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > >> >>>>>>      String getName() { ... }
> > >> >>>>>>      // ...
> > >> >>>>>> }
> > >> >>>>>>
> > >> >>>>>> }
> > >> >>>>>> ```
> > >> >>>>>>
> > >> >>>>>> Thanks,
> > >> >>>>>> Bowen
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <[hidden email]>
> > >> >>> wrote:
> > >> >>>>>>> Hi Dawid,
> > >> >>>>>>>
> > >> >>>>>>> Re 1): I agree making it easy for users to run experiments is
> > >> >>>>> important.
> > >> >>>>>>> However, I'm not sure allowing users to register temp
> functions
> > >> >> in
> > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
> hacky,
> > >> >>> and
> > >> >>>>>> breaks
> > >> >>>>>>> the current contract between Flink and users that catalog/db
> > must
> > >> >>> be
> > >> >>>>>> valid
> > >> >>>>>>> in order to operate on.
> > >> >>>>>>>
> > >> >>>>>>> How about we instead focus on making it convenient to create
> > >> >>>> catalogs?
> > >> >>>>>>> Users actually can already do it with ease via program or SQL
> > CLI
> > >> >>>> yaml
> > >> >>>>>> file
> > >> >>>>>>> for an in-memory catalog which has neither extra dependency
> nor
> > >> >>>>> external
> > >> >>>>>>> connections. What we can further improve is DDL for catalogs,
> > >> >> and I
> > >> >>>>>> raised
> > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
> driven
> > >> >> by
> > >> >>>>> Terry
> > >> >>>>>>> now.
> > >> >>>>>>>
> > >> >>>>>>> In that case, if users would like to experiment via SQL, they
> > can
> > >> >>>>> easily
> > >> >>>>>>> create an in memory catalog/database using DDL, then play with
> > >> >> temp
> > >> >>>>>>> functions.
> > >> >>>>>>>
> > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
> has
> > >> >> not
> > >> >>>>> been
> > >> >>>>>>> resolved when stack call reaches
> > >> >> FunctionCatalog#lookupFunction(),
> > >> >>>> but
> > >> >>>>>> only
> > >> >>>>>>> been parsed?
> > >> >>>>>>>
> > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
> > with
> > >> >>> the
> > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass
> of
> > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
> > >> >>>>>>>
> > >> >>>>>>> Another potentially simpler way is:
> > >> >>>>>>>
> > >> >>>>>>> ```
> > >> >>>>>>> // in class FunctionLookup
> > >> >>>>>>> class Result {
> > >> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ... }
> > >> >>>>>>>      String getName() { ... }
> > >> >>>>>>>      ...
> > >> >>>>>>> }
> > >> >>>>>>> ```
> > >> >>>>>>>
> > >> >>>>>>> WDYT?
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > >> >>>>>>> [hidden email]> wrote:
> > >> >>>>>>>
> > >> >>>>>>>> Hi,
> > >> >>>>>>>> I really like the flip and think it clarifies important
> aspects
> > >> >> of
> > >> >>>> the
> > >> >>>>>>>> system.
> > >> >>>>>>>>
> > >> >>>>>>>> I have two, I hope small suggestions, which will not take
> much
> > >> >>> time
> > >> >>>> to
> > >> >>>>>>>> agree on.
> > >> >>>>>>>>
> > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
> > >> >> existence
> > >> >>> of
> > >> >>>>>>>> cat/db
> > >> >>>>>>>> for temporary functions? That means not to check it, so e.g.
> > >> >> it's
> > >> >>>>>> possible
> > >> >>>>>>>> to create a temporary function in a database that does not
> > >> >> exist.
> > >> >>> I
> > >> >>>>>> think
> > >> >>>>>>>> it's really useful e.g in cases when user wants to perform
> > >> >>>> experiments
> > >> >>>>>> but
> > >> >>>>>>>> does not have access to the db yet or temporarily does not
> have
> > >> >>>>>> connection
> > >> >>>>>>>> to a catalog.
> > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
> > loosen
> > >> >>> the
> > >> >>>>>>>> requirements for all catalog objects such as tables, views,
> > >> >> types
> > >> >>>> just
> > >> >>>>>> for
> > >> >>>>>>>> the functions? It's really important later on from e.g the
> > >> >>>>>> serializability
> > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier is
> > >> >> that
> > >> >>> we
> > >> >>>>>> know
> > >> >>>>>>>> that it has been resolved. The suggested changes break that
> > >> >>>>> assumption.
> > >> >>>>>>>> What do you think about adding an interface
> FunctionIdentifier
> > {
> > >> >>>>>>>>
> > >> >>>>>>>> String getName();
> > >> >>>>>>>>
> > >> >>>>>>>> /**
> > >> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> > >> >> function.
> > >> >>>>>>>> */
> > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> > >> >>>>>>>> }
> > >> >>>>>>>>
> > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> > >> >>>>>>>>   return Optional.of(this);
> > >> >>>>>>>> }
> > >> >>>>>>>> }
> > >> >>>>>>>>
> > >> >>>>>>>> class SystemFunctionIdentifier implements FunctionIdentifier
> > >> >> {...}
> > >> >>>>>>>> WDYT?
> > >> >>>>>>>>
> > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> +1. LGTM
> > >> >>>>>>>>>
> > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> > >> >> [hidden email]>
> > >> >>>>>> wrote:
> > >> >>>>>>>>>> +1
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Best,
> > >> >>>>>>>>>> Terry Wang
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> +1
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> Best,
> > >> >>>>>>>>>>> Kurt
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> > >> >>> [hidden email]
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>>>>> Hi all,
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1], which
> > >> >>>> we've
> > >> >>>>>>>> reached
> > >> >>>>>>>>>>>> consensus in [2].
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> > >> >>> UTC,
> > >> >>>>> Sep
> > >> >>>>>>>> 26.
> > >> >>>>>>>>>>>> Thanks,
> > >> >>>>>>>>>>>> Bowen
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> [1]
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > >> >>>>>>>>>>>> [2]
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>>
> > >> >>
> > >>
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > >> >>>>>>>>>>
> > >> >>>>>>>>> --
> > >> >>>>>>>>> Xuefu Zhang
> > >> >>>>>>>>>
> > >> >>>>>>>>> "In Honey We Trust!"
> > >> >>>>>>>>>
> > >> >>>
> > >> >>> --
> > >> >>> Best regards!
> > >> >>> Rui Li
> > >> >>>
> > >>
> > >>
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

Xuefu Z
Here are some of my thoughts on the minor debates above:

1. +1 for 4 categories of functions. They are categorized along two
dimensions of binary values: X: *temporary* vs non-temporary (persistent);
Y: *system* vs non-system (so said catalog).
2. In my opinion, class functionIdentifier doesn't really need to reflect
the categories of the functions. Instead, we should decouple them to make
the API more stable. Thus, my suggestion is:

@Internal
class FunctionIdentifier {
  // for temporary/non-temporary system function
    public FunctionIdentifier ofSimpleName(String name) {  }
  // for temporary/non-temporary catalog function
    public FunctionIdentifier ofIdentifier(ObjectIdentifier
identifier){  }
    public Optional<String> getSimpleName() {  }
    public Optional<ObjectIdentifier> getIdentifier() {  }
}
3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:

SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.

When either keyword is missing, "ALL" is implied along that dimension. We
should always limit the search to the system function catalog and the
current catalog/DB. I don't see a need of listing functions across
different catalogs and databases. (It can be added later if that arises.)

Thanks,
Xuefu

On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:

> Hi Dawid,
>
> Thanks for bringing the suggestions up. I was prototyping yesterday and
> found out those places exactly as what you suggested.
>
> For CallExpression and UnresolvedCallExpression, I've added them to
> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark
> that as a breaking change
>
> For FunctionIdentifier, the suggested changes LGTM. Just want to bring up
> an issue on naming. It seems to me how we now name functions categories is
> a bit unclear and confusing, which is reflected on the suggested APIs - in
> FunctionIdentifier you lay out, "builtin function" would include builtin
> functions and temporary system functions as we are kind of using "system"
> and "built-in" interchangeably, and "catalog function" would include
> catalog functions and temporary functions. I currently can see two
> approaches to make it clearer to users.
>
> 1) Simplify FunctionIdentifier to be the following. As it's internal, we
> add comments and explanation to devs on which case the APIs support.
> However, I feel this approach would be somehow a bit conflicting to what
> you want to achieve for the clarity of APIs
>
> @Internal
> class FunctionIdentifier {
>   // for built-in function and temporary system function
>     public FunctionIdentifier of(String name) {  }
>   // for temporary function and catalog function
>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>     public Optional<String> getFunctionName() {  }
>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
> }
>
> 2) We can rename our function categories as following so there'll be mainly
> just two categories of functions, "system functions" and "catalog
> functions", either of which can have temporary ones
>
>   - built-in functions -> officially rename to "system functions" and note
> to users that "system" and "built-in" can be used interchangeably. We
> prefer "system" because that's the keyword we decided to use in DDL that
> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
>   - temporary system functions
>   - catalog functions
>   - temporary functions  -> rename to "temporary catalog function"
>
> @Internal
> class FunctionIdentifier {
>   // for temporary/non-temporary system function
>     public FunctionIdentifier ofSystemFunction(String name) {  }
>   // for temporary/non-temporary catalog function
>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
> identifier){  }
>     public Optional<String> getSystemFunctionName() {  }
>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {  }
> }
>
> WDYT?
>
>
> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote:
>
> > Thanks for the summary Bowen!
> >
> > Looks good to me.
> >
> > Cheers,
> > Fabian
> >
> > Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
> [hidden email]
> > >:
> >
> > > Hi all,
> > >
> > > I've updated the FLIP wiki with the following changes:
> > >
> > > - Lifespan of temp functions are not tied to those of catalogs and
> > > databases. Users can create temp functions even though catalogs/dbs in
> > > their fully qualified names don't even exist.
> > > - some new SQL commands
> > >     - "SHOW FUNCTIONS" - list names of temp and non-temp
> system/built-in
> > > functions, and names of temp and catalog functions in the current
> catalog
> > > and db
> > >     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
> system/built
> > > functions, and fully qualified names of temp and catalog functions in
> all
> > > catalogs and dbs
> > >     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
> temp
> > > functions in all catalog and db
> > >     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
> > system
> > > functions
> > >
> > > Let me know if you have any questions.
> > >
> > > Seems we have resolved all concerns. If there's no more ones, I'd like
> to
> > > close the vote by this time tomorrow.
> > >
> > > Cheers,
> > > Bowen
> > >
> > > On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote:
> > >
> > > > Hi,
> > > >
> > > > I think above are some valid points, and we can adopt the
> suggestions.
> > > >
> > > > To elaborate a bit on the new SQL syntax, it would imply that, unlike
> > > > "SHOW FUNCTION" which only return function names, "SHOW ALL
> [TEMPORARY]
> > > > FUNCTIONS" would return functions' fully qualified names with catalog
> > and
> > > > db names.
> > > >
> > > >
> > > >
> > > > On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
> > wrote:
> > > >
> > > >> Hi all,
> > > >>
> > > >> I support Fabian's arguments. In my opinion, temporary objects
> should
> > > >> just be an additional layer on top of the regular catalog/database
> > > >> lookup logic. Thus, a temporary table or function has always highest
> > > >> precedence and should be stable within the local session. Otherwise
> it
> > > >> could magically disappear while someone else is performing
> > modifications
> > > >> in the catalog.
> > > >>
> > > >> Furthermore, this feature is very useful for prototyping as users
> can
> > > >> simply express that a catalog/database is present even through they
> > > >> might not have access to it currently.
> > > >>
> > > >> Regards,
> > > >> Timo
> > > >>
> > > >>
> > > >> On 30.09.19 14:57, Fabian Hueske wrote:
> > > >> > Hi all,
> > > >> >
> > > >> > Sorry for the late reply.
> > > >> >
> > > >> > I think it might lead to confusing situations if temporary
> functions
> > > (or
> > > >> > any temporary db objects for that matter) are bound to the life
> > cycle
> > > >> of an
> > > >> > (external) db/catalog.
> > > >> > Imaging a situation where you create a temp function in a db in an
> > > >> external
> > > >> > catalog and use it but at some point it does not work anymore
> > because
> > > >> some
> > > >> > other dropped the database from the external catalog.
> > > >> > Shouldn't temporary objects be only controlled by the owner of a
> > > >> session?
> > > >> >
> > > >> > I agree that creating temp objects in non-existing db/catalogs
> > sounds
> > > a
> > > >> bit
> > > >> > strange, but IMO the opposite (the db/catalog must exist for a
> temp
> > > >> > function to be created/exist) can have significant implications
> like
> > > the
> > > >> > one I described.
> > > >> > I think it would be quite easy for users to understand that
> > temporary
> > > >> > objects are solely owned by them (and their session).
> > > >> > The problem of listing temporary objects could be solved by
> adding a
> > > ALL
> > > >> > [TEMPORARY] clause:
> > > >> >
> > > >> > SHOW ALL FUNCTIONS; could show all functions regardless of the
> > > >> > catalog/database including temporary functions.
> > > >> > SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
> > > >> regardless
> > > >> > of the catalog/database.
> > > >> >
> > > >> > Best,
> > > >> > Fabian
> > > >> >
> > > >> > Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
> > > >> [hidden email]>:
> > > >> >
> > > >> >> @Dawid, do you have any other concerns? If not, I hope we can
> close
> > > the
> > > >> >> voting.
> > > >> >>
> > > >> >>
> > > >> >> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
> > > wrote:
> > > >> >>
> > > >> >>> I'm not sure how much benefit #1 can bring us. If users just
> want
> > to
> > > >> try
> > > >> >>> out temporary functions, they can create temporary system
> > functions
> > > >> which
> > > >> >>> don't require a catalog/DB. IIUC, the main reason why we allow
> > > >> temporary
> > > >> >>> catalog function is to let users override permanent catalog
> > > functions.
> > > >> >>> Therefore a temporary function in a non-existing catalog won't
> > serve
> > > >> that
> > > >> >>> purpose. Besides, each session is provided with a default
> catalog
> > > and
> > > >> DB.
> > > >> >>> So even if users simply want to create some catalog functions
> they
> > > can
> > > >> >>> forget about after the session, wouldn't the default catalog/DB
> be
> > > >> enough
> > > >> >>> for such experiments?
> > > >> >>>
> > > >> >>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
> > > wrote:
> > > >> >>>
> > > >> >>>> Re 1) As described in the FLIP, a temp function lookup will
> first
> > > >> make
> > > >> >>> sure
> > > >> >>>> the db exists. If the db doesn't exist, a lazy drop is
> triggered
> > to
> > > >> >>> remove
> > > >> >>>> that temp function.
> > > >> >>>>
> > > >> >>>> I agree Hive doesn't handle it consistently, and we are not
> > copying
> > > >> >> Hive.
> > > >> >>>> IMHO, allowing registering temp functions in nonexistent
> > catalog/db
> > > >> is
> > > >> >>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
> list
> > > >> system
> > > >> >>>> functions and functions in the current catalog/db, since users
> > > cannot
> > > >> >>>> designate a nonexistent catalog/db as current ones, how can
> they
> > > list
> > > >> >>>> functions in nonexistent catalog/db? They may end up never
> > knowing
> > > >> what
> > > >> >>>> temp functions they've created unless trying out with queries
> or
> > we
> > > >> >>>> introducing some more nonstandard SQL statements. The same
> > applies
> > > to
> > > >> >>> other
> > > >> >>>> temp objects like temp tables.
> > > >> >>>>
> > > >> >>>> Re 2) A standalone FunctionIdentifier sounds good to me
> > > >> >>>>
> > > >> >>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
> > > >> >>>> [hidden email]>
> > > >> >>>> wrote:
> > > >> >>>>
> > > >> >>>>> Ad. 1
> > > >> >>>>> I wouldn't say it is hacky.
> > > >> >>>>> Moreover, how do you want ensure that the dB always exists
> when
> > a
> > > >> >>>> temporary
> > > >> >>>>> object is used?( in this particular case function). Do you
> want
> > to
> > > >> >>> query
> > > >> >>>>> for the database existence whenever e.g a temporary function
> is
> > > >> >> used? I
> > > >> >>>>> think important aspect here is that the database can be
> dropped
> > > from
> > > >> >>>>> external system, not just flink or a different flink session.
> > > >> >>>>>
> > > >> >>>>> E.g in case of hive, you cannot create a temporary table in a
> > > >> >> database
> > > >> >>>> that
> > > >> >>>>> does not exist, that's true. But if you create a temporary
> table
> > > in
> > > >> a
> > > >> >>>>> database and drop that database from a different session, you
> > can
> > > >> >> still
> > > >> >>>>> query the previously created temporary table from the original
> > > >> >> session.
> > > >> >>>> It
> > > >> >>>>> does not sound like a consistent behaviour to me. Why don't we
> > > make
> > > >> >>> this
> > > >> >>>>> behaviour of not binding a temporary objects to the lifetime
> of
> > a
> > > >> >>>> database
> > > >> >>>>> explicit as part of the temporary objects contract? In the end
> > > they
> > > >> >>> exist
> > > >> >>>>> in different layers. Permanent objects & databases in a
> catalog
> > > (in
> > > >> >>> case
> > > >> >>>> of
> > > >> >>>>> hive megastore) whereas temporary objects in flink sessions.
> > > That's
> > > >> >>> also
> > > >> >>>>> true for the original hive client. The temporary objects live
> in
> > > the
> > > >> >>> hive
> > > >> >>>>> client whereas databases are created in the metastore.
> > > >> >>>>>
> > > >> >>>>> Ad.2
> > > >> >>>>> I'm open for suggestions here. The one thing I wanted to
> achieve
> > > >> here
> > > >> >>> is
> > > >> >>>> so
> > > >> >>>>> that we do not change the contract of ObjectIdentifier. One
> > > >> important
> > > >> >>>> thing
> > > >> >>>>> to remember here is that we need the function identifier to be
> > > part
> > > >> >> of
> > > >> >>>> the
> > > >> >>>>> FunctionDefinition object and not only as the result of the
> > > function
> > > >> >>>>> lookup. At some point we want to be able to store
> > QueryOperations
> > > in
> > > >> >>> the
> > > >> >>>>> catalogs. They can contain function calls within which we need
> > to
> > > >> >> have
> > > >> >>>> the
> > > >> >>>>> identifier.
> > > >> >>>>>
> > > >> >>>>> I agree my initial suggestion is over complicated. How about
> we
> > > have
> > > >> >>> just
> > > >> >>>>> the FunctionIdentifier as top level class without making the
> > > >> >>>>> ObjectIdentifier extend from it? I think it's pretty much the
> > same
> > > >> >> what
> > > >> >>>> you
> > > >> >>>>> suggested. The only difference is that it would be a top level
> > > class
> > > >> >>>> with a
> > > >> >>>>> more descriptive name.
> > > >> >>>>>
> > > >> >>>>>
> > > >> >>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
> > wrote:
> > > >> >>>>>
> > > >> >>>>>> Sorry, I missed some parts of the solution. The complete
> > > >> >> alternative
> > > >> >>> is
> > > >> >>>>> the
> > > >> >>>>>> following, basically having separate APIs in FunctionLookup
> for
> > > >> >>>> ambiguous
> > > >> >>>>>> and precise function lookup since planner is able to tell
> which
> > > API
> > > >> >>> to
> > > >> >>>>> call
> > > >> >>>>>> with parsed queries, and have a unified result:
> > > >> >>>>>>
> > > >> >>>>>> ```
> > > >> >>>>>> class FunctionLookup {
> > > >> >>>>>>
> > > >> >>>>>> Optional<Result> lookupAmbiguousFunction(String name);
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> class Result {
> > > >> >>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ...
> }
> > > >> >>>>>>      String getName() { ... }
> > > >> >>>>>>      // ...
> > > >> >>>>>> }
> > > >> >>>>>>
> > > >> >>>>>> }
> > > >> >>>>>> ```
> > > >> >>>>>>
> > > >> >>>>>> Thanks,
> > > >> >>>>>> Bowen
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
> [hidden email]>
> > > >> >>> wrote:
> > > >> >>>>>>> Hi Dawid,
> > > >> >>>>>>>
> > > >> >>>>>>> Re 1): I agree making it easy for users to run experiments
> is
> > > >> >>>>> important.
> > > >> >>>>>>> However, I'm not sure allowing users to register temp
> > functions
> > > >> >> in
> > > >> >>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
> > hacky,
> > > >> >>> and
> > > >> >>>>>> breaks
> > > >> >>>>>>> the current contract between Flink and users that catalog/db
> > > must
> > > >> >>> be
> > > >> >>>>>> valid
> > > >> >>>>>>> in order to operate on.
> > > >> >>>>>>>
> > > >> >>>>>>> How about we instead focus on making it convenient to create
> > > >> >>>> catalogs?
> > > >> >>>>>>> Users actually can already do it with ease via program or
> SQL
> > > CLI
> > > >> >>>> yaml
> > > >> >>>>>> file
> > > >> >>>>>>> for an in-memory catalog which has neither extra dependency
> > nor
> > > >> >>>>> external
> > > >> >>>>>>> connections. What we can further improve is DDL for
> catalogs,
> > > >> >> and I
> > > >> >>>>>> raised
> > > >> >>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
> > driven
> > > >> >> by
> > > >> >>>>> Terry
> > > >> >>>>>>> now.
> > > >> >>>>>>>
> > > >> >>>>>>> In that case, if users would like to experiment via SQL,
> they
> > > can
> > > >> >>>>> easily
> > > >> >>>>>>> create an in memory catalog/database using DDL, then play
> with
> > > >> >> temp
> > > >> >>>>>>> functions.
> > > >> >>>>>>>
> > > >> >>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
> > has
> > > >> >> not
> > > >> >>>>> been
> > > >> >>>>>>> resolved when stack call reaches
> > > >> >> FunctionCatalog#lookupFunction(),
> > > >> >>>> but
> > > >> >>>>>> only
> > > >> >>>>>>> been parsed?
> > > >> >>>>>>>
> > > >> >>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
> > > with
> > > >> >>> the
> > > >> >>>>>>> suggested classes, though making ObjectIdentifier a subclass
> > of
> > > >> >>>>>>> FunctionIdentifier seem a bit counter intuitive.
> > > >> >>>>>>>
> > > >> >>>>>>> Another potentially simpler way is:
> > > >> >>>>>>>
> > > >> >>>>>>> ```
> > > >> >>>>>>> // in class FunctionLookup
> > > >> >>>>>>> class Result {
> > > >> >>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
> ... }
> > > >> >>>>>>>      String getName() { ... }
> > > >> >>>>>>>      ...
> > > >> >>>>>>> }
> > > >> >>>>>>> ```
> > > >> >>>>>>>
> > > >> >>>>>>> WDYT?
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
> > > >> >>>>>>> [hidden email]> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>>> Hi,
> > > >> >>>>>>>> I really like the flip and think it clarifies important
> > aspects
> > > >> >> of
> > > >> >>>> the
> > > >> >>>>>>>> system.
> > > >> >>>>>>>>
> > > >> >>>>>>>> I have two, I hope small suggestions, which will not take
> > much
> > > >> >>> time
> > > >> >>>> to
> > > >> >>>>>>>> agree on.
> > > >> >>>>>>>>
> > > >> >>>>>>>> 1. Could we follow the MySQL approach in regards to the
> > > >> >> existence
> > > >> >>> of
> > > >> >>>>>>>> cat/db
> > > >> >>>>>>>> for temporary functions? That means not to check it, so
> e.g.
> > > >> >> it's
> > > >> >>>>>> possible
> > > >> >>>>>>>> to create a temporary function in a database that does not
> > > >> >> exist.
> > > >> >>> I
> > > >> >>>>>> think
> > > >> >>>>>>>> it's really useful e.g in cases when user wants to perform
> > > >> >>>> experiments
> > > >> >>>>>> but
> > > >> >>>>>>>> does not have access to the db yet or temporarily does not
> > have
> > > >> >>>>>> connection
> > > >> >>>>>>>> to a catalog.
> > > >> >>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
> > > loosen
> > > >> >>> the
> > > >> >>>>>>>> requirements for all catalog objects such as tables, views,
> > > >> >> types
> > > >> >>>> just
> > > >> >>>>>> for
> > > >> >>>>>>>> the functions? It's really important later on from e.g the
> > > >> >>>>>> serializability
> > > >> >>>>>>>> perspective. The important aspect of the ObjectIdentifier
> is
> > > >> >> that
> > > >> >>> we
> > > >> >>>>>> know
> > > >> >>>>>>>> that it has been resolved. The suggested changes break that
> > > >> >>>>> assumption.
> > > >> >>>>>>>> What do you think about adding an interface
> > FunctionIdentifier
> > > {
> > > >> >>>>>>>>
> > > >> >>>>>>>> String getName();
> > > >> >>>>>>>>
> > > >> >>>>>>>> /**
> > > >> >>>>>>>>    Return 3-part identifier. Empty in case of a built-in
> > > >> >> function.
> > > >> >>>>>>>> */
> > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
> > > >> >>>>>>>> }
> > > >> >>>>>>>>
> > > >> >>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
> > > >> >>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
> > > >> >>>>>>>>   return Optional.of(this);
> > > >> >>>>>>>> }
> > > >> >>>>>>>> }
> > > >> >>>>>>>>
> > > >> >>>>>>>> class SystemFunctionIdentifier implements
> FunctionIdentifier
> > > >> >> {...}
> > > >> >>>>>>>> WDYT?
> > > >> >>>>>>>>
> > > >> >>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
> > wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>> +1. LGTM
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
> > > >> >> [hidden email]>
> > > >> >>>>>> wrote:
> > > >> >>>>>>>>>> +1
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Best,
> > > >> >>>>>>>>>> Terry Wang
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> +1
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> Best,
> > > >> >>>>>>>>>>> Kurt
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
> > > >> >>> [hidden email]
> > > >> >>>>>>>> wrote:
> > > >> >>>>>>>>>>>> Hi all,
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
> which
> > > >> >>>> we've
> > > >> >>>>>>>> reached
> > > >> >>>>>>>>>>>> consensus in [2].
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
> > > >> >>> UTC,
> > > >> >>>>> Sep
> > > >> >>>>>>>> 26.
> > > >> >>>>>>>>>>>> Thanks,
> > > >> >>>>>>>>>>>> Bowen
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> [1]
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
> > > >> >>>>>>>>>>>> [2]
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>>
> > > >> >>
> > > >>
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>> --
> > > >> >>>>>>>>> Xuefu Zhang
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> "In Honey We Trust!"
> > > >> >>>>>>>>>
> > > >> >>>
> > > >> >>> --
> > > >> >>> Best regards!
> > > >> >>> Rui Li
> > > >> >>>
> > > >>
> > > >>
> > >
> >
>


--
Xuefu Zhang

"In Honey We Trust!"
Reply | Threaded
Open this post in threaded view
|

Re: [VOTE] FLIP-57: Rework FunctionCatalog

dwysakowicz
Hi,

I very much agree with Xuefu's summary of the two points, especially on
the "functionIdentifier doesn't need to reflect the categories".

For the factory methods I think methods of should be enough:

  // for temporary/non-temporary system function
    public FunctionIdentifier of(String name) {  }
  // for temporary/non-temporary catalog function
    public FunctionIdentifier of(ObjectIdentifier identifier){  }

In case of the getters I did not like the method name `getName` in the
original proposal, as in my opinion it could imply that it can return
also just the name part of an ObjectIdentifier, which should not be the
case.

I'm fine with getSimpleName/getIdentifier, but want to throw in few
other suggestions:

* getShortPath(Identifier)/getLongPath(Identifier),

* getSystemPath(Identifier)/getCatalogPath(Identifier)

+1 to any of the 3 options.

One additional thing the FunctionIdentifier should be a PublicEvolving
class, as it is part of a PublicEvolving APIs e.g. CallExpression, which
user might need to access e.g. in a filter pushdown.

I also support the Xuefu's suggestion not to support the "ALL" keyword
in the "SHOW [TEMPORARY] FUNCTIONS" statement, but as the exact design
of it  is not part of the FLIP-57, we do not need to agree on that in
this thread.

Overall I think after updating the FLIP with the outcome of the
discussion I vote +1 for it.

Best,

Dawid


On 02/10/2019 00:21, Xuefu Z wrote:

> Here are some of my thoughts on the minor debates above:
>
> 1. +1 for 4 categories of functions. They are categorized along two
> dimensions of binary values: X: *temporary* vs non-temporary (persistent);
> Y: *system* vs non-system (so said catalog).
> 2. In my opinion, class functionIdentifier doesn't really need to reflect
> the categories of the functions. Instead, we should decouple them to make
> the API more stable. Thus, my suggestion is:
>
> @Internal
> class FunctionIdentifier {
>   // for temporary/non-temporary system function
>     public FunctionIdentifier ofSimpleName(String name) {  }
>   // for temporary/non-temporary catalog function
>     public FunctionIdentifier ofIdentifier(ObjectIdentifier
> identifier){  }
>     public Optional<String> getSimpleName() {  }
>     public Optional<ObjectIdentifier> getIdentifier() {  }
> }
> 3. DDLs -- I don't think we need "ALL" keyword. The grammar can just be:
>
> SHOW [TEMPORARY] [SYSTEM] FUNCTIONS.
>
> When either keyword is missing, "ALL" is implied along that dimension. We
> should always limit the search to the system function catalog and the
> current catalog/DB. I don't see a need of listing functions across
> different catalogs and databases. (It can be added later if that arises.)
>
> Thanks,
> Xuefu
>
> On Tue, Oct 1, 2019 at 11:12 AM Bowen Li <[hidden email]> wrote:
>
>> Hi Dawid,
>>
>> Thanks for bringing the suggestions up. I was prototyping yesterday and
>> found out those places exactly as what you suggested.
>>
>> For CallExpression and UnresolvedCallExpression, I've added them to
>> FLIP-57. We will replace ObjectIdentifier with FunctionIdentifier and mark
>> that as a breaking change
>>
>> For FunctionIdentifier, the suggested changes LGTM. Just want to bring up
>> an issue on naming. It seems to me how we now name functions categories is
>> a bit unclear and confusing, which is reflected on the suggested APIs - in
>> FunctionIdentifier you lay out, "builtin function" would include builtin
>> functions and temporary system functions as we are kind of using "system"
>> and "built-in" interchangeably, and "catalog function" would include
>> catalog functions and temporary functions. I currently can see two
>> approaches to make it clearer to users.
>>
>> 1) Simplify FunctionIdentifier to be the following. As it's internal, we
>> add comments and explanation to devs on which case the APIs support.
>> However, I feel this approach would be somehow a bit conflicting to what
>> you want to achieve for the clarity of APIs
>>
>> @Internal
>> class FunctionIdentifier {
>>   // for built-in function and temporary system function
>>     public FunctionIdentifier of(String name) {  }
>>   // for temporary function and catalog function
>>     public FunctionIdentifier of(ObjectIdentifier identifier){  }
>>     public Optional<String> getFunctionName() {  }
>>     public Optional<ObjectIdentifier> getObjectIdentifier() {  }
>> }
>>
>> 2) We can rename our function categories as following so there'll be mainly
>> just two categories of functions, "system functions" and "catalog
>> functions", either of which can have temporary ones
>>
>>   - built-in functions -> officially rename to "system functions" and note
>> to users that "system" and "built-in" can be used interchangeably. We
>> prefer "system" because that's the keyword we decided to use in DDL that
>> creates its temporary peers ("CREATE TEMPORARY SYSTEM FUNCTION")
>>   - temporary system functions
>>   - catalog functions
>>   - temporary functions  -> rename to "temporary catalog function"
>>
>> @Internal
>> class FunctionIdentifier {
>>   // for temporary/non-temporary system function
>>     public FunctionIdentifier ofSystemFunction(String name) {  }
>>   // for temporary/non-temporary catalog function
>>     public FunctionIdentifier ofCatalogFunction(ObjectIdentifier
>> identifier){  }
>>     public Optional<String> getSystemFunctionName() {  }
>>     public Optional<ObjectIdentifier> getCatalogFunctionIdentifier() {  }
>> }
>>
>> WDYT?
>>
>>
>> On Tue, Oct 1, 2019 at 5:48 AM Fabian Hueske <[hidden email]> wrote:
>>
>>> Thanks for the summary Bowen!
>>>
>>> Looks good to me.
>>>
>>> Cheers,
>>> Fabian
>>>
>>> Am Mo., 30. Sept. 2019 um 23:24 Uhr schrieb Bowen Li <
>> [hidden email]
>>>> :
>>>> Hi all,
>>>>
>>>> I've updated the FLIP wiki with the following changes:
>>>>
>>>> - Lifespan of temp functions are not tied to those of catalogs and
>>>> databases. Users can create temp functions even though catalogs/dbs in
>>>> their fully qualified names don't even exist.
>>>> - some new SQL commands
>>>>     - "SHOW FUNCTIONS" - list names of temp and non-temp
>> system/built-in
>>>> functions, and names of temp and catalog functions in the current
>> catalog
>>>> and db
>>>>     - "SHOW ALL FUNCTIONS" - list names of temp and non-temp
>> system/built
>>>> functions, and fully qualified names of temp and catalog functions in
>> all
>>>> catalogs and dbs
>>>>     - "SHOW ALL TEMPORARY FUNCTIONS" - list fully qualified names of
>> temp
>>>> functions in all catalog and db
>>>>     - "SHOW ALL TEMPORARY SYSTEM FUNCTIONS" - list names of all temp
>>> system
>>>> functions
>>>>
>>>> Let me know if you have any questions.
>>>>
>>>> Seems we have resolved all concerns. If there's no more ones, I'd like
>> to
>>>> close the vote by this time tomorrow.
>>>>
>>>> Cheers,
>>>> Bowen
>>>>
>>>> On Mon, Sep 30, 2019 at 11:59 AM Bowen Li <[hidden email]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think above are some valid points, and we can adopt the
>> suggestions.
>>>>> To elaborate a bit on the new SQL syntax, it would imply that, unlike
>>>>> "SHOW FUNCTION" which only return function names, "SHOW ALL
>> [TEMPORARY]
>>>>> FUNCTIONS" would return functions' fully qualified names with catalog
>>> and
>>>>> db names.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 30, 2019 at 6:38 AM Timo Walther <[hidden email]>
>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I support Fabian's arguments. In my opinion, temporary objects
>> should
>>>>>> just be an additional layer on top of the regular catalog/database
>>>>>> lookup logic. Thus, a temporary table or function has always highest
>>>>>> precedence and should be stable within the local session. Otherwise
>> it
>>>>>> could magically disappear while someone else is performing
>>> modifications
>>>>>> in the catalog.
>>>>>>
>>>>>> Furthermore, this feature is very useful for prototyping as users
>> can
>>>>>> simply express that a catalog/database is present even through they
>>>>>> might not have access to it currently.
>>>>>>
>>>>>> Regards,
>>>>>> Timo
>>>>>>
>>>>>>
>>>>>> On 30.09.19 14:57, Fabian Hueske wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Sorry for the late reply.
>>>>>>>
>>>>>>> I think it might lead to confusing situations if temporary
>> functions
>>>> (or
>>>>>>> any temporary db objects for that matter) are bound to the life
>>> cycle
>>>>>> of an
>>>>>>> (external) db/catalog.
>>>>>>> Imaging a situation where you create a temp function in a db in an
>>>>>> external
>>>>>>> catalog and use it but at some point it does not work anymore
>>> because
>>>>>> some
>>>>>>> other dropped the database from the external catalog.
>>>>>>> Shouldn't temporary objects be only controlled by the owner of a
>>>>>> session?
>>>>>>> I agree that creating temp objects in non-existing db/catalogs
>>> sounds
>>>> a
>>>>>> bit
>>>>>>> strange, but IMO the opposite (the db/catalog must exist for a
>> temp
>>>>>>> function to be created/exist) can have significant implications
>> like
>>>> the
>>>>>>> one I described.
>>>>>>> I think it would be quite easy for users to understand that
>>> temporary
>>>>>>> objects are solely owned by them (and their session).
>>>>>>> The problem of listing temporary objects could be solved by
>> adding a
>>>> ALL
>>>>>>> [TEMPORARY] clause:
>>>>>>>
>>>>>>> SHOW ALL FUNCTIONS; could show all functions regardless of the
>>>>>>> catalog/database including temporary functions.
>>>>>>> SHOW ALL TEMPORARY FUNCTIONS; could show all temporary functions
>>>>>> regardless
>>>>>>> of the catalog/database.
>>>>>>>
>>>>>>> Best,
>>>>>>> Fabian
>>>>>>>
>>>>>>> Am Sa., 28. Sept. 2019 um 02:21 Uhr schrieb Bowen Li <
>>>>>> [hidden email]>:
>>>>>>>> @Dawid, do you have any other concerns? If not, I hope we can
>> close
>>>> the
>>>>>>>> voting.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Sep 26, 2019 at 8:14 PM Rui Li <[hidden email]>
>>>> wrote:
>>>>>>>>> I'm not sure how much benefit #1 can bring us. If users just
>> want
>>> to
>>>>>> try
>>>>>>>>> out temporary functions, they can create temporary system
>>> functions
>>>>>> which
>>>>>>>>> don't require a catalog/DB. IIUC, the main reason why we allow
>>>>>> temporary
>>>>>>>>> catalog function is to let users override permanent catalog
>>>> functions.
>>>>>>>>> Therefore a temporary function in a non-existing catalog won't
>>> serve
>>>>>> that
>>>>>>>>> purpose. Besides, each session is provided with a default
>> catalog
>>>> and
>>>>>> DB.
>>>>>>>>> So even if users simply want to create some catalog functions
>> they
>>>> can
>>>>>>>>> forget about after the session, wouldn't the default catalog/DB
>> be
>>>>>> enough
>>>>>>>>> for such experiments?
>>>>>>>>>
>>>>>>>>> On Thu, Sep 26, 2019 at 4:38 AM Bowen Li <[hidden email]>
>>>> wrote:
>>>>>>>>>> Re 1) As described in the FLIP, a temp function lookup will
>> first
>>>>>> make
>>>>>>>>> sure
>>>>>>>>>> the db exists. If the db doesn't exist, a lazy drop is
>> triggered
>>> to
>>>>>>>>> remove
>>>>>>>>>> that temp function.
>>>>>>>>>>
>>>>>>>>>> I agree Hive doesn't handle it consistently, and we are not
>>> copying
>>>>>>>> Hive.
>>>>>>>>>> IMHO, allowing registering temp functions in nonexistent
>>> catalog/db
>>>>>> is
>>>>>>>>>> hacky and problematic. For instance, "SHOW FUNCTIONS" would
>> list
>>>>>> system
>>>>>>>>>> functions and functions in the current catalog/db, since users
>>>> cannot
>>>>>>>>>> designate a nonexistent catalog/db as current ones, how can
>> they
>>>> list
>>>>>>>>>> functions in nonexistent catalog/db? They may end up never
>>> knowing
>>>>>> what
>>>>>>>>>> temp functions they've created unless trying out with queries
>> or
>>> we
>>>>>>>>>> introducing some more nonstandard SQL statements. The same
>>> applies
>>>> to
>>>>>>>>> other
>>>>>>>>>> temp objects like temp tables.
>>>>>>>>>>
>>>>>>>>>> Re 2) A standalone FunctionIdentifier sounds good to me
>>>>>>>>>>
>>>>>>>>>> On Wed, Sep 25, 2019 at 4:46 AM Dawid Wysakowicz <
>>>>>>>>>> [hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Ad. 1
>>>>>>>>>>> I wouldn't say it is hacky.
>>>>>>>>>>> Moreover, how do you want ensure that the dB always exists
>> when
>>> a
>>>>>>>>>> temporary
>>>>>>>>>>> object is used?( in this particular case function). Do you
>> want
>>> to
>>>>>>>>> query
>>>>>>>>>>> for the database existence whenever e.g a temporary function
>> is
>>>>>>>> used? I
>>>>>>>>>>> think important aspect here is that the database can be
>> dropped
>>>> from
>>>>>>>>>>> external system, not just flink or a different flink session.
>>>>>>>>>>>
>>>>>>>>>>> E.g in case of hive, you cannot create a temporary table in a
>>>>>>>> database
>>>>>>>>>> that
>>>>>>>>>>> does not exist, that's true. But if you create a temporary
>> table
>>>> in
>>>>>> a
>>>>>>>>>>> database and drop that database from a different session, you
>>> can
>>>>>>>> still
>>>>>>>>>>> query the previously created temporary table from the original
>>>>>>>> session.
>>>>>>>>>> It
>>>>>>>>>>> does not sound like a consistent behaviour to me. Why don't we
>>>> make
>>>>>>>>> this
>>>>>>>>>>> behaviour of not binding a temporary objects to the lifetime
>> of
>>> a
>>>>>>>>>> database
>>>>>>>>>>> explicit as part of the temporary objects contract? In the end
>>>> they
>>>>>>>>> exist
>>>>>>>>>>> in different layers. Permanent objects & databases in a
>> catalog
>>>> (in
>>>>>>>>> case
>>>>>>>>>> of
>>>>>>>>>>> hive megastore) whereas temporary objects in flink sessions.
>>>> That's
>>>>>>>>> also
>>>>>>>>>>> true for the original hive client. The temporary objects live
>> in
>>>> the
>>>>>>>>> hive
>>>>>>>>>>> client whereas databases are created in the metastore.
>>>>>>>>>>>
>>>>>>>>>>> Ad.2
>>>>>>>>>>> I'm open for suggestions here. The one thing I wanted to
>> achieve
>>>>>> here
>>>>>>>>> is
>>>>>>>>>> so
>>>>>>>>>>> that we do not change the contract of ObjectIdentifier. One
>>>>>> important
>>>>>>>>>> thing
>>>>>>>>>>> to remember here is that we need the function identifier to be
>>>> part
>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>> FunctionDefinition object and not only as the result of the
>>>> function
>>>>>>>>>>> lookup. At some point we want to be able to store
>>> QueryOperations
>>>> in
>>>>>>>>> the
>>>>>>>>>>> catalogs. They can contain function calls within which we need
>>> to
>>>>>>>> have
>>>>>>>>>> the
>>>>>>>>>>> identifier.
>>>>>>>>>>>
>>>>>>>>>>> I agree my initial suggestion is over complicated. How about
>> we
>>>> have
>>>>>>>>> just
>>>>>>>>>>> the FunctionIdentifier as top level class without making the
>>>>>>>>>>> ObjectIdentifier extend from it? I think it's pretty much the
>>> same
>>>>>>>> what
>>>>>>>>>> you
>>>>>>>>>>> suggested. The only difference is that it would be a top level
>>>> class
>>>>>>>>>> with a
>>>>>>>>>>> more descriptive name.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 25 Sep 2019, 13:57 Bowen Li, <[hidden email]>
>>> wrote:
>>>>>>>>>>>> Sorry, I missed some parts of the solution. The complete
>>>>>>>> alternative
>>>>>>>>> is
>>>>>>>>>>> the
>>>>>>>>>>>> following, basically having separate APIs in FunctionLookup
>> for
>>>>>>>>>> ambiguous
>>>>>>>>>>>> and precise function lookup since planner is able to tell
>> which
>>>> API
>>>>>>>>> to
>>>>>>>>>>> call
>>>>>>>>>>>> with parsed queries, and have a unified result:
>>>>>>>>>>>>
>>>>>>>>>>>> ```
>>>>>>>>>>>> class FunctionLookup {
>>>>>>>>>>>>
>>>>>>>>>>>> Optional<Result> lookupAmbiguousFunction(String name);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Optional<Result> lookupPreciseFunction(ObjectIdentifier oi);
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> class Result {
>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() { ...
>> }
>>>>>>>>>>>>      String getName() { ... }
>>>>>>>>>>>>      // ...
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> }
>>>>>>>>>>>> ```
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Bowen
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 24, 2019 at 9:42 PM Bowen Li <
>> [hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Dawid,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re 1): I agree making it easy for users to run experiments
>> is
>>>>>>>>>>> important.
>>>>>>>>>>>>> However, I'm not sure allowing users to register temp
>>> functions
>>>>>>>> in
>>>>>>>>>>>>> nonexistent catalog/db is the optimal way. It seems a bit
>>> hacky,
>>>>>>>>> and
>>>>>>>>>>>> breaks
>>>>>>>>>>>>> the current contract between Flink and users that catalog/db
>>>> must
>>>>>>>>> be
>>>>>>>>>>>> valid
>>>>>>>>>>>>> in order to operate on.
>>>>>>>>>>>>>
>>>>>>>>>>>>> How about we instead focus on making it convenient to create
>>>>>>>>>> catalogs?
>>>>>>>>>>>>> Users actually can already do it with ease via program or
>> SQL
>>>> CLI
>>>>>>>>>> yaml
>>>>>>>>>>>> file
>>>>>>>>>>>>> for an in-memory catalog which has neither extra dependency
>>> nor
>>>>>>>>>>> external
>>>>>>>>>>>>> connections. What we can further improve is DDL for
>> catalogs,
>>>>>>>> and I
>>>>>>>>>>>> raised
>>>>>>>>>>>>> it in discussion of [FLIP 69 - Flink SQL DDL Enhancement]
>>> driven
>>>>>>>> by
>>>>>>>>>>> Terry
>>>>>>>>>>>>> now.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In that case, if users would like to experiment via SQL,
>> they
>>>> can
>>>>>>>>>>> easily
>>>>>>>>>>>>> create an in memory catalog/database using DDL, then play
>> with
>>>>>>>> temp
>>>>>>>>>>>>> functions.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Re 2): For the assumption, IIUIC, function ObjectIdentifier
>>> has
>>>>>>>> not
>>>>>>>>>>> been
>>>>>>>>>>>>> resolved when stack call reaches
>>>>>>>> FunctionCatalog#lookupFunction(),
>>>>>>>>>> but
>>>>>>>>>>>> only
>>>>>>>>>>>>> been parsed?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I agree keeping ObjectIdentifier as-is would be good. I'm ok
>>>> with
>>>>>>>>> the
>>>>>>>>>>>>> suggested classes, though making ObjectIdentifier a subclass
>>> of
>>>>>>>>>>>>> FunctionIdentifier seem a bit counter intuitive.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Another potentially simpler way is:
>>>>>>>>>>>>>
>>>>>>>>>>>>> ```
>>>>>>>>>>>>> // in class FunctionLookup
>>>>>>>>>>>>> class Result {
>>>>>>>>>>>>>      Optional<ObjectIdentifier>  getObjectIdentifier() {
>> ... }
>>>>>>>>>>>>>      String getName() { ... }
>>>>>>>>>>>>>      ...
>>>>>>>>>>>>> }
>>>>>>>>>>>>> ```
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 3:41 PM Dawid Wysakowicz <
>>>>>>>>>>>>> [hidden email]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>> I really like the flip and think it clarifies important
>>> aspects
>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>>>>> system.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have two, I hope small suggestions, which will not take
>>> much
>>>>>>>>> time
>>>>>>>>>> to
>>>>>>>>>>>>>> agree on.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 1. Could we follow the MySQL approach in regards to the
>>>>>>>> existence
>>>>>>>>> of
>>>>>>>>>>>>>> cat/db
>>>>>>>>>>>>>> for temporary functions? That means not to check it, so
>> e.g.
>>>>>>>> it's
>>>>>>>>>>>> possible
>>>>>>>>>>>>>> to create a temporary function in a database that does not
>>>>>>>> exist.
>>>>>>>>> I
>>>>>>>>>>>> think
>>>>>>>>>>>>>> it's really useful e.g in cases when user wants to perform
>>>>>>>>>> experiments
>>>>>>>>>>>> but
>>>>>>>>>>>>>> does not have access to the db yet or temporarily does not
>>> have
>>>>>>>>>>>> connection
>>>>>>>>>>>>>> to a catalog.
>>>>>>>>>>>>>> 2. Could we not change the ObjectIdentifier? Could we not
>>>> loosen
>>>>>>>>> the
>>>>>>>>>>>>>> requirements for all catalog objects such as tables, views,
>>>>>>>> types
>>>>>>>>>> just
>>>>>>>>>>>> for
>>>>>>>>>>>>>> the functions? It's really important later on from e.g the
>>>>>>>>>>>> serializability
>>>>>>>>>>>>>> perspective. The important aspect of the ObjectIdentifier
>> is
>>>>>>>> that
>>>>>>>>> we
>>>>>>>>>>>> know
>>>>>>>>>>>>>> that it has been resolved. The suggested changes break that
>>>>>>>>>>> assumption.
>>>>>>>>>>>>>> What do you think about adding an interface
>>> FunctionIdentifier
>>>> {
>>>>>>>>>>>>>> String getName();
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> /**
>>>>>>>>>>>>>>    Return 3-part identifier. Empty in case of a built-in
>>>>>>>> function.
>>>>>>>>>>>>>> */
>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier()
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> class ObjectIdentifier implements FunctionIdentifier {
>>>>>>>>>>>>>> Optional<ObjectIdentifier> getObjectIdentifier() {
>>>>>>>>>>>>>>   return Optional.of(this);
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> class SystemFunctionIdentifier implements
>> FunctionIdentifier
>>>>>>>> {...}
>>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, 25 Sep 2019, 04:50 Xuefu Z, <[hidden email]>
>>> wrote:
>>>>>>>>>>>>>>> +1. LGTM
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 6:09 AM Terry Wang <
>>>>>>>> [hidden email]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>> Terry Wang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 在 2019年9月24日,上午10:42,Kurt Young <[hidden email]> 写道:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>> Kurt
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, Sep 24, 2019 at 2:30 AM Bowen Li <
>>>>>>>>> [hidden email]
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I'd like to start a voting thread for FLIP-57 [1],
>> which
>>>>>>>>>> we've
>>>>>>>>>>>>>> reached
>>>>>>>>>>>>>>>>>> consensus in [2].
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> This voting will be open for minimum 3 days till 6:30pm
>>>>>>>>> UTC,
>>>>>>>>>>> Sep
>>>>>>>>>>>>>> 26.
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Bowen
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
>>>>>>>>>>>>>>>>>> [2]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-57-Rework-FunctionCatalog-td32291.html#a32613
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Xuefu Zhang
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "In Honey We Trust!"
>>>>>>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards!
>>>>>>>>> Rui Li
>>>>>>>>>
>>>>>>
>


signature.asc (849 bytes) Download Attachment
12