[DISCUSS] remove the default in-memory catalog from Flink table module

bowen.li
Hi all,

I want to kick off a discussion on whether we should remove the default
in-memory catalog from the Flink table module.

Background: Flink currently always creates a default in-memory catalog named
"default_catalog" on startup. This behavior was added in 1.9.0 along with the
new Catalog APIs. Up to 1.9, many temporary objects in the table module were
not well defined, in terms of both their APIs and where they are stored, so
this default in-memory catalog served a backward-compatibility purpose:
holding such temporary objects, e.g. ConnectorCatalogTable.

In Flink 1.10, we completely redefined temp objects so that, if I
understand correctly, no temp objects live in this default catalog anymore.
More specifically, all temp functions reside in FunctionCatalog as of
FLIP-57 [1], and all temp tables/views reside in CatalogManager as of FLIP-64
[2]. Thus, the backward-compatibility purpose no longer applies.

Therefore, I propose to remove this default in-memory catalog from Flink.

Benefits: it would make the metadata/catalog architecture cleaner and
prevent developers from unintentionally misusing this default catalog, since
as of 1.10 it is really no different from any other catalog.

Potential impact: users who want to play with catalogs and catalog objects
without external dependencies would need to explicitly register an in-memory
catalog, either in the Table API or via the SQL CLI YAML file, but that is
still pretty easy to do.

Note that this is not a proposal to remove the in-memory catalog
implementation.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Re: [DISCUSS] remove the default in-memory catalog from Flink table module

Jingsong Li
Hi Bowen:

Thanks for your proposal.
Do you mean that, even without any catalog, users can work entirely with
temporary objects and everything works well?
- If so, I am +1. The in-memory catalog is effectively a temporary catalog,
since it cannot persist anything, so we can replace it with real temporary
objects.
- If users must specify a catalog to run a job, I am -1. We cannot force
users to do another explicit registration. That means temporary objects
should support most of the functionality, so that users can happily run jobs
with temporary objects alone.

Here I think there are two problems:
- Temporary objects need the current catalog and current database to be
qualified [1]. Take a temporary view, for example:
 createTemporaryView("temp", ...) → registers the view under the identifier
`current_cat`.`current_db`.`temp`
- As far as I know, temporary tables still lack many features, such as
statistics.
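
To make the first problem concrete, here is a minimal sketch of how a short
name could be expanded against the current catalog and database. It is not
Flink's actual code: QualifiedName, QualifySketch and qualify are
hypothetical names, and resolving a two-part name against the current
catalog is an assumption of this sketch.

```java
import java.util.Objects;

/** Hypothetical stand-in for a fully qualified object identifier. */
final class QualifiedName {
    final String catalog;
    final String database;
    final String object;

    QualifiedName(String catalog, String database, String object) {
        // A missing catalog or database makes qualification impossible.
        this.catalog = Objects.requireNonNull(catalog, "catalog");
        this.database = Objects.requireNonNull(database, "database");
        this.object = Objects.requireNonNull(object, "object");
    }

    @Override
    public String toString() {
        return "`" + catalog + "`.`" + database + "`.`" + object + "`";
    }
}

final class QualifySketch {
    /** Expands a one-, two-, or three-part name into a full identifier. */
    static QualifiedName qualify(String currentCatalog, String currentDatabase, String... parts) {
        switch (parts.length) {
            case 1: // object only: needs both current catalog and current database
                return new QualifiedName(currentCatalog, currentDatabase, parts[0]);
            case 2: // database.object: still needs the current catalog
                return new QualifiedName(currentCatalog, parts[0], parts[1]);
            case 3: // already fully qualified
                return new QualifiedName(parts[0], parts[1], parts[2]);
            default:
                throw new IllegalArgumentException(
                        "expected 1 to 3 name parts, got " + parts.length);
        }
    }

    public static void main(String[] args) {
        // prints `current_cat`.`current_db`.`temp`
        System.out.println(qualify("current_cat", "current_db", "temp"));
    }
}
```

In this sketch, only a three-part name can be resolved when no current
catalog is set; the one-part name used by createTemporaryView("temp", ...)
fails with a NullPointerException.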

What do you think?

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Best,
Jingsong Lee

On Wed, Nov 20, 2019 at 7:52 AM Bowen Li <[hidden email]> wrote:


Re: [DISCUSS] remove the default in-memory catalog from Flink table module

Jark Wu-2
Hi Bowen,

I agree with Jingsong. I doubt it would work if we remove the default
catalog, because all temporary objects require the current catalog and
current database to be qualified.
That means even the simplest Table API programs would need to specify a
catalog first. If that is something users have to do anyway, why not make it
the default?

Second, it would be a big change to the current implementation around
catalogs: because `currentCatalog` and `currentDatabase` would become
nullable, every operation based on them would have to be guarded by a proper
exception or return an Optional.
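
As a rough illustration of the second point, the sketch below shows the two
guard styles side by side. It is a toy model, not Flink's CatalogManager:
the class CatalogStateSketch and its methods are hypothetical, and only the
field names `currentCatalog` and `currentDatabase` come from the discussion.

```java
import java.util.Optional;

/** Hypothetical model of catalog state when no default catalog exists. */
final class CatalogStateSketch {
    // Both fields start out null because nothing is registered by default.
    private String currentCatalog;
    private String currentDatabase;

    /** Selects the catalog and database used to qualify short names. */
    void use(String catalog, String database) {
        this.currentCatalog = catalog;
        this.currentDatabase = database;
    }

    /** Style A: expose the nullable state as Optional and let callers decide. */
    Optional<String> getCurrentCatalog() {
        return Optional.ofNullable(currentCatalog);
    }

    /** Style B: fail fast with a descriptive exception. */
    String qualify(String objectName) {
        if (currentCatalog == null || currentDatabase == null) {
            throw new IllegalStateException(
                    "No current catalog/database is set; select one before using '"
                            + objectName + "'");
        }
        return "`" + currentCatalog + "`.`" + currentDatabase + "`.`" + objectName + "`";
    }

    public static void main(String[] args) {
        CatalogStateSketch state = new CatalogStateSketch();
        System.out.println(state.getCurrentCatalog().isPresent()); // prints false
        try {
            state.qualify("temp");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        state.use("my_catalog", "my_db");
        System.out.println(state.qualify("temp")); // prints `my_catalog`.`my_db`.`temp`
    }
}
```

Every call site that reads the current catalog today would need one of these
two guards, which is where the size of the change comes from.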

Best,
Jark


On Wed, 20 Nov 2019 at 15:44, Jingsong Li <[hidden email]> wrote:
