Hi all,
I want to kick off a discussion of whether we should remove the default in-memory catalog from the Flink table module.

Background: Flink currently always starts with a default in-memory catalog named "default_catalog". This behavior was added in 1.9.0 along with the new Catalog APIs. Up to 1.9, many temporary objects in the table module were not well defined, in terms of both their APIs and where they are stored, so the default in-memory catalog served a backward-compatibility purpose: it held such temporary objects, e.g. ConnectorCatalogTable.

In Flink 1.10, we completely redefined temporary objects so that, if I understand correctly, none of them live in this default catalog anymore. More specifically, all temporary functions reside in FunctionCatalog as of FLIP-57 [1], and all temporary tables/views reside in CatalogManager as of FLIP-64 [2]. Thus, the backward-compatibility purpose no longer applies.

Therefore, I propose to remove this default in-memory catalog from Flink.

Benefits: it would make the metadata/catalog architecture cleaner and prevent developers from unintentionally misusing the default catalog, since from 1.10 on it is no different from any other catalog.

Potential impacts: users who want to play with catalogs and catalog objects without external dependencies would need to explicitly register an in-memory catalog, either in the Table API or via the SQL CLI yaml file. That is still pretty easy to do.

Note that this is not a proposal to remove the in-memory catalog implementation.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-57%3A+Rework+FunctionCatalog
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
Hi Bowen:
Thanks for your proposal.

Do you mean that even without any catalog, users can work entirely with temporary objects and everything works well?
- If so, I am +1. The in-memory catalog is effectively a temporary catalog: it cannot persist anything, so we can replace it with real temporary objects.
- If users must specify a catalog to run a job, I am -1. We cannot force users to do another explicit registration. That means temporary objects should support most functionality, so that users can comfortably run jobs with temporary objects alone.

Here I think there are two problems:
- Temporary objects need the current catalog and current database to be qualified [1]. For example, for a temporary view:
  createTemporaryView("temp", ...) → registers the view under the identifier `current_cat`.`current_db`.`temp`
- As far as I know, temporary tables still lack many features, such as statistics.

What do you think?

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module

Best,
Jingsong Lee

On Wed, Nov 20, 2019 at 7:52 AM Bowen Li <[hidden email]> wrote:
> Hi all,
>
> I want to kick off a discussion of whether we should remove the default
> in-memory catalog from Flink table module.
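To make the qualification point above concrete, here is a minimal sketch in plain Java. It is not Flink code: the class `QualifySketch` and the method `qualify` are hypothetical names used only for illustration, and the 1-, 2-, and 3-part name handling is an assumption about how a partial name might be expanded against the current catalog and database.

```java
public class QualifySketch {

    /**
     * Expands a 1-, 2-, or 3-part object name into catalog.database.object,
     * filling in the missing parts from the current catalog and database.
     * Hypothetical helper, not part of any Flink API.
     */
    static String qualify(String currentCatalog, String currentDatabase, String... path) {
        switch (path.length) {
            case 1:
                // Only the object name is given: both defaults are needed.
                return String.join(".", currentCatalog, currentDatabase, path[0]);
            case 2:
                // database.object is given: only the current catalog is needed.
                return String.join(".", currentCatalog, path[0], path[1]);
            case 3:
                // Already fully qualified: no defaults are needed.
                return String.join(".", path);
            default:
                throw new IllegalArgumentException("Expected 1 to 3 name parts, got " + path.length);
        }
    }

    public static void main(String[] args) {
        // Mirrors the createTemporaryView("temp", ...) case from the message above.
        System.out.println(qualify("current_cat", "current_db", "temp"));
    }
}
```

In this sketch only a fully qualified three-part name can be resolved without a current catalog and database, which is why a short name such as "temp" depends on some catalog being set.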
Hi Bowen,
I agree with Jingsong. I doubt that things will still work if we remove the default catalog, because all temporary objects require the current catalog and current database to be qualified. That means even the simplest Table API program would need to specify a catalog first. If that is something users have to do anyway, why not provide it by default?

Second, it would be a big change to the current catalog implementation: `currentCatalog` and `currentDatabase` would become nullable, so every operation based on them would have to be guarded by a proper exception or return an Optional.

Best,
Jark

On Wed, 20 Nov 2019 at 15:44, Jingsong Li <[hidden email]> wrote:
> Hi Bowen:
>
> Thanks for you proposal.
> You mean even if there is no catalog, users can completely walk through the
> set of temporary objects and work well?
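The concern about a nullable current catalog can be sketched roughly as follows. This is plain Java, not the actual CatalogManager: the class `NullableCatalogSketch` and the method `qualifyTemporary` are hypothetical, and the database name used in `main` is only an illustrative value.

```java
import java.util.Optional;

public class NullableCatalogSketch {

    // null models "no default catalog was registered" (an assumption of this sketch).
    private final String currentCatalog;
    private final String currentDatabase;

    NullableCatalogSketch(String currentCatalog, String currentDatabase) {
        this.currentCatalog = currentCatalog;
        this.currentDatabase = currentDatabase;
    }

    /**
     * Qualifies a temporary object name. Returns Optional.empty() when there is
     * no current catalog or database to qualify against, so every caller has to
     * handle the missing case explicitly.
     */
    Optional<String> qualifyTemporary(String name) {
        if (currentCatalog == null || currentDatabase == null) {
            return Optional.empty();
        }
        return Optional.of(currentCatalog + "." + currentDatabase + "." + name);
    }

    public static void main(String[] args) {
        NullableCatalogSketch withDefault =
                new NullableCatalogSketch("default_catalog", "default_database");
        NullableCatalogSketch withoutDefault = new NullableCatalogSketch(null, null);

        System.out.println(withDefault.qualifyTemporary("temp"));
        System.out.println(withoutDefault.qualifyTemporary("temp"));
    }
}
```

Every call site that today can assume a non-null current catalog would need a branch like the one in `qualifyTemporary`, or an exception in its place.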