Hi Dev,
Currently temporary generic tables cannot work with hive catalog [1]. When hive catalog is chosen as the current catalog, planner will use HiveTableFactory to create source/sink for the temporary table. HiveTableFactory cannot tell whether a table is temporary or not, and considers it as a Hive table, which leads to job failure. I've discussed with Jingsong offline and we believe one solution is to make planner avoid using catalog table factory for temporary tables. But I'd also like to hear more opinions from others whether this is the right way to go. I think a possible alternative is to add an *isTemporary* field to TableSourceFactory.Context & TableSinkFactory.Context, so that HiveTableFactory knows how to handle such tables. What do you think? [1] https://issues.apache.org/jira/browse/FLINK-18999 -- Best regards! Rui Li |
Hi Rui,
My take is that temporary tables should use the factory of the catalog they were registered with. What you are describing sounds very much like a limitation/bug in Hive catalog only. I'd be in favor of passing the *isTemporary* flag. Best, Dawid On 25/08/2020 09:37, Rui Li wrote: > Hi Dev, > > Currently temporary generic tables cannot work with hive catalog [1]. When > hive catalog is chosen as the current catalog, planner will use > HiveTableFactory to create source/sink for the temporary > table. HiveTableFactory cannot tell whether a table is temporary or not, > and considers it as a Hive table, which leads to job failure. > I've discussed with Jingsong offline and we believe one solution is to make > planner avoid using catalog table factory for temporary tables. But I'd > also like to hear more opinions from others whether this is the right way > to go. I think a possible alternative is to add an *isTemporary* field > to TableSourceFactory.Context & TableSinkFactory.Context, so that > HiveTableFactory knows how to handle such tables. What do you think? > > [1] https://issues.apache.org/jira/browse/FLINK-18999 > signature.asc (849 bytes) Download Attachment |
Hi Dawid,
But the temporary table does not belong to Catalog, actually Catalog doesn't know the existence of the temporary table. Let the table factory of catalog to create source/sink sounds a little sudden. If we want to make temporary tables belong to Catalog, I think we need to involve catalog when creating temporary tables. Best, Jingsong On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email]> wrote: > Hi Rui, > > My take is that temporary tables should use the factory of the catalog > they were registered with. > > What you are describing sounds very much like a limitation/bug in Hive > catalog only. I'd be in favor of passing the *isTemporary* flag. > > Best, > > Dawid > > On 25/08/2020 09:37, Rui Li wrote: > > Hi Dev, > > > > Currently temporary generic tables cannot work with hive catalog [1]. > When > > hive catalog is chosen as the current catalog, planner will use > > HiveTableFactory to create source/sink for the temporary > > table. HiveTableFactory cannot tell whether a table is temporary or not, > > and considers it as a Hive table, which leads to job failure. > > I've discussed with Jingsong offline and we believe one solution is to > make > > planner avoid using catalog table factory for temporary tables. But I'd > > also like to hear more opinions from others whether this is the right way > > to go. I think a possible alternative is to add an *isTemporary* field > > to TableSourceFactory.Context & TableSinkFactory.Context, so that > > HiveTableFactory knows how to handle such tables. What do you think? > > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 > > > > -- Best, Jingsong Lee |
Hi,
I'm wondering if we always fallback to using SPI for temporary tables, then how does the create Hive temporary table using Hive dialect work? IMO, adding an "isTemporary" to the factory context sounds reasonable to me, because the factory context should describe the full content of create table DDL. Best, Jark On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote: > Hi Dawid, > > But the temporary table does not belong to Catalog, actually > Catalog doesn't know the existence of the temporary table. Let the table > factory of catalog to create source/sink sounds a little sudden. > > If we want to make temporary tables belong to Catalog, I think we need to > involve catalog when creating temporary tables. > > Best, > Jingsong > > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email]> > wrote: > > > Hi Rui, > > > > My take is that temporary tables should use the factory of the catalog > > they were registered with. > > > > What you are describing sounds very much like a limitation/bug in Hive > > catalog only. I'd be in favor of passing the *isTemporary* flag. > > > > Best, > > > > Dawid > > > > On 25/08/2020 09:37, Rui Li wrote: > > > Hi Dev, > > > > > > Currently temporary generic tables cannot work with hive catalog [1]. > > When > > > hive catalog is chosen as the current catalog, planner will use > > > HiveTableFactory to create source/sink for the temporary > > > table. HiveTableFactory cannot tell whether a table is temporary or > not, > > > and considers it as a Hive table, which leads to job failure. > > > I've discussed with Jingsong offline and we believe one solution is to > > make > > > planner avoid using catalog table factory for temporary tables. But I'd > > > also like to hear more opinions from others whether this is the right > way > > > to go. I think a possible alternative is to add an *isTemporary* field > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that > > > HiveTableFactory knows how to handle such tables. What do you think? > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 > > > > > > > > > -- > Best, Jingsong Lee > |
Hi Jark,
You raised a good point: Creating the Hive temporary table. AFAIK, Hive temporary tables should be stored in metastore, Hive metastore will maintain their life cycle. Correct me if I am wrong. So actually, if we want to support Hive temporary tables, we should finish one thing: - A temporary table should belong to Catalog. - Instead of current, a temporary table belongs to CatalogManager. It means, `createTemporaryTable` and `dropTemporaryTable` should be proxied into the Catalog. In this situation, actually, we don't need the "isTemporary" flag. (But we can have it too) Best, Jingsong On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote: > Hi, > > I'm wondering if we always fallback to using SPI for temporary tables, then > how does the create Hive temporary table using Hive dialect work? > > IMO, adding an "isTemporary" to the factory context sounds reasonable to > me, because the factory context should describe the full content of create > table DDL. > > Best, > Jark > > > On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote: > > > Hi Dawid, > > > > But the temporary table does not belong to Catalog, actually > > Catalog doesn't know the existence of the temporary table. Let the table > > factory of catalog to create source/sink sounds a little sudden. > > > > If we want to make temporary tables belong to Catalog, I think we need to > > involve catalog when creating temporary tables. > > > > Best, > > Jingsong > > > > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email] > > > > wrote: > > > > > Hi Rui, > > > > > > My take is that temporary tables should use the factory of the catalog > > > they were registered with. > > > > > > What you are describing sounds very much like a limitation/bug in Hive > > > catalog only. I'd be in favor of passing the *isTemporary* flag. > > > > > > Best, > > > > > > Dawid > > > > > > On 25/08/2020 09:37, Rui Li wrote: > > > > Hi Dev, > > > > > > > > Currently temporary generic tables cannot work with hive catalog [1]. > > > When > > > > hive catalog is chosen as the current catalog, planner will use > > > > HiveTableFactory to create source/sink for the temporary > > > > table. HiveTableFactory cannot tell whether a table is temporary or > > not, > > > > and considers it as a Hive table, which leads to job failure. > > > > I've discussed with Jingsong offline and we believe one solution is > to > > > make > > > > planner avoid using catalog table factory for temporary tables. But > I'd > > > > also like to hear more opinions from others whether this is the right > > way > > > > to go. I think a possible alternative is to add an *isTemporary* > field > > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that > > > > HiveTableFactory knows how to handle such tables. What do you think? > > > > > > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 > > > > > > > > > > > > > > -- > > Best, Jingsong Lee > > > -- Best, Jingsong Lee |
Hi,
Thanks everyone for your inputs. Temporary hive table is not supported at the moment. If we want to support it, I agree with Jingsong that the life cycle of the temporary table should somehow be bound to the hive catalog. For example, hive catalog should be responsible to delete the table folder when the temporary table is dropped, or when hive catalog itself gets unregistered. Whether such a table should be stored in metastore is another topic and open to discussion. Hive doesn't store temporary tables in metastore, so we probably won't want to do it either. I'm fine with either of the solutions proposed. I think we can add the `isTemporary` flag to factory context, even though temporary tables currently don't belong to a catalog. Because conceptually, whether a table is temporary should be part of the context that a factory may be interested in. On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li <[hidden email]> wrote: > Hi Jark, > > You raised a good point: Creating the Hive temporary table. > AFAIK, Hive temporary tables should be stored in metastore, Hive metastore > will maintain their life cycle. Correct me if I am wrong. > > So actually, if we want to support Hive temporary tables, we should finish > one thing: > - A temporary table should belong to Catalog. > - Instead of current, a temporary table belongs to CatalogManager. > > It means, `createTemporaryTable` and `dropTemporaryTable` should be > proxied into the Catalog. > In this situation, actually, we don't need the "isTemporary" flag. (But we > can have it too) > > Best, > Jingsong > > On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote: > >> Hi, >> >> I'm wondering if we always fallback to using SPI for temporary tables, >> then >> how does the create Hive temporary table using Hive dialect work? >> >> IMO, adding an "isTemporary" to the factory context sounds reasonable to >> me, because the factory context should describe the full content of create >> table DDL. >> >> Best, >> Jark >> >> >> On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote: >> >> > Hi Dawid, >> > >> > But the temporary table does not belong to Catalog, actually >> > Catalog doesn't know the existence of the temporary table. Let the table >> > factory of catalog to create source/sink sounds a little sudden. >> > >> > If we want to make temporary tables belong to Catalog, I think we need >> to >> > involve catalog when creating temporary tables. >> > >> > Best, >> > Jingsong >> > >> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz < >> [hidden email]> >> > wrote: >> > >> > > Hi Rui, >> > > >> > > My take is that temporary tables should use the factory of the catalog >> > > they were registered with. >> > > >> > > What you are describing sounds very much like a limitation/bug in Hive >> > > catalog only. I'd be in favor of passing the *isTemporary* flag. >> > > >> > > Best, >> > > >> > > Dawid >> > > >> > > On 25/08/2020 09:37, Rui Li wrote: >> > > > Hi Dev, >> > > > >> > > > Currently temporary generic tables cannot work with hive catalog >> [1]. >> > > When >> > > > hive catalog is chosen as the current catalog, planner will use >> > > > HiveTableFactory to create source/sink for the temporary >> > > > table. HiveTableFactory cannot tell whether a table is temporary or >> > not, >> > > > and considers it as a Hive table, which leads to job failure. >> > > > I've discussed with Jingsong offline and we believe one solution is >> to >> > > make >> > > > planner avoid using catalog table factory for temporary tables. But >> I'd >> > > > also like to hear more opinions from others whether this is the >> right >> > way >> > > > to go. I think a possible alternative is to add an *isTemporary* >> field >> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that >> > > > HiveTableFactory knows how to handle such tables. What do you think? >> > > > >> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 >> > > > >> > > >> > > >> > >> > -- >> > Best, Jingsong Lee >> > >> > > > -- > Best, Jingsong Lee > -- Best regards! Rui Li |
Hi everyone,
According to the feedback, it seems adding `isTemporary` to factory context is the preferred way to fix the issue. I'll go ahead and make the change if no one objects. On Tue, Aug 25, 2020 at 5:36 PM Rui Li <[hidden email]> wrote: > Hi, > > Thanks everyone for your inputs. > > Temporary hive table is not supported at the moment. If we want to support > it, I agree with Jingsong that the life cycle of the temporary table should > somehow be bound to the hive catalog. For example, hive catalog should be > responsible to delete the table folder when the temporary table is dropped, > or when hive catalog itself gets unregistered. Whether such a table should > be stored in metastore is another topic and open to discussion. Hive > doesn't store temporary tables in metastore, so we probably won't want to > do it either. > > I'm fine with either of the solutions proposed. I think we can add the > `isTemporary` flag to factory context, even though temporary tables > currently don't belong to a catalog. Because conceptually, whether a table > is temporary should be part of the context that a factory may be interested > in. > > On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li <[hidden email]> > wrote: > >> Hi Jark, >> >> You raised a good point: Creating the Hive temporary table. >> AFAIK, Hive temporary tables should be stored in metastore, Hive >> metastore will maintain their life cycle. Correct me if I am wrong. >> >> So actually, if we want to support Hive temporary tables, we should >> finish one thing: >> - A temporary table should belong to Catalog. >> - Instead of current, a temporary table belongs to CatalogManager. >> >> It means, `createTemporaryTable` and `dropTemporaryTable` should be >> proxied into the Catalog. >> In this situation, actually, we don't need the "isTemporary" flag. (But >> we can have it too) >> >> Best, >> Jingsong >> >> On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote: >> >>> Hi, >>> >>> I'm wondering if we always fallback to using SPI for temporary tables, >>> then >>> how does the create Hive temporary table using Hive dialect work? >>> >>> IMO, adding an "isTemporary" to the factory context sounds reasonable to >>> me, because the factory context should describe the full content of >>> create >>> table DDL. >>> >>> Best, >>> Jark >>> >>> >>> On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> >>> wrote: >>> >>> > Hi Dawid, >>> > >>> > But the temporary table does not belong to Catalog, actually >>> > Catalog doesn't know the existence of the temporary table. Let the >>> table >>> > factory of catalog to create source/sink sounds a little sudden. >>> > >>> > If we want to make temporary tables belong to Catalog, I think we need >>> to >>> > involve catalog when creating temporary tables. >>> > >>> > Best, >>> > Jingsong >>> > >>> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz < >>> [hidden email]> >>> > wrote: >>> > >>> > > Hi Rui, >>> > > >>> > > My take is that temporary tables should use the factory of the >>> catalog >>> > > they were registered with. >>> > > >>> > > What you are describing sounds very much like a limitation/bug in >>> Hive >>> > > catalog only. I'd be in favor of passing the *isTemporary* flag. >>> > > >>> > > Best, >>> > > >>> > > Dawid >>> > > >>> > > On 25/08/2020 09:37, Rui Li wrote: >>> > > > Hi Dev, >>> > > > >>> > > > Currently temporary generic tables cannot work with hive catalog >>> [1]. >>> > > When >>> > > > hive catalog is chosen as the current catalog, planner will use >>> > > > HiveTableFactory to create source/sink for the temporary >>> > > > table. HiveTableFactory cannot tell whether a table is temporary or >>> > not, >>> > > > and considers it as a Hive table, which leads to job failure. >>> > > > I've discussed with Jingsong offline and we believe one solution >>> is to >>> > > make >>> > > > planner avoid using catalog table factory for temporary tables. >>> But I'd >>> > > > also like to hear more opinions from others whether this is the >>> right >>> > way >>> > > > to go. I think a possible alternative is to add an *isTemporary* >>> field >>> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that >>> > > > HiveTableFactory knows how to handle such tables. What do you >>> think? >>> > > > >>> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999 >>> > > > >>> > > >>> > > >>> > >>> > -- >>> > Best, Jingsong Lee >>> > >>> >> >> >> -- >> Best, Jingsong Lee >> > > > -- > Best regards! > Rui Li > -- Best regards! Rui Li |
Free forum by Nabble | Edit this page |