[DISCUSS] Whether catalog table factory should be used for temporary tables

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] Whether catalog table factory should be used for temporary tables

Rui Li
Hi Dev,

Currently temporary generic tables cannot work with hive catalog [1]. When
hive catalog is chosen as the current catalog, planner will use
HiveTableFactory to create source/sink for the temporary
table. HiveTableFactory cannot tell whether a table is temporary or not,
and considers it as a Hive table, which leads to job failure.
I've discussed with Jingsong offline and we believe one solution is to make
planner avoid using catalog table factory for temporary tables. But I'd
also like to hear more opinions from others whether this is the right way
to go. I think a possible alternative is to add an *isTemporary* field
to TableSourceFactory.Context & TableSinkFactory.Context, so that
HiveTableFactory knows how to handle such tables. What do you think?

[1] https://issues.apache.org/jira/browse/FLINK-18999

--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

dwysakowicz
Hi Rui,

My take is that temporary tables should use the factory of the catalog
they were registered with.

What you are describing sounds very much like a limitation/bug in Hive
catalog only. I'd be in favor of passing the *isTemporary* flag.

Best,

Dawid

On 25/08/2020 09:37, Rui Li wrote:

> Hi Dev,
>
> Currently temporary generic tables cannot work with hive catalog [1]. When
> hive catalog is chosen as the current catalog, planner will use
> HiveTableFactory to create source/sink for the temporary
> table. HiveTableFactory cannot tell whether a table is temporary or not,
> and considers it as a Hive table, which leads to job failure.
> I've discussed with Jingsong offline and we believe one solution is to make
> planner avoid using catalog table factory for temporary tables. But I'd
> also like to hear more opinions from others whether this is the right way
> to go. I think a possible alternative is to add an *isTemporary* field
> to TableSourceFactory.Context & TableSinkFactory.Context, so that
> HiveTableFactory knows how to handle such tables. What do you think?
>
> [1] https://issues.apache.org/jira/browse/FLINK-18999
>


signature.asc (849 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

Jingsong Li
Hi Dawid,

But the temporary table does not belong to Catalog, actually
Catalog doesn't know the existence of the temporary table. Let the table
factory of catalog to create source/sink sounds a little sudden.

If we want to make temporary tables belong to Catalog, I think we need to
involve catalog when creating temporary tables.

Best,
Jingsong

On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email]>
wrote:

> Hi Rui,
>
> My take is that temporary tables should use the factory of the catalog
> they were registered with.
>
> What you are describing sounds very much like a limitation/bug in Hive
> catalog only. I'd be in favor of passing the *isTemporary* flag.
>
> Best,
>
> Dawid
>
> On 25/08/2020 09:37, Rui Li wrote:
> > Hi Dev,
> >
> > Currently temporary generic tables cannot work with hive catalog [1].
> When
> > hive catalog is chosen as the current catalog, planner will use
> > HiveTableFactory to create source/sink for the temporary
> > table. HiveTableFactory cannot tell whether a table is temporary or not,
> > and considers it as a Hive table, which leads to job failure.
> > I've discussed with Jingsong offline and we believe one solution is to
> make
> > planner avoid using catalog table factory for temporary tables. But I'd
> > also like to hear more opinions from others whether this is the right way
> > to go. I think a possible alternative is to add an *isTemporary* field
> > to TableSourceFactory.Context & TableSinkFactory.Context, so that
> > HiveTableFactory knows how to handle such tables. What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-18999
> >
>
>

--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

Jark Wu-2
Hi,

I'm wondering if we always fallback to using SPI for temporary tables, then
how does the create Hive temporary table using Hive dialect work?

IMO, adding an "isTemporary" to the factory context sounds reasonable to
me, because the factory context should describe the full content of create
table DDL.

Best,
Jark


On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote:

> Hi Dawid,
>
> But the temporary table does not belong to Catalog, actually
> Catalog doesn't know the existence of the temporary table. Let the table
> factory of catalog to create source/sink sounds a little sudden.
>
> If we want to make temporary tables belong to Catalog, I think we need to
> involve catalog when creating temporary tables.
>
> Best,
> Jingsong
>
> On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email]>
> wrote:
>
> > Hi Rui,
> >
> > My take is that temporary tables should use the factory of the catalog
> > they were registered with.
> >
> > What you are describing sounds very much like a limitation/bug in Hive
> > catalog only. I'd be in favor of passing the *isTemporary* flag.
> >
> > Best,
> >
> > Dawid
> >
> > On 25/08/2020 09:37, Rui Li wrote:
> > > Hi Dev,
> > >
> > > Currently temporary generic tables cannot work with hive catalog [1].
> > When
> > > hive catalog is chosen as the current catalog, planner will use
> > > HiveTableFactory to create source/sink for the temporary
> > > table. HiveTableFactory cannot tell whether a table is temporary or
> not,
> > > and considers it as a Hive table, which leads to job failure.
> > > I've discussed with Jingsong offline and we believe one solution is to
> > make
> > > planner avoid using catalog table factory for temporary tables. But I'd
> > > also like to hear more opinions from others whether this is the right
> way
> > > to go. I think a possible alternative is to add an *isTemporary* field
> > > to TableSourceFactory.Context & TableSinkFactory.Context, so that
> > > HiveTableFactory knows how to handle such tables. What do you think?
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-18999
> > >
> >
> >
>
> --
> Best, Jingsong Lee
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

Jingsong Li
Hi Jark,

You raised a good point: Creating the Hive temporary table.
AFAIK, Hive temporary tables should be stored in metastore, Hive metastore
will maintain their life cycle. Correct me if I am wrong.

So actually, if we want to support Hive temporary tables, we should finish
one thing:
- A temporary table should belong to Catalog.
- Instead of current, a temporary table belongs to CatalogManager.

It means, `createTemporaryTable` and `dropTemporaryTable` should be proxied
into the Catalog.
In this situation, actually, we don't need the "isTemporary" flag. (But we
can have it too)

Best,
Jingsong

On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote:

> Hi,
>
> I'm wondering if we always fallback to using SPI for temporary tables, then
> how does the create Hive temporary table using Hive dialect work?
>
> IMO, adding an "isTemporary" to the factory context sounds reasonable to
> me, because the factory context should describe the full content of create
> table DDL.
>
> Best,
> Jark
>
>
> On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote:
>
> > Hi Dawid,
> >
> > But the temporary table does not belong to Catalog, actually
> > Catalog doesn't know the existence of the temporary table. Let the table
> > factory of catalog to create source/sink sounds a little sudden.
> >
> > If we want to make temporary tables belong to Catalog, I think we need to
> > involve catalog when creating temporary tables.
> >
> > Best,
> > Jingsong
> >
> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <[hidden email]
> >
> > wrote:
> >
> > > Hi Rui,
> > >
> > > My take is that temporary tables should use the factory of the catalog
> > > they were registered with.
> > >
> > > What you are describing sounds very much like a limitation/bug in Hive
> > > catalog only. I'd be in favor of passing the *isTemporary* flag.
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 25/08/2020 09:37, Rui Li wrote:
> > > > Hi Dev,
> > > >
> > > > Currently temporary generic tables cannot work with hive catalog [1].
> > > When
> > > > hive catalog is chosen as the current catalog, planner will use
> > > > HiveTableFactory to create source/sink for the temporary
> > > > table. HiveTableFactory cannot tell whether a table is temporary or
> > not,
> > > > and considers it as a Hive table, which leads to job failure.
> > > > I've discussed with Jingsong offline and we believe one solution is
> to
> > > make
> > > > planner avoid using catalog table factory for temporary tables. But
> I'd
> > > > also like to hear more opinions from others whether this is the right
> > way
> > > > to go. I think a possible alternative is to add an *isTemporary*
> field
> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that
> > > > HiveTableFactory knows how to handle such tables. What do you think?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999
> > > >
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

Rui Li
Hi,

Thanks everyone for your inputs.

Temporary hive table is not supported at the moment. If we want to support
it, I agree with Jingsong that the life cycle of the temporary table should
somehow be bound to the hive catalog. For example, hive catalog should be
responsible to delete the table folder when the temporary table is dropped,
or when hive catalog itself gets unregistered. Whether such a table should
be stored in metastore is another topic and open to discussion. Hive
doesn't store temporary tables in metastore, so we probably won't want to
do it either.

I'm fine with either of the solutions proposed. I think we can add the
`isTemporary` flag to factory context, even though temporary tables
currently don't belong to a catalog. Because conceptually, whether a table
is temporary should be part of the context that a factory may be interested
in.

On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li <[hidden email]> wrote:

> Hi Jark,
>
> You raised a good point: Creating the Hive temporary table.
> AFAIK, Hive temporary tables should be stored in metastore, Hive metastore
> will maintain their life cycle. Correct me if I am wrong.
>
> So actually, if we want to support Hive temporary tables, we should finish
> one thing:
> - A temporary table should belong to Catalog.
> - Instead of current, a temporary table belongs to CatalogManager.
>
> It means, `createTemporaryTable` and `dropTemporaryTable` should be
> proxied into the Catalog.
> In this situation, actually, we don't need the "isTemporary" flag. (But we
> can have it too)
>
> Best,
> Jingsong
>
> On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote:
>
>> Hi,
>>
>> I'm wondering if we always fallback to using SPI for temporary tables,
>> then
>> how does the create Hive temporary table using Hive dialect work?
>>
>> IMO, adding an "isTemporary" to the factory context sounds reasonable to
>> me, because the factory context should describe the full content of create
>> table DDL.
>>
>> Best,
>> Jark
>>
>>
>> On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]> wrote:
>>
>> > Hi Dawid,
>> >
>> > But the temporary table does not belong to Catalog, actually
>> > Catalog doesn't know the existence of the temporary table. Let the table
>> > factory of catalog to create source/sink sounds a little sudden.
>> >
>> > If we want to make temporary tables belong to Catalog, I think we need
>> to
>> > involve catalog when creating temporary tables.
>> >
>> > Best,
>> > Jingsong
>> >
>> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <
>> [hidden email]>
>> > wrote:
>> >
>> > > Hi Rui,
>> > >
>> > > My take is that temporary tables should use the factory of the catalog
>> > > they were registered with.
>> > >
>> > > What you are describing sounds very much like a limitation/bug in Hive
>> > > catalog only. I'd be in favor of passing the *isTemporary* flag.
>> > >
>> > > Best,
>> > >
>> > > Dawid
>> > >
>> > > On 25/08/2020 09:37, Rui Li wrote:
>> > > > Hi Dev,
>> > > >
>> > > > Currently temporary generic tables cannot work with hive catalog
>> [1].
>> > > When
>> > > > hive catalog is chosen as the current catalog, planner will use
>> > > > HiveTableFactory to create source/sink for the temporary
>> > > > table. HiveTableFactory cannot tell whether a table is temporary or
>> > not,
>> > > > and considers it as a Hive table, which leads to job failure.
>> > > > I've discussed with Jingsong offline and we believe one solution is
>> to
>> > > make
>> > > > planner avoid using catalog table factory for temporary tables. But
>> I'd
>> > > > also like to hear more opinions from others whether this is the
>> right
>> > way
>> > > > to go. I think a possible alternative is to add an *isTemporary*
>> field
>> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that
>> > > > HiveTableFactory knows how to handle such tables. What do you think?
>> > > >
>> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999
>> > > >
>> > >
>> > >
>> >
>> > --
>> > Best, Jingsong Lee
>> >
>>
>
>
> --
> Best, Jingsong Lee
>


--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] Whether catalog table factory should be used for temporary tables

Rui Li
Hi everyone,

According to the feedback, it seems adding `isTemporary` to factory context
is the preferred way to fix the issue. I'll go ahead and make the change if
no one objects.

On Tue, Aug 25, 2020 at 5:36 PM Rui Li <[hidden email]> wrote:

> Hi,
>
> Thanks everyone for your inputs.
>
> Temporary hive table is not supported at the moment. If we want to support
> it, I agree with Jingsong that the life cycle of the temporary table should
> somehow be bound to the hive catalog. For example, hive catalog should be
> responsible to delete the table folder when the temporary table is dropped,
> or when hive catalog itself gets unregistered. Whether such a table should
> be stored in metastore is another topic and open to discussion. Hive
> doesn't store temporary tables in metastore, so we probably won't want to
> do it either.
>
> I'm fine with either of the solutions proposed. I think we can add the
> `isTemporary` flag to factory context, even though temporary tables
> currently don't belong to a catalog. Because conceptually, whether a table
> is temporary should be part of the context that a factory may be interested
> in.
>
> On Tue, Aug 25, 2020 at 4:48 PM Jingsong Li <[hidden email]>
> wrote:
>
>> Hi Jark,
>>
>> You raised a good point: Creating the Hive temporary table.
>> AFAIK, Hive temporary tables should be stored in metastore, Hive
>> metastore will maintain their life cycle. Correct me if I am wrong.
>>
>> So actually, if we want to support Hive temporary tables, we should
>> finish one thing:
>> - A temporary table should belong to Catalog.
>> - Instead of current, a temporary table belongs to CatalogManager.
>>
>> It means, `createTemporaryTable` and `dropTemporaryTable` should be
>> proxied into the Catalog.
>> In this situation, actually, we don't need the "isTemporary" flag. (But
>> we can have it too)
>>
>> Best,
>> Jingsong
>>
>> On Tue, Aug 25, 2020 at 4:32 PM Jark Wu <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> I'm wondering if we always fallback to using SPI for temporary tables,
>>> then
>>> how does the create Hive temporary table using Hive dialect work?
>>>
>>> IMO, adding an "isTemporary" to the factory context sounds reasonable to
>>> me, because the factory context should describe the full content of
>>> create
>>> table DDL.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>> On Tue, 25 Aug 2020 at 16:01, Jingsong Li <[hidden email]>
>>> wrote:
>>>
>>> > Hi Dawid,
>>> >
>>> > But the temporary table does not belong to Catalog, actually
>>> > Catalog doesn't know the existence of the temporary table. Let the
>>> table
>>> > factory of catalog to create source/sink sounds a little sudden.
>>> >
>>> > If we want to make temporary tables belong to Catalog, I think we need
>>> to
>>> > involve catalog when creating temporary tables.
>>> >
>>> > Best,
>>> > Jingsong
>>> >
>>> > On Tue, Aug 25, 2020 at 3:55 PM Dawid Wysakowicz <
>>> [hidden email]>
>>> > wrote:
>>> >
>>> > > Hi Rui,
>>> > >
>>> > > My take is that temporary tables should use the factory of the
>>> catalog
>>> > > they were registered with.
>>> > >
>>> > > What you are describing sounds very much like a limitation/bug in
>>> Hive
>>> > > catalog only. I'd be in favor of passing the *isTemporary* flag.
>>> > >
>>> > > Best,
>>> > >
>>> > > Dawid
>>> > >
>>> > > On 25/08/2020 09:37, Rui Li wrote:
>>> > > > Hi Dev,
>>> > > >
>>> > > > Currently temporary generic tables cannot work with hive catalog
>>> [1].
>>> > > When
>>> > > > hive catalog is chosen as the current catalog, planner will use
>>> > > > HiveTableFactory to create source/sink for the temporary
>>> > > > table. HiveTableFactory cannot tell whether a table is temporary or
>>> > not,
>>> > > > and considers it as a Hive table, which leads to job failure.
>>> > > > I've discussed with Jingsong offline and we believe one solution
>>> is to
>>> > > make
>>> > > > planner avoid using catalog table factory for temporary tables.
>>> But I'd
>>> > > > also like to hear more opinions from others whether this is the
>>> right
>>> > way
>>> > > > to go. I think a possible alternative is to add an *isTemporary*
>>> field
>>> > > > to TableSourceFactory.Context & TableSinkFactory.Context, so that
>>> > > > HiveTableFactory knows how to handle such tables. What do you
>>> think?
>>> > > >
>>> > > > [1] https://issues.apache.org/jira/browse/FLINK-18999
>>> > > >
>>> > >
>>> > >
>>> >
>>> > --
>>> > Best, Jingsong Lee
>>> >
>>>
>>
>>
>> --
>> Best, Jingsong Lee
>>
>
>
> --
> Best regards!
> Rui Li
>


--
Best regards!
Rui Li