FLIP-117: HBase catalog

FLIP-117: HBase catalog

Flavio Pompermaier
Hello everybody,
I started a new FLIP to discuss an HBaseCatalog implementation [1],
following the related issue opened by Bowen [2].
I drafted a very simple version of the FLIP just to discuss the
critical points (in red) and decide how to proceed.

Best,
Flavio

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-117%3A+HBase+catalog
[2] https://issues.apache.org/jira/browse/FLINK-16575

Re: FLIP-117: HBase catalog

Jingsong Li
Thanks Flavio for driving this. Personally I am +1 for integrating HBase tables.

I would like to start a new topic for discussion. It is related to this
FLIP but not its core.
In the FLIP, I can see:
- Does HBase support the concept of partitions..? I don't think so..
- Does HBase support functions? I don't think so..
- Does HBase support statistics? I don't think so..
- Does HBase support views? I don't think so..

The JDBC catalog [1] also has lots of UnsupportedOperationExceptions.
A Confluent catalog would probably bring even more of them.
So many UnsupportedOperationExceptions suggest the catalog API is a poor
fit for these catalogs.
So can we refactor the catalog API? A lot of catalogs just need to
provide table information, without partitions, functions, statistics,
or views.
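
One possible shape for such a refactoring, sketched only as an illustration: a small core interface for table metadata plus optional capability interfaces. All names below (TableOnlyCatalog, ViewCapable, SketchJdbcLikeCatalog, SketchFullCatalog, CatalogCapabilities) are hypothetical and are not part of Flink's Catalog API; the thread itself does not settle on any design.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical core contract: every catalog can at least list its tables.
interface TableOnlyCatalog {
    List<String> listTables(String database);
}

// Hypothetical optional capability, implemented only by catalogs that have views.
interface ViewCapable {
    List<String> listViews(String database);
}

// A tables-only catalog (think JDBC or HBase): no view-related methods to stub out.
class SketchJdbcLikeCatalog implements TableOnlyCatalog {
    @Override
    public List<String> listTables(String database) {
        return Arrays.asList("orders", "customers");
    }
}

// A richer catalog opts in to the extra capability.
class SketchFullCatalog implements TableOnlyCatalog, ViewCapable {
    @Override
    public List<String> listTables(String database) {
        return Arrays.asList("orders");
    }

    @Override
    public List<String> listViews(String database) {
        return Arrays.asList("orders_by_day");
    }
}

class CatalogCapabilities {
    // Callers probe for the capability instead of catching an exception.
    static List<String> viewsOrEmpty(TableOnlyCatalog catalog, String database) {
        if (catalog instanceof ViewCapable) {
            return ((ViewCapable) catalog).listViews(database);
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        System.out.println(viewsOrEmpty(new SketchJdbcLikeCatalog(), "default"));
        System.out.println(viewsOrEmpty(new SketchFullCatalog(), "default"));
    }
}
```

With this split, a tables-only catalog never has to throw UnsupportedOperationException, at the cost of callers checking for each capability.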

CC: @Dawid Wysakowicz <[hidden email]> @Bowen Li
<[hidden email]>

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog

Best,
Jingsong Lee

On Sat, Mar 14, 2020 at 7:36 AM Flavio Pompermaier <[hidden email]>
wrote:

> Hello everybody,
> I started a new FLIP to discuss an HBaseCatalog implementation [1],
> following the related issue opened by Bowen [2].
> I drafted a very simple version of the FLIP just to discuss the
> critical points (in red) and decide how to proceed.
>
> Best,
> Flavio
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-117%3A+HBase+catalog
> [2] https://issues.apache.org/jira/browse/FLINK-16575
>


--
Best, Jingsong Lee

Re: FLIP-117: HBase catalog

bowen.li
Hi,

I think the core of the Jira issue right now is to investigate whether
catalogs for schemaless systems like HBase and Elasticsearch bring practical
value to users. I haven't used these SQL connectors before, and thus don't
have much to say in this case. Can anyone describe how it would work?
Maybe @Yu or @Zheng can chime in.

W.r.t. UnsupportedOperationException, it should be thrown in targeted
getters (e.g. getView(), getFunction()). General listing APIs like
listView(), listFunction() should not throw it but just return empty
results, so as not to break the user's SQL experience. To deduplicate code,
such common implementations can be moved to AbstractCatalog to make the
APIs look cleaner. I recall that there was an intention to refactor the
catalog API signatures, but I haven't kept up with it.
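
A minimal sketch of that split, assuming a simplified base class: listing methods return empty results, targeted getters throw, and a concrete catalog only implements what it has. SketchAbstractCatalog and SketchHBaseCatalog are hypothetical stand-ins, not Flink's real AbstractCatalog or Catalog interface; the method names simply echo the ones used in this message and the table names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified base class holding the shared defaults.
abstract class SketchAbstractCatalog {
    // The one thing every catalog in this sketch must provide.
    public abstract List<String> listTables(String database);

    // Listing calls return empty results so generic "show everything" queries keep working.
    public List<String> listView(String database) {
        return Collections.emptyList();
    }

    public List<String> listFunction(String database) {
        return Collections.emptyList();
    }

    // Targeted getters fail loudly: the caller asked for something specific that cannot exist.
    public Object getView(String database, String name) {
        throw new UnsupportedOperationException(
                "Views are not supported by " + getClass().getSimpleName());
    }

    public Object getFunction(String database, String name) {
        throw new UnsupportedOperationException(
                "Functions are not supported by " + getClass().getSimpleName());
    }
}

// A tables-only catalog inherits the defaults and implements nothing else.
class SketchHBaseCatalog extends SketchAbstractCatalog {
    @Override
    public List<String> listTables(String database) {
        return Arrays.asList("users", "events"); // made-up table names
    }
}

class CatalogDefaultsDemo {
    public static void main(String[] args) {
        SketchAbstractCatalog catalog = new SketchHBaseCatalog();
        System.out.println("tables: " + catalog.listTables("default"));
        System.out.println("views: " + catalog.listView("default"));
        try {
            catalog.getView("default", "some_view");
        } catch (UnsupportedOperationException e) {
            System.out.println("getView failed: " + e.getMessage());
        }
    }
}
```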

Bowen

On Sun, Mar 15, 2020 at 10:19 PM Jingsong Li <[hidden email]> wrote:

> Thanks Flavio for driving this. Personally I am +1 for integrating HBase
> tables.
>
> I would like to start a new topic for discussion. It is related to this
> FLIP but not its core.
> In the FLIP, I can see:
> - Does HBase support the concept of partitions..? I don't think so..
> - Does HBase support functions? I don't think so..
> - Does HBase support statistics? I don't think so..
> - Does HBase support views? I don't think so..
>
> The JDBC catalog [1] also has lots of UnsupportedOperationExceptions.
> A Confluent catalog would probably bring even more of them.
> So many UnsupportedOperationExceptions suggest the catalog API is a poor
> fit for these catalogs.
> So can we refactor the catalog API? A lot of catalogs just need to
> provide table information, without partitions, functions, statistics,
> or views.
>
> CC: @Dawid Wysakowicz <[hidden email]> @Bowen Li
> <[hidden email]>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-93%3A+JDBC+catalog+and+Postgres+catalog
>
> Best,
> Jingsong Lee
>

Re: FLIP-117: HBase catalog

Yu Li
Thanks for bringing up this discussion Flavio. And thanks Bowen for the
ping.

For me, I'm not quite sure whether an HBase catalog fits into the
existing Catalog interface. The interface seems to be coupled with the SQL
standard rather than a more general database catalog [1], which is also
reflected in the FLIP document, especially in the four questions below:

- Does HBase support the concept of partitions..? I don't think so..
- Does HBase support functions? I don't think so..
- Does HBase support statistics? I don't think so..
- Does HBase support views? I don't think so..

Partitions/Functions/Statistics/Views are all SQL concepts. Since HBase is
a non-relational (NoSQL) database [2] [3], I don't think we could easily
map the concepts, except for regions to partitions. You may find more
concepts (such as functions [4], views [5]) aligned in Phoenix [6], with
HBase as the backing store, but that's off track for this FLIP.

I'm also not sure whether it is still beneficial if only some of the
concepts/methods can be matched/implemented, and I'd like to leave the
decision to the experts on the Table API/SQL modules.

And to be explicit, I'm just giving some input rather than casting a vote
here (none of +1/+0/-1).

Hopefully my input helps. Thanks.

Best Regards,
Yu

[1] https://en.wikipedia.org/wiki/Database_catalog
[2] https://en.wikipedia.org/wiki/Apache_HBase
[3] https://www.mail-archive.com/announce@.../msg05739.html
[4] https://phoenix.apache.org/language/functions.html
[5] https://phoenix.apache.org/views.html
[6] https://en.wikipedia.org/wiki/Apache_Phoenix


On Tue, 17 Mar 2020 at 01:10, Bowen Li <[hidden email]> wrote:

> Hi,
>
> I think the core of the Jira issue right now is to investigate whether
> catalogs for schemaless systems like HBase and Elasticsearch bring practical
> value to users. I haven't used these SQL connectors before, and thus don't
> have much to say in this case. Can anyone describe how it would work?
> Maybe @Yu or @Zheng can chime in.
>
> W.r.t. UnsupportedOperationException, it should be thrown in targeted
> getters (e.g. getView(), getFunction()). General listing APIs like
> listView(), listFunction() should not throw it but just return empty
> results, so as not to break the user's SQL experience. To deduplicate code,
> such common implementations can be moved to AbstractCatalog to make the
> APIs look cleaner. I recall that there was an intention to refactor the
> catalog API signatures, but I haven't kept up with it.
>
> Bowen
>