[DISCUSS] FLIP-152: Hive Query Syntax Compatibility

classic Classic list List threaded Threaded
12 messages Options
Reply | Threaded
Open this post in threaded view
|

[DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Hi guys,

I'd like to start a discussion about providing HiveQL compatibility for
users connecting to a hive warehouse. FLIP-123 has already covered most
DDLs. So now it's time to complement the other big missing part -- queries.
With FLIP-152, the hive dialect covers more scenarios and makes it even
easier for users to migrate to Flink. More details are in the FLIP wiki
page [1]. Looking forward to your feedback!

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility

--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Kurt Young
Thanks Rui for starting this discussion.

I can see the benefit that we improve hive compatibility further, as quite
some users are asking for this
feature in mailing lists [1][2][3] and some online chatting tools such as
DingTalk.

I have 3 comments regarding to the design doc:

a) Could you add a section to describe the typical use case you want to
support after this feature is introduced?
In that way, users can also have an impression how to use this feature and
what the behavior and outcome will be.

b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
"FlinkParserFactory".

c) About the two limitations you mentioned:
    1. Only works with Hive tables and the current catalog needs to be a
HiveCatalog.
    2. Queries cannot involve tables/views from multiple catalogs.
I assume this is because hive parser and analyzer doesn't support
referring to a name with "x.y.z" fashion? Since
we can control all the behaviors by leveraging the codes hive currently
use. Is it possible that we can remove such
limitations? The reason is I'm not sure if users can make the whole story
work purely depending on hive catalog (that's
the reason why I gave comment #a). If multiple catalogs are involved, with
this limitation I don't think any meaningful
pipeline could be built. For example, users want to stream data from Kafka
to Hive, fully use hive's dialect including
query part. The kafka table could be a temporary table or saved in default
memory catalog.


[1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
[2] http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
[3]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html

Best,
Kurt


On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:

> Hi guys,
>
> I'd like to start a discussion about providing HiveQL compatibility for
> users connecting to a hive warehouse. FLIP-123 has already covered most
> DDLs. So now it's time to complement the other big missing part -- queries.
> With FLIP-152, the hive dialect covers more scenarios and makes it even
> easier for users to migrate to Flink. More details are in the FLIP wiki
> page [1]. Looking forward to your feedback!
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
>
> --
> Best regards!
> Rui Li
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Thanks Kurt for your inputs!

I agree we should extend Hive code to support non-Hive tables. I have
updated the wiki page to remove the limitations you mentioned, and add
typical use cases in the "Motivation" section.

Regarding comment #b, the interface is defined in flink-table-planner-blink
and only used by the blink planner. So I think "BlinkParserFactory" is a
better name, WDYT?

On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:

> Thanks Rui for starting this discussion.
>
> I can see the benefit that we improve hive compatibility further, as quite
> some users are asking for this
> feature in mailing lists [1][2][3] and some online chatting tools such as
> DingTalk.
>
> I have 3 comments regarding to the design doc:
>
> a) Could you add a section to describe the typical use case you want to
> support after this feature is introduced?
> In that way, users can also have an impression how to use this feature and
> what the behavior and outcome will be.
>
> b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> "FlinkParserFactory".
>
> c) About the two limitations you mentioned:
>     1. Only works with Hive tables and the current catalog needs to be a
> HiveCatalog.
>     2. Queries cannot involve tables/views from multiple catalogs.
> I assume this is because hive parser and analyzer doesn't support
> referring to a name with "x.y.z" fashion? Since
> we can control all the behaviors by leveraging the codes hive currently
> use. Is it possible that we can remove such
> limitations? The reason is I'm not sure if users can make the whole story
> work purely depending on hive catalog (that's
> the reason why I gave comment #a). If multiple catalogs are involved, with
> this limitation I don't think any meaningful
> pipeline could be built. For example, users want to stream data from Kafka
> to Hive, fully use hive's dialect including
> query part. The kafka table could be a temporary table or saved in default
> memory catalog.
>
>
> [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> [2] http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> [3]
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
>
> Best,
> Kurt
>
>
> On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:
>
> > Hi guys,
> >
> > I'd like to start a discussion about providing HiveQL compatibility for
> > users connecting to a hive warehouse. FLIP-123 has already covered most
> > DDLs. So now it's time to complement the other big missing part --
> queries.
> > With FLIP-152, the hive dialect covers more scenarios and makes it even
> > easier for users to migrate to Flink. More details are in the FLIP wiki
> > page [1]. Looking forward to your feedback!
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> >
> > --
> > Best regards!
> > Rui Li
> >
>


--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Zhijiang(wangzhijiang999)
Thanks for launching the discussion and the FLIP, Rui!

It is really nice to see our continuous efforts for compatibility with Hive and benefiting users in this area.
I am only curious that are there any other compatible limitations for Hive users after this FLIP? Or can I say that the Hive compatibility is completely resolved after this FLIP?
I am interested in the ultimate goal in this area. Maybe it is out of this FLIP scope, but still wish some insights from you if possible. :)

Best,
Zhijiang


------------------------------------------------------------------
From:Rui Li <[hidden email]>
Send Time:2020年12月10日(星期四) 16:46
To:dev <[hidden email]>
Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Thanks Kurt for your inputs!

I agree we should extend Hive code to support non-Hive tables. I have
updated the wiki page to remove the limitations you mentioned, and add
typical use cases in the "Motivation" section.

Regarding comment #b, the interface is defined in flink-table-planner-blink
and only used by the blink planner. So I think "BlinkParserFactory" is a
better name, WDYT?

On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:

> Thanks Rui for starting this discussion.
>
> I can see the benefit that we improve hive compatibility further, as quite
> some users are asking for this
> feature in mailing lists [1][2][3] and some online chatting tools such as
> DingTalk.
>
> I have 3 comments regarding to the design doc:
>
> a) Could you add a section to describe the typical use case you want to
> support after this feature is introduced?
> In that way, users can also have an impression how to use this feature and
> what the behavior and outcome will be.
>
> b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> "FlinkParserFactory".
>
> c) About the two limitations you mentioned:
>     1. Only works with Hive tables and the current catalog needs to be a
> HiveCatalog.
>     2. Queries cannot involve tables/views from multiple catalogs.
> I assume this is because hive parser and analyzer doesn't support
> referring to a name with "x.y.z" fashion? Since
> we can control all the behaviors by leveraging the codes hive currently
> use. Is it possible that we can remove such
> limitations? The reason is I'm not sure if users can make the whole story
> work purely depending on hive catalog (that's
> the reason why I gave comment #a). If multiple catalogs are involved, with
> this limitation I don't think any meaningful
> pipeline could be built. For example, users want to stream data from Kafka
> to Hive, fully use hive's dialect including
> query part. The kafka table could be a temporary table or saved in default
> memory catalog.
>
>
> [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> [2] http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> [3]
>
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
>
> Best,
> Kurt
>
>
> On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:
>
> > Hi guys,
> >
> > I'd like to start a discussion about providing HiveQL compatibility for
> > users connecting to a hive warehouse. FLIP-123 has already covered most
> > DDLs. So now it's time to complement the other big missing part --
> queries.
> > With FLIP-152, the hive dialect covers more scenarios and makes it even
> > easier for users to migrate to Flink. More details are in the FLIP wiki
> > page [1]. Looking forward to your feedback!
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> >
> > --
> > Best regards!
> > Rui Li
> >
>


--
Best regards!
Rui Li

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Hi Zhijiang,

Glad to know you're interested in this FLIP. I wouldn't claim 100%
compatibility with this FLIP. That's because Flink doesn't have the
functionalities to support all Hive's features. To list a few examples:

   1. Hive allows users to process data with shell scripts -- very similar
   to UDFs [1]
   2. Users can compile inline Groovy UDFs and use them in queries [2]
   3. Users can dynamically add/delete jars, or even execute arbitrary
   shell command [3]

These features cannot be supported merely by a parser/planner, and it's
open to discussion whether Flink even should support them at all.

So the ultimate goal of this FLIP is to provide Hive syntax compatibility
to features that are already available in Flink, which I believe will cover
most common use cases.

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
[2]
https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
[3]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands

On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]>
wrote:

> Thanks for launching the discussion and the FLIP, Rui!
>
> It is really nice to see our continuous efforts for compatibility with
> Hive and benefiting users in this area.
> I am only curious that are there any other compatible limitations for Hive
> users after this FLIP? Or can I say that the Hive compatibility is
> completely resolved after this FLIP?
> I am interested in the ultimate goal in this area. Maybe it is out of this
> FLIP scope, but still wish some insights from you if possible. :)
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Rui Li <[hidden email]>
> Send Time:2020年12月10日(星期四) 16:46
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>
> Thanks Kurt for your inputs!
>
> I agree we should extend Hive code to support non-Hive tables. I have
> updated the wiki page to remove the limitations you mentioned, and add
> typical use cases in the "Motivation" section.
>
> Regarding comment #b, the interface is defined in flink-table-planner-blink
> and only used by the blink planner. So I think "BlinkParserFactory" is a
> better name, WDYT?
>
> On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:
>
> > Thanks Rui for starting this discussion.
> >
> > I can see the benefit that we improve hive compatibility further, as
> quite
> > some users are asking for this
> > feature in mailing lists [1][2][3] and some online chatting tools such as
> > DingTalk.
> >
> > I have 3 comments regarding to the design doc:
> >
> > a) Could you add a section to describe the typical use case you want to
> > support after this feature is introduced?
> > In that way, users can also have an impression how to use this feature
> and
> > what the behavior and outcome will be.
> >
> > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> > "FlinkParserFactory".
> >
> > c) About the two limitations you mentioned:
> >     1. Only works with Hive tables and the current catalog needs to be a
> > HiveCatalog.
> >     2. Queries cannot involve tables/views from multiple catalogs.
> > I assume this is because hive parser and analyzer doesn't support
> > referring to a name with "x.y.z" fashion? Since
> > we can control all the behaviors by leveraging the codes hive currently
> > use. Is it possible that we can remove such
> > limitations? The reason is I'm not sure if users can make the whole story
> > work purely depending on hive catalog (that's
> > the reason why I gave comment #a). If multiple catalogs are involved,
> with
> > this limitation I don't think any meaningful
> > pipeline could be built. For example, users want to stream data from
> Kafka
> > to Hive, fully use hive's dialect including
> > query part. The kafka table could be a temporary table or saved in
> default
> > memory catalog.
> >
> >
> > [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > [2]
> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > [3]
> >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:
> >
> > > Hi guys,
> > >
> > > I'd like to start a discussion about providing HiveQL compatibility for
> > > users connecting to a hive warehouse. FLIP-123 has already covered most
> > > DDLs. So now it's time to complement the other big missing part --
> > queries.
> > > With FLIP-152, the hive dialect covers more scenarios and makes it even
> > > easier for users to migrate to Flink. More details are in the FLIP wiki
> > > page [1]. Looking forward to your feedback!
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>
>

--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Zhijiang(wangzhijiang999)
Thanks for the further info and explanations! I have no other concerns.

Best,
Zhijiang


------------------------------------------------------------------
From:Rui Li <[hidden email]>
Send Time:2020年12月10日(星期四) 20:35
To:dev <[hidden email]>; Zhijiang <[hidden email]>
Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Hi Zhijiang,

Glad to know you're interested in this FLIP. I wouldn't claim 100%
compatibility with this FLIP. That's because Flink doesn't have the
functionalities to support all Hive's features. To list a few examples:

   1. Hive allows users to process data with shell scripts -- very similar
   to UDFs [1]
   2. Users can compile inline Groovy UDFs and use them in queries [2]
   3. Users can dynamically add/delete jars, or even execute arbitrary
   shell command [3]

These features cannot be supported merely by a parser/planner, and it's
open to discussion whether Flink even should support them at all.

So the ultimate goal of this FLIP is to provide Hive syntax compatibility
to features that are already available in Flink, which I believe will cover
most common use cases.

[1]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
[2]
https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
[3]
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands

On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]>
wrote:

> Thanks for launching the discussion and the FLIP, Rui!
>
> It is really nice to see our continuous efforts for compatibility with
> Hive and benefiting users in this area.
> I am only curious that are there any other compatible limitations for Hive
> users after this FLIP? Or can I say that the Hive compatibility is
> completely resolved after this FLIP?
> I am interested in the ultimate goal in this area. Maybe it is out of this
> FLIP scope, but still wish some insights from you if possible. :)
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Rui Li <[hidden email]>
> Send Time:2020年12月10日(星期四) 16:46
> To:dev <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>
> Thanks Kurt for your inputs!
>
> I agree we should extend Hive code to support non-Hive tables. I have
> updated the wiki page to remove the limitations you mentioned, and add
> typical use cases in the "Motivation" section.
>
> Regarding comment #b, the interface is defined in flink-table-planner-blink
> and only used by the blink planner. So I think "BlinkParserFactory" is a
> better name, WDYT?
>
> On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:
>
> > Thanks Rui for starting this discussion.
> >
> > I can see the benefit that we improve hive compatibility further, as
> quite
> > some users are asking for this
> > feature in mailing lists [1][2][3] and some online chatting tools such as
> > DingTalk.
> >
> > I have 3 comments regarding to the design doc:
> >
> > a) Could you add a section to describe the typical use case you want to
> > support after this feature is introduced?
> > In that way, users can also have an impression how to use this feature
> and
> > what the behavior and outcome will be.
> >
> > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> > "FlinkParserFactory".
> >
> > c) About the two limitations you mentioned:
> >     1. Only works with Hive tables and the current catalog needs to be a
> > HiveCatalog.
> >     2. Queries cannot involve tables/views from multiple catalogs.
> > I assume this is because hive parser and analyzer doesn't support
> > referring to a name with "x.y.z" fashion? Since
> > we can control all the behaviors by leveraging the codes hive currently
> > use. Is it possible that we can remove such
> > limitations? The reason is I'm not sure if users can make the whole story
> > work purely depending on hive catalog (that's
> > the reason why I gave comment #a). If multiple catalogs are involved,
> with
> > this limitation I don't think any meaningful
> > pipeline could be built. For example, users want to stream data from
> Kafka
> > to Hive, fully use hive's dialect including
> > query part. The kafka table could be a temporary table or saved in
> default
> > memory catalog.
> >
> >
> > [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > [2]
> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > [3]
> >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:
> >
> > > Hi guys,
> > >
> > > I'd like to start a discussion about providing HiveQL compatibility for
> > > users connecting to a hive warehouse. FLIP-123 has already covered most
> > > DDLs. So now it's time to complement the other big missing part --
> > queries.
> > > With FLIP-152, the hive dialect covers more scenarios and makes it even
> > > easier for users to migrate to Flink. More details are in the FLIP wiki
> > > page [1]. Looking forward to your feedback!
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>
>
> --
> Best regards!
> Rui Li
>
>

--
Best regards!
Rui Li

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Jingsong Li
Thanks Rui for the proposal, I think this FLIP is required by many users,
and it is very good to traditional Hive users. I have some confusion:

# Version

Which Hive version do you want to choose? Maybe, Hive 3.X and Hive 2.X have
some differences?

# Hive Codes

Can you evaluate how much code we need to copy to our flink-hive-connector?
Do we need to change them? We need to maintain them anyway.

# Functions

About Hive functions, I don't think it is a limitation, we are using
HiveModule to be compatible with Hive, right? So it is a solution instead
of a limitation.

# Keywords

Do you think there will be a keyword problem? Or can we be 100% compatible
with Hive?

On the whole, the FLIP looks very good and I'm looking forward to it.

Best,
Jingsong

On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
<[hidden email]> wrote:

> Thanks for the further info and explanations! I have no other concerns.
>
> Best,
> Zhijiang
>
>
> ------------------------------------------------------------------
> From:Rui Li <[hidden email]>
> Send Time:2020年12月10日(星期四) 20:35
> To:dev <[hidden email]>; Zhijiang <[hidden email]>
> Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>
> Hi Zhijiang,
>
> Glad to know you're interested in this FLIP. I wouldn't claim 100%
> compatibility with this FLIP. That's because Flink doesn't have the
> functionalities to support all Hive's features. To list a few examples:
>
>    1. Hive allows users to process data with shell scripts -- very similar
>    to UDFs [1]
>    2. Users can compile inline Groovy UDFs and use them in queries [2]
>    3. Users can dynamically add/delete jars, or even execute arbitrary
>    shell command [3]
>
> These features cannot be supported merely by a parser/planner, and it's
> open to discussion whether Flink even should support them at all.
>
> So the ultimate goal of this FLIP is to provide Hive syntax compatibility
> to features that are already available in Flink, which I believe will cover
> most common use cases.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> [2]
>
> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> [3]
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
>
> On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]
> .invalid>
> wrote:
>
> > Thanks for launching the discussion and the FLIP, Rui!
> >
> > It is really nice to see our continuous efforts for compatibility with
> > Hive and benefiting users in this area.
> > I am only curious that are there any other compatible limitations for
> Hive
> > users after this FLIP? Or can I say that the Hive compatibility is
> > completely resolved after this FLIP?
> > I am interested in the ultimate goal in this area. Maybe it is out of
> this
> > FLIP scope, but still wish some insights from you if possible. :)
> >
> > Best,
> > Zhijiang
> >
> >
> > ------------------------------------------------------------------
> > From:Rui Li <[hidden email]>
> > Send Time:2020年12月10日(星期四) 16:46
> > To:dev <[hidden email]>
> > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> >
> > Thanks Kurt for your inputs!
> >
> > I agree we should extend Hive code to support non-Hive tables. I have
> > updated the wiki page to remove the limitations you mentioned, and add
> > typical use cases in the "Motivation" section.
> >
> > Regarding comment #b, the interface is defined in
> flink-table-planner-blink
> > and only used by the blink planner. So I think "BlinkParserFactory" is a
> > better name, WDYT?
> >
> > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:
> >
> > > Thanks Rui for starting this discussion.
> > >
> > > I can see the benefit that we improve hive compatibility further, as
> > quite
> > > some users are asking for this
> > > feature in mailing lists [1][2][3] and some online chatting tools such
> as
> > > DingTalk.
> > >
> > > I have 3 comments regarding to the design doc:
> > >
> > > a) Could you add a section to describe the typical use case you want to
> > > support after this feature is introduced?
> > > In that way, users can also have an impression how to use this feature
> > and
> > > what the behavior and outcome will be.
> > >
> > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it to
> > > "FlinkParserFactory".
> > >
> > > c) About the two limitations you mentioned:
> > >     1. Only works with Hive tables and the current catalog needs to be
> a
> > > HiveCatalog.
> > >     2. Queries cannot involve tables/views from multiple catalogs.
> > > I assume this is because hive parser and analyzer doesn't support
> > > referring to a name with "x.y.z" fashion? Since
> > > we can control all the behaviors by leveraging the codes hive currently
> > > use. Is it possible that we can remove such
> > > limitations? The reason is I'm not sure if users can make the whole
> story
> > > work purely depending on hive catalog (that's
> > > the reason why I gave comment #a). If multiple catalogs are involved,
> > with
> > > this limitation I don't think any meaningful
> > > pipeline could be built. For example, users want to stream data from
> > Kafka
> > > to Hive, fully use hive's dialect including
> > > query part. The kafka table could be a temporary table or saved in
> > default
> > > memory catalog.
> > >
> > >
> > > [1] http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > [2]
> > http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > [3]
> > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]> wrote:
> > >
> > > > Hi guys,
> > > >
> > > > I'd like to start a discussion about providing HiveQL compatibility
> for
> > > > users connecting to a hive warehouse. FLIP-123 has already covered
> most
> > > > DDLs. So now it's time to complement the other big missing part --
> > > queries.
> > > > With FLIP-152, the hive dialect covers more scenarios and makes it
> even
> > > > easier for users to migrate to Flink. More details are in the FLIP
> wiki
> > > > page [1]. Looking forward to your feedback!
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
> >
> > --
> > Best regards!
> > Rui Li
> >
> >
>
> --
> Best regards!
> Rui Li
>
>

--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Hi Jingsong,

Thanks for your comments and they're very good questions.

Regarding # Version, we need to do some tradeoff here. Choosing the latest
3.x will cover all the features we want to support. But as you said, 3.x
and 2.x can have some differences and requires more efforts to support
lower versions. I decided to pick 2.x and evolve from there to support new
features in 3.x. Because I think most hive users, especially those who are
likely to be interested in this feature, are still using 2.x or even 1.x.
So the priority is to cover 2.x and 1.x first.

Regarding # Hive Codes, in my PoC, I simply copy the code and make as few
changes as possible. I believe we can do some clean up or refactor to
reduce it. With that in mind, I expect it to be over 10k lines of java
code, and even more if we count ANTLR grammar files as well.

Regarding # Functions, you're right that HiveModule is more of a solution
than limitation. I just want to emphasize that HiveModule and hive dialect
need to be used together to achieve better compatibility.

Regarding # Keywords, new hive versions can have more reserved keywords
than old versions. Since we're based on hive 2.x code, it may not provide
100% keyword-compatibility to 1.x users. But I expect it to be good enough
for most cases. If not, we can provide different grammar files for lower
versions.

On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <[hidden email]> wrote:

> Thanks Rui for the proposal, I think this FLIP is required by many users,
> and it is very good to traditional Hive users. I have some confusion:
>
> # Version
>
> Which Hive version do you want to choose? Maybe, Hive 3.X and Hive 2.X have
> some differences?
>
> # Hive Codes
>
> Can you evaluate how much code we need to copy to our flink-hive-connector?
> Do we need to change them? We need to maintain them anyway.
>
> # Functions
>
> About Hive functions, I don't think it is a limitation, we are using
> HiveModule to be compatible with Hive, right? So it is a solution instead
> of a limitation.
>
> # Keywords
>
> Do you think there will be a keyword problem? Or can we be 100% compatible
> with Hive?
>
> On the whole, the FLIP looks very good and I'm looking forward to it.
>
> Best,
> Jingsong
>
> On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> <[hidden email]> wrote:
>
> > Thanks for the further info and explanations! I have no other concerns.
> >
> > Best,
> > Zhijiang
> >
> >
> > ------------------------------------------------------------------
> > From:Rui Li <[hidden email]>
> > Send Time:2020年12月10日(星期四) 20:35
> > To:dev <[hidden email]>; Zhijiang <[hidden email]>
> > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> >
> > Hi Zhijiang,
> >
> > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > compatibility with this FLIP. That's because Flink doesn't have the
> > functionalities to support all Hive's features. To list a few examples:
> >
> >    1. Hive allows users to process data with shell scripts -- very
> similar
> >    to UDFs [1]
> >    2. Users can compile inline Groovy UDFs and use them in queries [2]
> >    3. Users can dynamically add/delete jars, or even execute arbitrary
> >    shell command [3]
> >
> > These features cannot be supported merely by a parser/planner, and it's
> > open to discussion whether Flink even should support them at all.
> >
> > So the ultimate goal of this FLIP is to provide Hive syntax compatibility
> > to features that are already available in Flink, which I believe will
> cover
> > most common use cases.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> > [2]
> >
> >
> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> > [3]
> >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
> >
> > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]
> > .invalid>
> > wrote:
> >
> > > Thanks for launching the discussion and the FLIP, Rui!
> > >
> > > It is really nice to see our continuous efforts for compatibility with
> > > Hive and benefiting users in this area.
> > > I am only curious that are there any other compatible limitations for
> > Hive
> > > users after this FLIP? Or can I say that the Hive compatibility is
> > > completely resolved after this FLIP?
> > > I am interested in the ultimate goal in this area. Maybe it is out of
> > this
> > > FLIP scope, but still wish some insights from you if possible. :)
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Rui Li <[hidden email]>
> > > Send Time:2020年12月10日(星期四) 16:46
> > > To:dev <[hidden email]>
> > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > >
> > > Thanks Kurt for your inputs!
> > >
> > > I agree we should extend Hive code to support non-Hive tables. I have
> > > updated the wiki page to remove the limitations you mentioned, and add
> > > typical use cases in the "Motivation" section.
> > >
> > > Regarding comment #b, the interface is defined in
> > flink-table-planner-blink
> > > and only used by the blink planner. So I think "BlinkParserFactory" is
> a
> > > better name, WDYT?
> > >
> > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:
> > >
> > > > Thanks Rui for starting this discussion.
> > > >
> > > > I can see the benefit that we improve hive compatibility further, as
> > > quite
> > > > some users are asking for this
> > > > feature in mailing lists [1][2][3] and some online chatting tools
> such
> > as
> > > > DingTalk.
> > > >
> > > > I have 3 comments regarding to the design doc:
> > > >
> > > > a) Could you add a section to describe the typical use case you want
> to
> > > > support after this feature is introduced?
> > > > In that way, users can also have an impression how to use this
> feature
> > > and
> > > > what the behavior and outcome will be.
> > > >
> > > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming it
> to
> > > > "FlinkParserFactory".
> > > >
> > > > c) About the two limitations you mentioned:
> > > >     1. Only works with Hive tables and the current catalog needs to
> be
> > a
> > > > HiveCatalog.
> > > >     2. Queries cannot involve tables/views from multiple catalogs.
> > > > I assume this is because hive parser and analyzer doesn't support
> > > > referring to a name with "x.y.z" fashion? Since
> > > > we can control all the behaviors by leveraging the codes hive
> currently
> > > > use. Is it possible that we can remove such
> > > > limitations? The reason is I'm not sure if users can make the whole
> > story
> > > > work purely depending on hive catalog (that's
> > > > the reason why I gave comment #a). If multiple catalogs are involved,
> > > with
> > > > this limitation I don't think any meaningful
> > > > pipeline could be built. For example, users want to stream data from
> > > Kafka
> > > > to Hive, fully use hive's dialect including
> > > > query part. The kafka table could be a temporary table or saved in
> > > default
> > > > memory catalog.
> > > >
> > > >
> > > > [1]
> http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > > [2]
> > > http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > > [3]
> > > >
> > > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]>
> wrote:
> > > >
> > > > > Hi guys,
> > > > >
> > > > > I'd like to start a discussion about providing HiveQL compatibility
> > for
> > > > > users connecting to a hive warehouse. FLIP-123 has already covered
> > most
> > > > > DDLs. So now it's time to complement the other big missing part --
> > > > queries.
> > > > > With FLIP-152, the hive dialect covers more scenarios and makes it
> > even
> > > > > easier for users to migrate to Flink. More details are in the FLIP
> > wiki
> > > > > page [1]. Looking forward to your feedback!
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > > >
> > > > > --
> > > > > Best regards!
> > > > > Rui Li
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> > >
> >
> > --
> > Best regards!
> > Rui Li
> >
> >
>
> --
> Best, Jingsong Lee
>


--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

godfreyhe
Thanks for bringing up the discussion, Rui!

Regarding the DDL part in the "Introduce HiveParser" section,
I would like to choose the second option. Because if we could
use one hive parser to parse all hive SQLs, we need not to copy
Calcite parser code, and the framework and the code will be very simple.

Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
Could you list all the extensions and give some examples ?

One minor suggestion about the name of ParserImplFactory.
How about renaming ParserImplFactory to DefaultParserFactory ?

Best,
Godfrey

Rui Li <[hidden email]> 于2021年2月3日周三 上午11:16写道:

> Hi Jingsong,
>
> Thanks for your comments and they're very good questions.
>
> Regarding # Version, we need to do some tradeoff here. Choosing the latest
> 3.x will cover all the features we want to support. But as you said, 3.x
> and 2.x can have some differences and requires more efforts to support
> lower versions. I decided to pick 2.x and evolve from there to support new
> features in 3.x. Because I think most hive users, especially those who are
> likely to be interested in this feature, are still using 2.x or even 1.x.
> So the priority is to cover 2.x and 1.x first.
>
> Regarding # Hive Codes, in my PoC, I simply copy the code and make as few
> changes as possible. I believe we can do some clean up or refactor to
> reduce it. With that in mind, I expect it to be over 10k lines of java
> code, and even more if we count ANTLR grammar files as well.
>
> Regarding # Functions, you're right that HiveModule is more of a solution
> than limitation. I just want to emphasize that HiveModule and hive dialect
> need to be used together to achieve better compatibility.
>
> Regarding # Keywords, new hive versions can have more reserved keywords
> than old versions. Since we're based on hive 2.x code, it may not provide
> 100% keyword-compatibility to 1.x users. But I expect it to be good enough
> for most cases. If not, we can provide different grammar files for lower
> versions.
>
> On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <[hidden email]> wrote:
>
> > Thanks Rui for the proposal, I think this FLIP is required by many users,
> > and it is very good to traditional Hive users. I have some confusion:
> >
> > # Version
> >
> > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive 2.X
> have
> > some differences?
> >
> > # Hive Codes
> >
> > Can you evaluate how much code we need to copy to our
> flink-hive-connector?
> > Do we need to change them? We need to maintain them anyway.
> >
> > # Functions
> >
> > About Hive functions, I don't think it is a limitation, we are using
> > HiveModule to be compatible with Hive, right? So it is a solution instead
> > of a limitation.
> >
> > # Keywords
> >
> > Do you think there will be a keyword problem? Or can we be 100%
> compatible
> > with Hive?
> >
> > On the whole, the FLIP looks very good and I'm looking forward to it.
> >
> > Best,
> > Jingsong
> >
> > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> > <[hidden email]> wrote:
> >
> > > Thanks for the further info and explanations! I have no other concerns.
> > >
> > > Best,
> > > Zhijiang
> > >
> > >
> > > ------------------------------------------------------------------
> > > From:Rui Li <[hidden email]>
> > > Send Time:2020年12月10日(星期四) 20:35
> > > To:dev <[hidden email]>; Zhijiang <[hidden email]>
> > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > >
> > > Hi Zhijiang,
> > >
> > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > > compatibility with this FLIP. That's because Flink doesn't have the
> > > functionalities to support all Hive's features. To list a few examples:
> > >
> > >    1. Hive allows users to process data with shell scripts -- very
> > similar
> > >    to UDFs [1]
> > >    2. Users can compile inline Groovy UDFs and use them in queries [2]
> > >    3. Users can dynamically add/delete jars, or even execute arbitrary
> > >    shell command [3]
> > >
> > > These features cannot be supported merely by a parser/planner, and it's
> > > open to discussion whether Flink even should support them at all.
> > >
> > > So the ultimate goal of this FLIP is to provide Hive syntax
> compatibility
> > > to features that are already available in Flink, which I believe will
> > cover
> > > most common use cases.
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> > > [2]
> > >
> > >
> >
> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> > > [3]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
> > >
> > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]
> > > .invalid>
> > > wrote:
> > >
> > > > Thanks for launching the discussion and the FLIP, Rui!
> > > >
> > > > It is really nice to see our continuous efforts for compatibility
> with
> > > > Hive and benefiting users in this area.
> > > > I am only curious that are there any other compatible limitations for
> > > Hive
> > > > users after this FLIP? Or can I say that the Hive compatibility is
> > > > completely resolved after this FLIP?
> > > > I am interested in the ultimate goal in this area. Maybe it is out of
> > > this
> > > > FLIP scope, but still wish some insights from you if possible. :)
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Rui Li <[hidden email]>
> > > > Send Time:2020年12月10日(星期四) 16:46
> > > > To:dev <[hidden email]>
> > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > >
> > > > Thanks Kurt for your inputs!
> > > >
> > > > I agree we should extend Hive code to support non-Hive tables. I have
> > > > updated the wiki page to remove the limitations you mentioned, and
> add
> > > > typical use cases in the "Motivation" section.
> > > >
> > > > Regarding comment #b, the interface is defined in
> > > flink-table-planner-blink
> > > > and only used by the blink planner. So I think "BlinkParserFactory"
> is
> > a
> > > > better name, WDYT?
> > > >
> > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]> wrote:
> > > >
> > > > > Thanks Rui for starting this discussion.
> > > > >
> > > > > I can see the benefit that we improve hive compatibility further,
> as
> > > > quite
> > > > > some users are asking for this
> > > > > feature in mailing lists [1][2][3] and some online chatting tools
> > such
> > > as
> > > > > DingTalk.
> > > > >
> > > > > I have 3 comments regarding to the design doc:
> > > > >
> > > > > a) Could you add a section to describe the typical use case you
> want
> > to
> > > > > support after this feature is introduced?
> > > > > In that way, users can also have an impression how to use this
> > feature
> > > > and
> > > > > what the behavior and outcome will be.
> > > > >
> > > > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming
> it
> > to
> > > > > "FlinkParserFactory".
> > > > >
> > > > > c) About the two limitations you mentioned:
> > > > >     1. Only works with Hive tables and the current catalog needs to
> > be
> > > a
> > > > > HiveCatalog.
> > > > >     2. Queries cannot involve tables/views from multiple catalogs.
> > > > > I assume this is because hive parser and analyzer doesn't support
> > > > > referring to a name with "x.y.z" fashion? Since
> > > > > we can control all the behaviors by leveraging the codes hive
> > currently
> > > > > use. Is it possible that we can remove such
> > > > > limitations? The reason is I'm not sure if users can make the whole
> > > story
> > > > > work purely depending on hive catalog (that's
> > > > > the reason why I gave comment #a). If multiple catalogs are
> involved,
> > > > with
> > > > > this limitation I don't think any meaningful
> > > > > pipeline could be built. For example, users want to stream data
> from
> > > > Kafka
> > > > > to Hive, fully use hive's dialect including
> > > > > query part. The kafka table could be a temporary table or saved in
> > > > default
> > > > > memory catalog.
> > > > >
> > > > >
> > > > > [1]
> > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > > > [2]
> > > >
> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > > > [3]
> > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]>
> > wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > I'd like to start a discussion about providing HiveQL
> compatibility
> > > for
> > > > > > users connecting to a hive warehouse. FLIP-123 has already
> covered
> > > most
> > > > > > DDLs. So now it's time to complement the other big missing part
> --
> > > > > queries.
> > > > > > With FLIP-152, the hive dialect covers more scenarios and makes
> it
> > > even
> > > > > > easier for users to migrate to Flink. More details are in the
> FLIP
> > > wiki
> > > > > > page [1]. Looking forward to your feedback!
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > > > >
> > > > > > --
> > > > > > Best regards!
> > > > > > Rui Li
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> > >
> >
> > --
> > Best, Jingsong Lee
> >
>
>
> --
> Best regards!
> Rui Li
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Jark Wu-2
Thanks Rui for the great proposal, I believe this can be very attractive
for many Hive users.

The FLIP looks good to me in general, I only have some minor comments:

1) BlinkParserFactory
IIUC, BlinkParserFactory is located in the flink-table-api-java module with
the Parser interface there.
I suggest renaming it to `ParserFactory`, because it creates Parser instead
of BlinkParser.
And the implementations can be `HiveParserFactory` and
`FlinkParserFactory`.
I think we should avoid the `Blink` keyword in interfaces, blink planner is
already the default planner and
the old planner will be removed in the near future. There will be no
`blink` in the future then.

2) "create a new instance each time getParser is called"
Finding parser for every time getParser is called sounds heavy to me. I
think we can improve this by simplify
caching the Parser instance,  and creating a new one if current sql-dialect
is different from the cached Parser.

3) Hive version
How much code needs to be done to support new features in 3.x based on 2.x?
Is this also included in this FLIP/release?
I don't fully understand this because the FLIP says "we can use a newer
version to support older versions."

Best,
Jark

On Wed, 3 Feb 2021 at 11:48, godfrey he <[hidden email]> wrote:

> Thanks for bringing up the discussion, Rui!
>
> Regarding the DDL part in the "Introduce HiveParser" section,
> I would like to choose the second option. Because if we could
> use one hive parser to parse all hive SQLs, we need not to copy
> Calcite parser code, and the framework and the code will be very simple.
>
> Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
> Could you list all the extensions and give some examples ?
>
> One minor suggestion about the name of ParserImplFactory.
> How about renaming ParserImplFactory to DefaultParserFactory ?
>
> Best,
> Godfrey
>
> Rui Li <[hidden email]> 于2021年2月3日周三 上午11:16写道:
>
> > Hi Jingsong,
> >
> > Thanks for your comments and they're very good questions.
> >
> > Regarding # Version, we need to do some tradeoff here. Choosing the
> latest
> > 3.x will cover all the features we want to support. But as you said, 3.x
> > and 2.x can have some differences and requires more efforts to support
> > lower versions. I decided to pick 2.x and evolve from there to support
> new
> > features in 3.x. Because I think most hive users, especially those who
> are
> > likely to be interested in this feature, are still using 2.x or even 1.x.
> > So the priority is to cover 2.x and 1.x first.
> >
> > Regarding # Hive Codes, in my PoC, I simply copy the code and make as few
> > changes as possible. I believe we can do some clean up or refactor to
> > reduce it. With that in mind, I expect it to be over 10k lines of java
> > code, and even more if we count ANTLR grammar files as well.
> >
> > Regarding # Functions, you're right that HiveModule is more of a solution
> > than limitation. I just want to emphasize that HiveModule and hive
> dialect
> > need to be used together to achieve better compatibility.
> >
> > Regarding # Keywords, new hive versions can have more reserved keywords
> > than old versions. Since we're based on hive 2.x code, it may not provide
> > 100% keyword-compatibility to 1.x users. But I expect it to be good
> enough
> > for most cases. If not, we can provide different grammar files for lower
> > versions.
> >
> > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <[hidden email]>
> wrote:
> >
> > > Thanks Rui for the proposal, I think this FLIP is required by many
> users,
> > > and it is very good to traditional Hive users. I have some confusion:
> > >
> > > # Version
> > >
> > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive 2.X
> > have
> > > some differences?
> > >
> > > # Hive Codes
> > >
> > > Can you evaluate how much code we need to copy to our
> > flink-hive-connector?
> > > Do we need to change them? We need to maintain them anyway.
> > >
> > > # Functions
> > >
> > > About Hive functions, I don't think it is a limitation, we are using
> > > HiveModule to be compatible with Hive, right? So it is a solution
> instead
> > > of a limitation.
> > >
> > > # Keywords
> > >
> > > Do you think there will be a keyword problem? Or can we be 100%
> > compatible
> > > with Hive?
> > >
> > > On the whole, the FLIP looks very good and I'm looking forward to it.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> > > <[hidden email]> wrote:
> > >
> > > > Thanks for the further info and explanations! I have no other
> concerns.
> > > >
> > > > Best,
> > > > Zhijiang
> > > >
> > > >
> > > > ------------------------------------------------------------------
> > > > From:Rui Li <[hidden email]>
> > > > Send Time:2020年12月10日(星期四) 20:35
> > > > To:dev <[hidden email]>; Zhijiang <[hidden email]>
> > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > >
> > > > Hi Zhijiang,
> > > >
> > > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > > > compatibility with this FLIP. That's because Flink doesn't have the
> > > > functionalities to support all Hive's features. To list a few
> examples:
> > > >
> > > >    1. Hive allows users to process data with shell scripts -- very
> > > similar
> > > >    to UDFs [1]
> > > >    2. Users can compile inline Groovy UDFs and use them in queries
> [2]
> > > >    3. Users can dynamically add/delete jars, or even execute
> arbitrary
> > > >    shell command [3]
> > > >
> > > > These features cannot be supported merely by a parser/planner, and
> it's
> > > > open to discussion whether Flink even should support them at all.
> > > >
> > > > So the ultimate goal of this FLIP is to provide Hive syntax
> > compatibility
> > > > to features that are already available in Flink, which I believe will
> > > cover
> > > > most common use cases.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> > > > [2]
> > > >
> > > >
> > >
> >
> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> > > > [3]
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
> > > >
> > > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <[hidden email]
> > > > .invalid>
> > > > wrote:
> > > >
> > > > > Thanks for launching the discussion and the FLIP, Rui!
> > > > >
> > > > > It is really nice to see our continuous efforts for compatibility
> > with
> > > > > Hive and benefiting users in this area.
> > > > > I am only curious that are there any other compatible limitations
> for
> > > > Hive
> > > > > users after this FLIP? Or can I say that the Hive compatibility is
> > > > > completely resolved after this FLIP?
> > > > > I am interested in the ultimate goal in this area. Maybe it is out
> of
> > > > this
> > > > > FLIP scope, but still wish some insights from you if possible. :)
> > > > >
> > > > > Best,
> > > > > Zhijiang
> > > > >
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > From:Rui Li <[hidden email]>
> > > > > Send Time:2020年12月10日(星期四) 16:46
> > > > > To:dev <[hidden email]>
> > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > > >
> > > > > Thanks Kurt for your inputs!
> > > > >
> > > > > I agree we should extend Hive code to support non-Hive tables. I
> have
> > > > > updated the wiki page to remove the limitations you mentioned, and
> > add
> > > > > typical use cases in the "Motivation" section.
> > > > >
> > > > > Regarding comment #b, the interface is defined in
> > > > flink-table-planner-blink
> > > > > and only used by the blink planner. So I think "BlinkParserFactory"
> > is
> > > a
> > > > > better name, WDYT?
> > > > >
> > > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]>
> wrote:
> > > > >
> > > > > > Thanks Rui for starting this discussion.
> > > > > >
> > > > > > I can see the benefit that we improve hive compatibility further,
> > as
> > > > > quite
> > > > > > some users are asking for this
> > > > > > feature in mailing lists [1][2][3] and some online chatting tools
> > > such
> > > > as
> > > > > > DingTalk.
> > > > > >
> > > > > > I have 3 comments regarding to the design doc:
> > > > > >
> > > > > > a) Could you add a section to describe the typical use case you
> > want
> > > to
> > > > > > support after this feature is introduced?
> > > > > > In that way, users can also have an impression how to use this
> > > feature
> > > > > and
> > > > > > what the behavior and outcome will be.
> > > > > >
> > > > > > b) Regarding the naming: "BlinkParserFactory", I suggest renaming
> > it
> > > to
> > > > > > "FlinkParserFactory".
> > > > > >
> > > > > > c) About the two limitations you mentioned:
> > > > > >     1. Only works with Hive tables and the current catalog needs
> to
> > > be
> > > > a
> > > > > > HiveCatalog.
> > > > > >     2. Queries cannot involve tables/views from multiple
> catalogs.
> > > > > > I assume this is because hive parser and analyzer doesn't support
> > > > > > referring to a name with "x.y.z" fashion? Since
> > > > > > we can control all the behaviors by leveraging the codes hive
> > > currently
> > > > > > use. Is it possible that we can remove such
> > > > > > limitations? The reason is I'm not sure if users can make the
> whole
> > > > story
> > > > > > work purely depending on hive catalog (that's
> > > > > > the reason why I gave comment #a). If multiple catalogs are
> > involved,
> > > > > with
> > > > > > this limitation I don't think any meaningful
> > > > > > pipeline could be built. For example, users want to stream data
> > from
> > > > > Kafka
> > > > > > to Hive, fully use hive's dialect including
> > > > > > query part. The kafka table could be a temporary table or saved
> in
> > > > > default
> > > > > > memory catalog.
> > > > > >
> > > > > >
> > > > > > [1]
> > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > > > > [2]
> > > > >
> > http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > > > > [3]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > > > > >
> > > > > > Best,
> > > > > > Kurt
> > > > > >
> > > > > >
> > > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]>
> > > wrote:
> > > > > >
> > > > > > > Hi guys,
> > > > > > >
> > > > > > > I'd like to start a discussion about providing HiveQL
> > compatibility
> > > > for
> > > > > > > users connecting to a hive warehouse. FLIP-123 has already
> > covered
> > > > most
> > > > > > > DDLs. So now it's time to complement the other big missing part
> > --
> > > > > > queries.
> > > > > > > With FLIP-152, the hive dialect covers more scenarios and makes
> > it
> > > > even
> > > > > > > easier for users to migrate to Flink. More details are in the
> > FLIP
> > > > wiki
> > > > > > > page [1]. Looking forward to your feedback!
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > > > > >
> > > > > > > --
> > > > > > > Best regards!
> > > > > > > Rui Li
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards!
> > > > > Rui Li
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > > >
> > >
> > > --
> > > Best, Jingsong Lee
> > >
> >
> >
> > --
> > Best regards!
> > Rui Li
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Thanks Godfrey & Jark for your comments!

Since both of you mentioned the naming of the parser factory, I'll answer
this question first.

I only intend to introduce the pluggable Parser for blink planner. So
the BlinkParserFactory will be added to the flink-table-planner-blink
module. We already have several factory classes there such as
"BlinkPlannerFactory" and "BlinkExecutorFactory". So it seems
"BlinkParserFactory" is following the naming convention of existing
classes. But I'm also fine with "ParserFactory" if we're going to remove
the "blink" prefix anyway.
For the concrete implementation, I prefer "DefaultParserFactory". Having a
"FlinkParserFactory" in blink planner seems a little weird. We can revisit
this when we really get rid of the legacy planner.

To answer the other questions:

@Godfrey
Yes I agree DDLs should also be handled by the HiveParser. That'll give
users more consistent experience.

The "Go Beyond Hive" section is mainly about decoupling the dialect from
hive objects, so that users can write HiveQL to query non-hive tables or
call non-hive functions. I think it's in the scope of this FLIP, but it
doesn't have to be done in the first release. I'll update the FLIP to be
more specific about it.

@Jark
Creating a new Parser only when dialect changes is a good idea. I'll update
the FLIP to mention that.

The 3.x new features we need to support in this FLIP include:

   1. PK & NOT NULL constraints
   2. Alter DB to change location

These DDLs are already covered by FLIP-123. Missing them will be a
regression.

Other new features can be identified with more tests and user feedback, and
will be supported incrementally.

By "we can use a newer version to support older versions", I mean syntax in
the new version is a superset of the old one. But we still need extra
efforts to make sure the copied code works with different hive versions,
e.g. more shim methods are required. So I'd rather start with a popular
version than the newest version.


On Wed, Feb 3, 2021 at 11:51 AM Jark Wu <[hidden email]> wrote:

> Thanks Rui for the great proposal, I believe this can be very attractive
> for many Hive users.
>
> The FLIP looks good to me in general, I only have some minor comments:
>
> 1) BlinkParserFactory
> IIUC, BlinkParserFactory is located in the flink-table-api-java module with
> the Parser interface there.
> I suggest renaming it to `ParserFactory`, because it creates Parser instead
> of BlinkParser.
> And the implementations can be `HiveParserFactory` and
> `FlinkParserFactory`.
> I think we should avoid the `Blink` keyword in interfaces, blink planner is
> already the default planner and
> the old planner will be removed in the near future. There will be no
> `blink` in the future then.
>
> 2) "create a new instance each time getParser is called"
> Finding parser for every time getParser is called sounds heavy to me. I
> think we can improve this by simplify
> caching the Parser instance,  and creating a new one if current sql-dialect
> is different from the cached Parser.
>
> 3) Hive version
> How much code needs to be done to support new features in 3.x based on 2.x?
> Is this also included in this FLIP/release?
> I don't fully understand this because the FLIP says "we can use a newer
> version to support older versions."
>
> Best,
> Jark
>
> On Wed, 3 Feb 2021 at 11:48, godfrey he <[hidden email]> wrote:
>
> > Thanks for bringing up the discussion, Rui!
> >
> > Regarding the DDL part in the "Introduce HiveParser" section,
> > I would like to choose the second option. Because if we could
> > use one hive parser to parse all hive SQLs, we need not to copy
> > Calcite parser code, and the framework and the code will be very simple.
> >
> > Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
> > Could you list all the extensions and give some examples ?
> >
> > One minor suggestion about the name of ParserImplFactory.
> > How about renaming ParserImplFactory to DefaultParserFactory ?
> >
> > Best,
> > Godfrey
> >
> > Rui Li <[hidden email]> 于2021年2月3日周三 上午11:16写道:
> >
> > > Hi Jingsong,
> > >
> > > Thanks for your comments and they're very good questions.
> > >
> > > Regarding # Version, we need to do some tradeoff here. Choosing the
> > latest
> > > 3.x will cover all the features we want to support. But as you said,
> 3.x
> > > and 2.x can have some differences and requires more efforts to support
> > > lower versions. I decided to pick 2.x and evolve from there to support
> > new
> > > features in 3.x. Because I think most hive users, especially those who
> > are
> > > likely to be interested in this feature, are still using 2.x or even
> 1.x.
> > > So the priority is to cover 2.x and 1.x first.
> > >
> > > Regarding # Hive Codes, in my PoC, I simply copy the code and make as
> few
> > > changes as possible. I believe we can do some clean up or refactor to
> > > reduce it. With that in mind, I expect it to be over 10k lines of java
> > > code, and even more if we count ANTLR grammar files as well.
> > >
> > > Regarding # Functions, you're right that HiveModule is more of a
> solution
> > > than limitation. I just want to emphasize that HiveModule and hive
> > dialect
> > > need to be used together to achieve better compatibility.
> > >
> > > Regarding # Keywords, new hive versions can have more reserved keywords
> > > than old versions. Since we're based on hive 2.x code, it may not
> provide
> > > 100% keyword-compatibility to 1.x users. But I expect it to be good
> > enough
> > > for most cases. If not, we can provide different grammar files for
> lower
> > > versions.
> > >
> > > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <[hidden email]>
> > wrote:
> > >
> > > > Thanks Rui for the proposal, I think this FLIP is required by many
> > users,
> > > > and it is very good to traditional Hive users. I have some confusion:
> > > >
> > > > # Version
> > > >
> > > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive
> 2.X
> > > have
> > > > some differences?
> > > >
> > > > # Hive Codes
> > > >
> > > > Can you evaluate how much code we need to copy to our
> > > flink-hive-connector?
> > > > Do we need to change them? We need to maintain them anyway.
> > > >
> > > > # Functions
> > > >
> > > > About Hive functions, I don't think it is a limitation, we are using
> > > > HiveModule to be compatible with Hive, right? So it is a solution
> > instead
> > > > of a limitation.
> > > >
> > > > # Keywords
> > > >
> > > > Do you think there will be a keyword problem? Or can we be 100%
> > > compatible
> > > > with Hive?
> > > >
> > > > On the whole, the FLIP looks very good and I'm looking forward to it.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
> > > > <[hidden email]> wrote:
> > > >
> > > > > Thanks for the further info and explanations! I have no other
> > concerns.
> > > > >
> > > > > Best,
> > > > > Zhijiang
> > > > >
> > > > >
> > > > > ------------------------------------------------------------------
> > > > > From:Rui Li <[hidden email]>
> > > > > Send Time:2020年12月10日(星期四) 20:35
> > > > > To:dev <[hidden email]>; Zhijiang <
> [hidden email]>
> > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > > >
> > > > > Hi Zhijiang,
> > > > >
> > > > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
> > > > > compatibility with this FLIP. That's because Flink doesn't have the
> > > > > functionalities to support all Hive's features. To list a few
> > examples:
> > > > >
> > > > >    1. Hive allows users to process data with shell scripts -- very
> > > > similar
> > > > >    to UDFs [1]
> > > > >    2. Users can compile inline Groovy UDFs and use them in queries
> > [2]
> > > > >    3. Users can dynamically add/delete jars, or even execute
> > arbitrary
> > > > >    shell command [3]
> > > > >
> > > > > These features cannot be supported merely by a parser/planner, and
> > it's
> > > > > open to discussion whether Flink even should support them at all.
> > > > >
> > > > > So the ultimate goal of this FLIP is to provide Hive syntax
> > > compatibility
> > > > > to features that are already available in Flink, which I believe
> will
> > > > cover
> > > > > most common use cases.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
> > > > > [2]
> > > > >
> > > > >
> > > >
> > >
> >
> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
> > > > > [3]
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
> > > > >
> > > > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <
> [hidden email]
> > > > > .invalid>
> > > > > wrote:
> > > > >
> > > > > > Thanks for launching the discussion and the FLIP, Rui!
> > > > > >
> > > > > > It is really nice to see our continuous efforts for compatibility
> > > with
> > > > > > Hive and benefiting users in this area.
> > > > > > I am only curious that are there any other compatible limitations
> > for
> > > > > Hive
> > > > > > users after this FLIP? Or can I say that the Hive compatibility
> is
> > > > > > completely resolved after this FLIP?
> > > > > > I am interested in the ultimate goal in this area. Maybe it is
> out
> > of
> > > > > this
> > > > > > FLIP scope, but still wish some insights from you if possible. :)
> > > > > >
> > > > > > Best,
> > > > > > Zhijiang
> > > > > >
> > > > > >
> > > > > >
> ------------------------------------------------------------------
> > > > > > From:Rui Li <[hidden email]>
> > > > > > Send Time:2020年12月10日(星期四) 16:46
> > > > > > To:dev <[hidden email]>
> > > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
> > > > > >
> > > > > > Thanks Kurt for your inputs!
> > > > > >
> > > > > > I agree we should extend Hive code to support non-Hive tables. I
> > have
> > > > > > updated the wiki page to remove the limitations you mentioned,
> and
> > > add
> > > > > > typical use cases in the "Motivation" section.
> > > > > >
> > > > > > Regarding comment #b, the interface is defined in
> > > > > flink-table-planner-blink
> > > > > > and only used by the blink planner. So I think
> "BlinkParserFactory"
> > > is
> > > > a
> > > > > > better name, WDYT?
> > > > > >
> > > > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]>
> > wrote:
> > > > > >
> > > > > > > Thanks Rui for starting this discussion.
> > > > > > >
> > > > > > > I can see the benefit that we improve hive compatibility
> further,
> > > as
> > > > > > quite
> > > > > > > some users are asking for this
> > > > > > > feature in mailing lists [1][2][3] and some online chatting
> tools
> > > > such
> > > > > as
> > > > > > > DingTalk.
> > > > > > >
> > > > > > > I have 3 comments regarding to the design doc:
> > > > > > >
> > > > > > > a) Could you add a section to describe the typical use case you
> > > want
> > > > to
> > > > > > > support after this feature is introduced?
> > > > > > > In that way, users can also have an impression how to use this
> > > > feature
> > > > > > and
> > > > > > > what the behavior and outcome will be.
> > > > > > >
> > > > > > > b) Regarding the naming: "BlinkParserFactory", I suggest
> renaming
> > > it
> > > > to
> > > > > > > "FlinkParserFactory".
> > > > > > >
> > > > > > > c) About the two limitations you mentioned:
> > > > > > >     1. Only works with Hive tables and the current catalog
> needs
> > to
> > > > be
> > > > > a
> > > > > > > HiveCatalog.
> > > > > > >     2. Queries cannot involve tables/views from multiple
> > catalogs.
> > > > > > > I assume this is because hive parser and analyzer doesn't
> support
> > > > > > > referring to a name with "x.y.z" fashion? Since
> > > > > > > we can control all the behaviors by leveraging the codes hive
> > > > currently
> > > > > > > use. Is it possible that we can remove such
> > > > > > > limitations? The reason is I'm not sure if users can make the
> > whole
> > > > > story
> > > > > > > work purely depending on hive catalog (that's
> > > > > > > the reason why I gave comment #a). If multiple catalogs are
> > > involved,
> > > > > > with
> > > > > > > this limitation I don't think any meaningful
> > > > > > > pipeline could be built. For example, users want to stream data
> > > from
> > > > > > Kafka
> > > > > > > to Hive, fully use hive's dialect including
> > > > > > > query part. The kafka table could be a temporary table or saved
> > in
> > > > > > default
> > > > > > > memory catalog.
> > > > > > >
> > > > > > >
> > > > > > > [1]
> > > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
> > > > > > > [2]
> > > > > >
> > > http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
> > > > > > > [3]
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
> > > > > > >
> > > > > > > Best,
> > > > > > > Kurt
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi guys,
> > > > > > > >
> > > > > > > > I'd like to start a discussion about providing HiveQL
> > > compatibility
> > > > > for
> > > > > > > > users connecting to a hive warehouse. FLIP-123 has already
> > > covered
> > > > > most
> > > > > > > > DDLs. So now it's time to complement the other big missing
> part
> > > --
> > > > > > > queries.
> > > > > > > > With FLIP-152, the hive dialect covers more scenarios and
> makes
> > > it
> > > > > even
> > > > > > > > easier for users to migrate to Flink. More details are in the
> > > FLIP
> > > > > wiki
> > > > > > > > page [1]. Looking forward to your feedback!
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards!
> > > > > > > > Rui Li
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards!
> > > > > > Rui Li
> > > > > >
> > > > > >
> > > > >
> > > > > --
> > > > > Best regards!
> > > > > Rui Li
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best, Jingsong Lee
> > > >
> > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>


--
Best regards!
Rui Li
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility

Rui Li
Hi Jingsong, Godfrey and Jark,

I have updated the FLIP to incorporate your suggestions. Please let me know
if you have further comments. Thank you.

On Wed, Feb 3, 2021 at 4:51 PM Rui Li <[hidden email]> wrote:

> Thanks Godfrey & Jark for your comments!
>
> Since both of you mentioned the naming of the parser factory, I'll answer
> this question first.
>
> I only intend to introduce the pluggable Parser for blink planner. So
> the BlinkParserFactory will be added to the flink-table-planner-blink
> module. We already have several factory classes there such as
> "BlinkPlannerFactory" and "BlinkExecutorFactory". So it seems
> "BlinkParserFactory" is following the naming convention of existing
> classes. But I'm also fine with "ParserFactory" if we're going to remove
> the "blink" prefix anyway.
> For the concrete implementation, I prefer "DefaultParserFactory". Having a
> "FlinkParserFactory" in blink planner seems a little weird. We can revisit
> this when we really get rid of the legacy planner.
>
> To answer the other questions:
>
> @Godfrey
> Yes I agree DDLs should also be handled by the HiveParser. That'll give
> users more consistent experience.
>
> The "Go Beyond Hive" section is mainly about decoupling the dialect from
> hive objects, so that users can write HiveQL to query non-hive tables or
> call non-hive functions. I think it's in the scope of this FLIP, but it
> doesn't have to be done in the first release. I'll update the FLIP to be
> more specific about it.
>
> @Jark
> Creating a new Parser only when dialect changes is a good idea. I'll
> update the FLIP to mention that.
>
> The 3.x new features we need to support in this FLIP include:
>
>    1. PK & NOT NULL constraints
>    2. Alter DB to change location
>
> These DDLs are already covered by FLIP-123. Missing them will be a
> regression.
>
> Other new features can be identified with more tests and user feedback,
> and will be supported incrementally.
>
> By "we can use a newer version to support older versions", I mean syntax
> in the new version is a superset of the old one. But we still need extra
> efforts to make sure the copied code works with different hive versions,
> e.g. more shim methods are required. So I'd rather start with a popular
> version than the newest version.
>
>
> On Wed, Feb 3, 2021 at 11:51 AM Jark Wu <[hidden email]> wrote:
>
>> Thanks Rui for the great proposal, I believe this can be very attractive
>> for many Hive users.
>>
>> The FLIP looks good to me in general, I only have some minor comments:
>>
>> 1) BlinkParserFactory
>> IIUC, BlinkParserFactory is located in the flink-table-api-java module
>> with
>> the Parser interface there.
>> I suggest renaming it to `ParserFactory`, because it creates Parser
>> instead
>> of BlinkParser.
>> And the implementations can be `HiveParserFactory` and
>> `FlinkParserFactory`.
>> I think we should avoid the `Blink` keyword in interfaces, blink planner
>> is
>> already the default planner and
>> the old planner will be removed in the near future. There will be no
>> `blink` in the future then.
>>
>> 2) "create a new instance each time getParser is called"
>> Finding parser for every time getParser is called sounds heavy to me. I
>> think we can improve this by simplify
>> caching the Parser instance,  and creating a new one if current
>> sql-dialect
>> is different from the cached Parser.
>>
>> 3) Hive version
>> How much code needs to be done to support new features in 3.x based on
>> 2.x?
>> Is this also included in this FLIP/release?
>> I don't fully understand this because the FLIP says "we can use a newer
>> version to support older versions."
>>
>> Best,
>> Jark
>>
>> On Wed, 3 Feb 2021 at 11:48, godfrey he <[hidden email]> wrote:
>>
>> > Thanks for bringing up the discussion, Rui!
>> >
>> > Regarding the DDL part in the "Introduce HiveParser" section,
>> > I would like to choose the second option. Because if we could
>> > use one hive parser to parse all hive SQLs, we need not to copy
>> > Calcite parser code, and the framework and the code will be very simple.
>> >
>> > Regarding the "Go Beyond Hive" section, is that the scope of this FLIP ?
>> > Could you list all the extensions and give some examples ?
>> >
>> > One minor suggestion about the name of ParserImplFactory.
>> > How about renaming ParserImplFactory to DefaultParserFactory ?
>> >
>> > Best,
>> > Godfrey
>> >
>> > Rui Li <[hidden email]> 于2021年2月3日周三 上午11:16写道:
>> >
>> > > Hi Jingsong,
>> > >
>> > > Thanks for your comments and they're very good questions.
>> > >
>> > > Regarding # Version, we need to do some tradeoff here. Choosing the
>> > latest
>> > > 3.x will cover all the features we want to support. But as you said,
>> 3.x
>> > > and 2.x can have some differences and requires more efforts to support
>> > > lower versions. I decided to pick 2.x and evolve from there to support
>> > new
>> > > features in 3.x. Because I think most hive users, especially those who
>> > are
>> > > likely to be interested in this feature, are still using 2.x or even
>> 1.x.
>> > > So the priority is to cover 2.x and 1.x first.
>> > >
>> > > Regarding # Hive Codes, in my PoC, I simply copy the code and make as
>> few
>> > > changes as possible. I believe we can do some clean up or refactor to
>> > > reduce it. With that in mind, I expect it to be over 10k lines of java
>> > > code, and even more if we count ANTLR grammar files as well.
>> > >
>> > > Regarding # Functions, you're right that HiveModule is more of a
>> solution
>> > > than limitation. I just want to emphasize that HiveModule and hive
>> > dialect
>> > > need to be used together to achieve better compatibility.
>> > >
>> > > Regarding # Keywords, new hive versions can have more reserved
>> keywords
>> > > than old versions. Since we're based on hive 2.x code, it may not
>> provide
>> > > 100% keyword-compatibility to 1.x users. But I expect it to be good
>> > enough
>> > > for most cases. If not, we can provide different grammar files for
>> lower
>> > > versions.
>> > >
>> > > On Tue, Feb 2, 2021 at 5:10 PM Jingsong Li <[hidden email]>
>> > wrote:
>> > >
>> > > > Thanks Rui for the proposal, I think this FLIP is required by many
>> > users,
>> > > > and it is very good to traditional Hive users. I have some
>> confusion:
>> > > >
>> > > > # Version
>> > > >
>> > > > Which Hive version do you want to choose? Maybe, Hive 3.X and Hive
>> 2.X
>> > > have
>> > > > some differences?
>> > > >
>> > > > # Hive Codes
>> > > >
>> > > > Can you evaluate how much code we need to copy to our
>> > > flink-hive-connector?
>> > > > Do we need to change them? We need to maintain them anyway.
>> > > >
>> > > > # Functions
>> > > >
>> > > > About Hive functions, I don't think it is a limitation, we are using
>> > > > HiveModule to be compatible with Hive, right? So it is a solution
>> > instead
>> > > > of a limitation.
>> > > >
>> > > > # Keywords
>> > > >
>> > > > Do you think there will be a keyword problem? Or can we be 100%
>> > > compatible
>> > > > with Hive?
>> > > >
>> > > > On the whole, the FLIP looks very good and I'm looking forward to
>> it.
>> > > >
>> > > > Best,
>> > > > Jingsong
>> > > >
>> > > > On Fri, Dec 11, 2020 at 11:35 AM Zhijiang
>> > > > <[hidden email]> wrote:
>> > > >
>> > > > > Thanks for the further info and explanations! I have no other
>> > concerns.
>> > > > >
>> > > > > Best,
>> > > > > Zhijiang
>> > > > >
>> > > > >
>> > > > > ------------------------------------------------------------------
>> > > > > From:Rui Li <[hidden email]>
>> > > > > Send Time:2020年12月10日(星期四) 20:35
>> > > > > To:dev <[hidden email]>; Zhijiang <
>> [hidden email]>
>> > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>> > > > >
>> > > > > Hi Zhijiang,
>> > > > >
>> > > > > Glad to know you're interested in this FLIP. I wouldn't claim 100%
>> > > > > compatibility with this FLIP. That's because Flink doesn't have
>> the
>> > > > > functionalities to support all Hive's features. To list a few
>> > examples:
>> > > > >
>> > > > >    1. Hive allows users to process data with shell scripts -- very
>> > > > similar
>> > > > >    to UDFs [1]
>> > > > >    2. Users can compile inline Groovy UDFs and use them in queries
>> > [2]
>> > > > >    3. Users can dynamically add/delete jars, or even execute
>> > arbitrary
>> > > > >    shell command [3]
>> > > > >
>> > > > > These features cannot be supported merely by a parser/planner, and
>> > it's
>> > > > > open to discussion whether Flink even should support them at all.
>> > > > >
>> > > > > So the ultimate goal of this FLIP is to provide Hive syntax
>> > > compatibility
>> > > > > to features that are already available in Flink, which I believe
>> will
>> > > > cover
>> > > > > most common use cases.
>> > > > >
>> > > > > [1]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Transform#LanguageManualTransform-TRANSFORMExamples
>> > > > > [2]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://community.cloudera.com/t5/Community-Articles/Apache-Hive-Groovy-UDF-examples/ta-p/245060
>> > > > > [3]
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveInteractiveShellCommands
>> > > > >
>> > > > > On Thu, Dec 10, 2020 at 6:11 PM Zhijiang <
>> [hidden email]
>> > > > > .invalid>
>> > > > > wrote:
>> > > > >
>> > > > > > Thanks for launching the discussion and the FLIP, Rui!
>> > > > > >
>> > > > > > It is really nice to see our continuous efforts for
>> compatibility
>> > > with
>> > > > > > Hive and benefiting users in this area.
>> > > > > > I am only curious that are there any other compatible
>> limitations
>> > for
>> > > > > Hive
>> > > > > > users after this FLIP? Or can I say that the Hive compatibility
>> is
>> > > > > > completely resolved after this FLIP?
>> > > > > > I am interested in the ultimate goal in this area. Maybe it is
>> out
>> > of
>> > > > > this
>> > > > > > FLIP scope, but still wish some insights from you if possible.
>> :)
>> > > > > >
>> > > > > > Best,
>> > > > > > Zhijiang
>> > > > > >
>> > > > > >
>> > > > > >
>> ------------------------------------------------------------------
>> > > > > > From:Rui Li <[hidden email]>
>> > > > > > Send Time:2020年12月10日(星期四) 16:46
>> > > > > > To:dev <[hidden email]>
>> > > > > > Subject:Re: [DISCUSS] FLIP-152: Hive Query Syntax Compatibility
>> > > > > >
>> > > > > > Thanks Kurt for your inputs!
>> > > > > >
>> > > > > > I agree we should extend Hive code to support non-Hive tables. I
>> > have
>> > > > > > updated the wiki page to remove the limitations you mentioned,
>> and
>> > > add
>> > > > > > typical use cases in the "Motivation" section.
>> > > > > >
>> > > > > > Regarding comment #b, the interface is defined in
>> > > > > flink-table-planner-blink
>> > > > > > and only used by the blink planner. So I think
>> "BlinkParserFactory"
>> > > is
>> > > > a
>> > > > > > better name, WDYT?
>> > > > > >
>> > > > > > On Mon, Dec 7, 2020 at 12:28 PM Kurt Young <[hidden email]>
>> > wrote:
>> > > > > >
>> > > > > > > Thanks Rui for starting this discussion.
>> > > > > > >
>> > > > > > > I can see the benefit that we improve hive compatibility
>> further,
>> > > as
>> > > > > > quite
>> > > > > > > some users are asking for this
>> > > > > > > feature in mailing lists [1][2][3] and some online chatting
>> tools
>> > > > such
>> > > > > as
>> > > > > > > DingTalk.
>> > > > > > >
>> > > > > > > I have 3 comments regarding to the design doc:
>> > > > > > >
>> > > > > > > a) Could you add a section to describe the typical use case
>> you
>> > > want
>> > > > to
>> > > > > > > support after this feature is introduced?
>> > > > > > > In that way, users can also have an impression how to use this
>> > > > feature
>> > > > > > and
>> > > > > > > what the behavior and outcome will be.
>> > > > > > >
>> > > > > > > b) Regarding the naming: "BlinkParserFactory", I suggest
>> renaming
>> > > it
>> > > > to
>> > > > > > > "FlinkParserFactory".
>> > > > > > >
>> > > > > > > c) About the two limitations you mentioned:
>> > > > > > >     1. Only works with Hive tables and the current catalog
>> needs
>> > to
>> > > > be
>> > > > > a
>> > > > > > > HiveCatalog.
>> > > > > > >     2. Queries cannot involve tables/views from multiple
>> > catalogs.
>> > > > > > > I assume this is because hive parser and analyzer doesn't
>> support
>> > > > > > > referring to a name with "x.y.z" fashion? Since
>> > > > > > > we can control all the behaviors by leveraging the codes hive
>> > > > currently
>> > > > > > > use. Is it possible that we can remove such
>> > > > > > > limitations? The reason is I'm not sure if users can make the
>> > whole
>> > > > > story
>> > > > > > > work purely depending on hive catalog (that's
>> > > > > > > the reason why I gave comment #a). If multiple catalogs are
>> > > involved,
>> > > > > > with
>> > > > > > > this limitation I don't think any meaningful
>> > > > > > > pipeline could be built. For example, users want to stream
>> data
>> > > from
>> > > > > > Kafka
>> > > > > > > to Hive, fully use hive's dialect including
>> > > > > > > query part. The kafka table could be a temporary table or
>> saved
>> > in
>> > > > > > default
>> > > > > > > memory catalog.
>> > > > > > >
>> > > > > > >
>> > > > > > > [1]
>> > > > http://apache-flink.147419.n8.nabble.com/calcite-td9059.html#a9118
>> > > > > > > [2]
>> > > > > >
>> > >
>> http://apache-flink.147419.n8.nabble.com/hive-sql-flink-11-td9116.html
>> > > > > > > [3]
>> > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-to-in-Flink-to-support-below-HIVE-SQL-td34162.html
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Kurt
>> > > > > > >
>> > > > > > >
>> > > > > > > On Wed, Dec 2, 2020 at 10:02 PM Rui Li <[hidden email]
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > > Hi guys,
>> > > > > > > >
>> > > > > > > > I'd like to start a discussion about providing HiveQL
>> > > compatibility
>> > > > > for
>> > > > > > > > users connecting to a hive warehouse. FLIP-123 has already
>> > > covered
>> > > > > most
>> > > > > > > > DDLs. So now it's time to complement the other big missing
>> part
>> > > --
>> > > > > > > queries.
>> > > > > > > > With FLIP-152, the hive dialect covers more scenarios and
>> makes
>> > > it
>> > > > > even
>> > > > > > > > easier for users to migrate to Flink. More details are in
>> the
>> > > FLIP
>> > > > > wiki
>> > > > > > > > page [1]. Looking forward to your feedback!
>> > > > > > > >
>> > > > > > > > [1]
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
>> > > > > > > >
>> > > > > > > > --
>> > > > > > > > Best regards!
>> > > > > > > > Rui Li
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > --
>> > > > > > Best regards!
>> > > > > > Rui Li
>> > > > > >
>> > > > > >
>> > > > >
>> > > > > --
>> > > > > Best regards!
>> > > > > Rui Li
>> > > > >
>> > > > >
>> > > >
>> > > > --
>> > > > Best, Jingsong Lee
>> > > >
>> > >
>> > >
>> > > --
>> > > Best regards!
>> > > Rui Li
>> > >
>> >
>>
>
>
> --
> Best regards!
> Rui Li
>


--
Best regards!
Rui Li