[DISCUSS]FLIP-113: Support SQL and planner hints

classic Classic list List threaded Threaded
68 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

[DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
Hi, fellows ~

I would like to propose the supports for SQL hints for our Flink SQL.

We would support hints syntax as following:

select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
from
emp /*+ INDEX(idx1, idx2) */
join
dept /*+ PROPERTIES(k1='v1', k2='v2') */
on
emp.deptno = dept.deptno

Basically we would support both query hints(after the SELECT keyword) and table hints(after the referenced table name), for 1.11, we plan to only support table hints with a hint probably named PROPERTIES:

table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/

I am looking forward to your comments.

You can access the FLIP here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints

Best,
Danny Chan
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
Note:
we only plan to support table hints in Flink release 1.11, so please focus mainly on the table hints part and just ignore the planner hints, sorry for that mistake ~

Best,
Danny Chan
在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:

> Hi, fellows ~
>
> I would like to propose the supports for SQL hints for our Flink SQL.
>
> We would support hints syntax as following:
>
> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> from
> emp /*+ INDEX(idx1, idx2) */
> join
> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> on
> emp.deptno = dept.deptno
>
> Basically we would support both query hints(after the SELECT keyword) and table hints(after the referenced table name), for 1.11, we plan to only support table hints with a hint probably named PROPERTIES:
>
> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
>
> I am looking forward to your comments.
>
> You can access the FLIP here:
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
>
> Best,
> Danny Chan
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

DONG, Weike
Hi Danny,

This is a nice feature, +1.

One thing I am interested in but not mentioned in the proposal is the error
handling, as it is quite common for users to write inappropriate hints in
SQL code, if illegal or "bad" hints are given, would the system simply
ignore them or throw exceptions?

Thanks : )

Best,
Weike

On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]> wrote:

> Note:
> we only plan to support table hints in Flink release 1.11, so please focus
> mainly on the table hints part and just ignore the planner hints, sorry for
> that mistake ~
>
> Best,
> Danny Chan
> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > Hi, fellows ~
> >
> > I would like to propose the supports for SQL hints for our Flink SQL.
> >
> > We would support hints syntax as following:
> >
> > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> > from
> > emp /*+ INDEX(idx1, idx2) */
> > join
> > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > on
> > emp.deptno = dept.deptno
> >
> > Basically we would support both query hints(after the SELECT keyword)
> and table hints(after the referenced table name), for 1.11, we plan to only
> support table hints with a hint probably named PROPERTIES:
> >
> > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> >
> > I am looking forward to your comments.
> >
> > You can access the FLIP here:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> >
> > Best,
> > Danny Chan
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Jark Wu-2
Thanks Danny for starting the discussion.
+1 for this feature.

If we just focus on the table hints not the query hints in this release,
could you split the FLIP into two FLIPs?
Because it's hard to vote on partial part of a FLIP. You can keep the table
hints proposal in FLIP-113 and move query hints into another FLIP.
So that we can focuse on the table hints in the FLIP.

Thanks,
Jark



On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]> wrote:

> Hi Danny,
>
> This is a nice feature, +1.
>
> One thing I am interested in but not mentioned in the proposal is the error
> handling, as it is quite common for users to write inappropriate hints in
> SQL code, if illegal or "bad" hints are given, would the system simply
> ignore them or throw exceptions?
>
> Thanks : )
>
> Best,
> Weike
>
> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]> wrote:
>
> > Note:
> > we only plan to support table hints in Flink release 1.11, so please
> focus
> > mainly on the table hints part and just ignore the planner hints, sorry
> for
> > that mistake ~
> >
> > Best,
> > Danny Chan
> > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > Hi, fellows ~
> > >
> > > I would like to propose the supports for SQL hints for our Flink SQL.
> > >
> > > We would support hints syntax as following:
> > >
> > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> > > from
> > > emp /*+ INDEX(idx1, idx2) */
> > > join
> > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > on
> > > emp.deptno = dept.deptno
> > >
> > > Basically we would support both query hints(after the SELECT keyword)
> > and table hints(after the referenced table name), for 1.11, we plan to
> only
> > support table hints with a hint probably named PROPERTIES:
> > >
> > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > >
> > > I am looking forward to your comments.
> > >
> > > You can access the FLIP here:
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > >
> > > Best,
> > > Danny Chan
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Jingsong Li
Hi Danny, +1 for table hints, thanks for driving.

I took a look to FLIP, most of content are talking about query hints. It is
hard to discussion and voting. So +1 to split it as Jark said.

Another thing is configuration that suitable to config with table hints:
"connector.path" and "connector.topic", Are they really suitable for table
hints? Looks weird to me. Because I think these properties are the core of
table.

Best,
Jingsong Lee

On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:

> Thanks Danny for starting the discussion.
> +1 for this feature.
>
> If we just focus on the table hints not the query hints in this release,
> could you split the FLIP into two FLIPs?
> Because it's hard to vote on partial part of a FLIP. You can keep the table
> hints proposal in FLIP-113 and move query hints into another FLIP.
> So that we can focuse on the table hints in the FLIP.
>
> Thanks,
> Jark
>
>
>
> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]> wrote:
>
> > Hi Danny,
> >
> > This is a nice feature, +1.
> >
> > One thing I am interested in but not mentioned in the proposal is the
> error
> > handling, as it is quite common for users to write inappropriate hints in
> > SQL code, if illegal or "bad" hints are given, would the system simply
> > ignore them or throw exceptions?
> >
> > Thanks : )
> >
> > Best,
> > Weike
> >
> > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]> wrote:
> >
> > > Note:
> > > we only plan to support table hints in Flink release 1.11, so please
> > focus
> > > mainly on the table hints part and just ignore the planner hints, sorry
> > for
> > > that mistake ~
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > > Hi, fellows ~
> > > >
> > > > I would like to propose the supports for SQL hints for our Flink SQL.
> > > >
> > > > We would support hints syntax as following:
> > > >
> > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> > > > from
> > > > emp /*+ INDEX(idx1, idx2) */
> > > > join
> > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > on
> > > > emp.deptno = dept.deptno
> > > >
> > > > Basically we would support both query hints(after the SELECT keyword)
> > > and table hints(after the referenced table name), for 1.11, we plan to
> > only
> > > support table hints with a hint probably named PROPERTIES:
> > > >
> > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > >
> > > > I am looking forward to your comments.
> > > >
> > > > You can access the FLIP here:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > >
> > > > Best,
> > > > Danny Chan
> > >
> >
>


--
Best, Jingsong Lee
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Timo Walther-2
Hi Danny,

thanks for the proposal. I agree with Jark and Jingsong. Planner hints
and table hints are orthogonal topics that should be discussed separately.

I share Jingsong's opinion that we should not use planner hints for
passing connector properties. Planner hints should be optional at any
time. They should not include semantics but only affect execution time.
Connector properties are an important part of the query itself.

Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
are other vendors deal with this problem?

Regards,
Timo


On 09.03.20 10:37, Jingsong Li wrote:

> Hi Danny, +1 for table hints, thanks for driving.
>
> I took a look to FLIP, most of content are talking about query hints. It is
> hard to discussion and voting. So +1 to split it as Jark said.
>
> Another thing is configuration that suitable to config with table hints:
> "connector.path" and "connector.topic", Are they really suitable for table
> hints? Looks weird to me. Because I think these properties are the core of
> table.
>
> Best,
> Jingsong Lee
>
> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
>
>> Thanks Danny for starting the discussion.
>> +1 for this feature.
>>
>> If we just focus on the table hints not the query hints in this release,
>> could you split the FLIP into two FLIPs?
>> Because it's hard to vote on partial part of a FLIP. You can keep the table
>> hints proposal in FLIP-113 and move query hints into another FLIP.
>> So that we can focuse on the table hints in the FLIP.
>>
>> Thanks,
>> Jark
>>
>>
>>
>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]> wrote:
>>
>>> Hi Danny,
>>>
>>> This is a nice feature, +1.
>>>
>>> One thing I am interested in but not mentioned in the proposal is the
>> error
>>> handling, as it is quite common for users to write inappropriate hints in
>>> SQL code, if illegal or "bad" hints are given, would the system simply
>>> ignore them or throw exceptions?
>>>
>>> Thanks : )
>>>
>>> Best,
>>> Weike
>>>
>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]> wrote:
>>>
>>>> Note:
>>>> we only plan to support table hints in Flink release 1.11, so please
>>> focus
>>>> mainly on the table hints part and just ignore the planner hints, sorry
>>> for
>>>> that mistake ~
>>>>
>>>> Best,
>>>> Danny Chan
>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
>>>>> Hi, fellows ~
>>>>>
>>>>> I would like to propose the supports for SQL hints for our Flink SQL.
>>>>>
>>>>> We would support hints syntax as following:
>>>>>
>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
>>>>> from
>>>>> emp /*+ INDEX(idx1, idx2) */
>>>>> join
>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
>>>>> on
>>>>> emp.deptno = dept.deptno
>>>>>
>>>>> Basically we would support both query hints(after the SELECT keyword)
>>>> and table hints(after the referenced table name), for 1.11, we plan to
>>> only
>>>> support table hints with a hint probably named PROPERTIES:
>>>>>
>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
>>>>>
>>>>> I am looking forward to your comments.
>>>>>
>>>>> You can access the FLIP here:
>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
>>>>>
>>>>> Best,
>>>>> Danny Chan
>>>>
>>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
To Weike: About the Error Handing

To be consistent with other SQL vendors, the default is to log warnings and if there is any error (invalid hint name or options), the hint is just ignored. I have already addressed in the wiki.

To Timo: About the PROPERTIES Table Hint

• The properties hints is also optional, user can pass in an option to override the table properties but this does not mean it is required.
• They should not include semantics: does the properties belong to semantic ? I don't think so, the plan does not change right ? The result set may be affected, but there are already some hints do so, for example, MS-SQL  MAXRECURSION and SNAPSHOT hint [1]
• `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard compared to the hints way(which is included in comments)
• I actually didn't found any vendors to support such grammar, and there is no way to override table level properties dynamically. For normal RDBMS, I think there are no requests for such dynamic parameters because all the table have the same storage and computation and they are almost all batch tables.
• While Flink as a computation engine has many connectors, especially for some message queue like Kafka, we would have a start_offset which is different each time we start the query, such parameters can not be persisted to catalog, because it’s not static, this is actually the background we propose the table hints to indicate such properties dynamically.


To Jark and Jinsong: I have removed the query hints part and change the title.

[1] https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15

Best,
Danny Chan
在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:

> Hi Danny,
>
> thanks for the proposal. I agree with Jark and Jingsong. Planner hints
> and table hints are orthogonal topics that should be discussed separately.
>
> I share Jingsong's opinion that we should not use planner hints for
> passing connector properties. Planner hints should be optional at any
> time. They should not include semantics but only affect execution time.
> Connector properties are an important part of the query itself.
>
> Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
> are other vendors deal with this problem?
>
> Regards,
> Timo
>
>
> On 09.03.20 10:37, Jingsong Li wrote:
> > Hi Danny, +1 for table hints, thanks for driving.
> >
> > I took a look to FLIP, most of content are talking about query hints. It is
> > hard to discussion and voting. So +1 to split it as Jark said.
> >
> > Another thing is configuration that suitable to config with table hints:
> > "connector.path" and "connector.topic", Are they really suitable for table
> > hints? Looks weird to me. Because I think these properties are the core of
> > table.
> >
> > Best,
> > Jingsong Lee
> >
> > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> >
> > > Thanks Danny for starting the discussion.
> > > +1 for this feature.
> > >
> > > If we just focus on the table hints not the query hints in this release,
> > > could you split the FLIP into two FLIPs?
> > > Because it's hard to vote on partial part of a FLIP. You can keep the table
> > > hints proposal in FLIP-113 and move query hints into another FLIP.
> > > So that we can focuse on the table hints in the FLIP.
> > >
> > > Thanks,
> > > Jark
> > >
> > >
> > >
> > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]> wrote:
> > >
> > > > Hi Danny,
> > > >
> > > > This is a nice feature, +1.
> > > >
> > > > One thing I am interested in but not mentioned in the proposal is the
> > > error
> > > > handling, as it is quite common for users to write inappropriate hints in
> > > > SQL code, if illegal or "bad" hints are given, would the system simply
> > > > ignore them or throw exceptions?
> > > >
> > > > Thanks : )
> > > >
> > > > Best,
> > > > Weike
> > > >
> > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]> wrote:
> > > >
> > > > > Note:
> > > > > we only plan to support table hints in Flink release 1.11, so please
> > > > focus
> > > > > mainly on the table hints part and just ignore the planner hints, sorry
> > > > for
> > > > > that mistake ~
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > > > > Hi, fellows ~
> > > > > >
> > > > > > I would like to propose the supports for SQL hints for our Flink SQL.
> > > > > >
> > > > > > We would support hints syntax as following:
> > > > > >
> > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb', parallelism='24') */
> > > > > > from
> > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > join
> > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > on
> > > > > > emp.deptno = dept.deptno
> > > > > >
> > > > > > Basically we would support both query hints(after the SELECT keyword)
> > > > > and table hints(after the referenced table name), for 1.11, we plan to
> > > > only
> > > > > support table hints with a hint probably named PROPERTIES:
> > > > > >
> > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > >
> > > > > > I am looking forward to your comments.
> > > > > >
> > > > > > You can access the FLIP here:
> > > > > >
> > > > >
> > > >
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > >
> > > >
> > >
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

wenlong.lwl
Hi Danny, thanks for the proposal. +1 for adding table hints, it is really
a necessary feature for flink sql to integrate with a catalog.

For error handling, I think it would be more natural to throw an
exception when error table hint provided, because the properties in hint
will be merged and used to find the table factory which would cause an
exception when error properties provided, right? On the other hand, unlike
other hints which just affect the way to execute the query, the property
table hint actually affects the result of the query, we should never ignore
the given property hints.

For the format of property hints, currently, in sql client, we accept
properties in format of string only in DDL: 'connector.type'='kafka', I
think the format of properties in hint should be the same as the format we
defined in ddl. What do you think?

Bests,
Wenlong Lyu

On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]> wrote:

> To Weike: About the Error Handing
>
> To be consistent with other SQL vendors, the default is to log warnings
> and if there is any error (invalid hint name or options), the hint is just
> ignored. I have already addressed in the wiki.
>
> To Timo: About the PROPERTIES Table Hint
>
> • The properties hints is also optional, user can pass in an option to
> override the table properties but this does not mean it is required.
> • They should not include semantics: does the properties belong to
> semantic ? I don't think so, the plan does not change right ? The result
> set may be affected, but there are already some hints do so, for example,
> MS-SQL  MAXRECURSION and SNAPSHOT hint [1]
> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> compared to the hints way(which is included in comments)
> • I actually didn't found any vendors to support such grammar, and there
> is no way to override table level properties dynamically. For normal RDBMS,
> I think there are no requests for such dynamic parameters because all the
> table have the same storage and computation and they are almost all batch
> tables.
> • While Flink as a computation engine has many connectors, especially for
> some message queue like Kafka, we would have a start_offset which is
> different each time we start the query, such parameters can not be
> persisted to catalog, because it’s not static, this is actually the
> background we propose the table hints to indicate such properties
> dynamically.
>
>
> To Jark and Jinsong: I have removed the query hints part and change the
> title.
>
> [1]
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
>
> Best,
> Danny Chan
> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > Hi Danny,
> >
> > thanks for the proposal. I agree with Jark and Jingsong. Planner hints
> > and table hints are orthogonal topics that should be discussed
> separately.
> >
> > I share Jingsong's opinion that we should not use planner hints for
> > passing connector properties. Planner hints should be optional at any
> > time. They should not include semantics but only affect execution time.
> > Connector properties are an important part of the query itself.
> >
> > Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
> > are other vendors deal with this problem?
> >
> > Regards,
> > Timo
> >
> >
> > On 09.03.20 10:37, Jingsong Li wrote:
> > > Hi Danny, +1 for table hints, thanks for driving.
> > >
> > > I took a look to FLIP, most of content are talking about query hints.
> It is
> > > hard to discussion and voting. So +1 to split it as Jark said.
> > >
> > > Another thing is configuration that suitable to config with table
> hints:
> > > "connector.path" and "connector.topic", Are they really suitable for
> table
> > > hints? Looks weird to me. Because I think these properties are the
> core of
> > > table.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> > >
> > > > Thanks Danny for starting the discussion.
> > > > +1 for this feature.
> > > >
> > > > If we just focus on the table hints not the query hints in this
> release,
> > > > could you split the FLIP into two FLIPs?
> > > > Because it's hard to vote on partial part of a FLIP. You can keep
> the table
> > > > hints proposal in FLIP-113 and move query hints into another FLIP.
> > > > So that we can focuse on the table hints in the FLIP.
> > > >
> > > > Thanks,
> > > > Jark
> > > >
> > > >
> > > >
> > > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]>
> wrote:
> > > >
> > > > > Hi Danny,
> > > > >
> > > > > This is a nice feature, +1.
> > > > >
> > > > > One thing I am interested in but not mentioned in the proposal is
> the
> > > > error
> > > > > handling, as it is quite common for users to write inappropriate
> hints in
> > > > > SQL code, if illegal or "bad" hints are given, would the system
> simply
> > > > > ignore them or throw exceptions?
> > > > >
> > > > > Thanks : )
> > > > >
> > > > > Best,
> > > > > Weike
> > > > >
> > > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
> wrote:
> > > > >
> > > > > > Note:
> > > > > > we only plan to support table hints in Flink release 1.11, so
> please
> > > > > focus
> > > > > > mainly on the table hints part and just ignore the planner
> hints, sorry
> > > > > for
> > > > > > that mistake ~
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > > > > > Hi, fellows ~
> > > > > > >
> > > > > > > I would like to propose the supports for SQL hints for our
> Flink SQL.
> > > > > > >
> > > > > > > We would support hints syntax as following:
> > > > > > >
> > > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> parallelism='24') */
> > > > > > > from
> > > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > > join
> > > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > > on
> > > > > > > emp.deptno = dept.deptno
> > > > > > >
> > > > > > > Basically we would support both query hints(after the SELECT
> keyword)
> > > > > > and table hints(after the referenced table name), for 1.11, we
> plan to
> > > > > only
> > > > > > support table hints with a hint probably named PROPERTIES:
> > > > > > >
> > > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > > >
> > > > > > > I am looking forward to your comments.
> > > > > > >
> > > > > > > You can access the FLIP here:
> > > > > > >
> > > > > >
> > > > >
> > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > >
> > > > >
> > > >
> > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
Thanks Wenlong ~

For PROPERTIES Hint Error handling

Actually we have no way to figure out whether a error prone hint is a PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do not know if this hint is a PROPERTIES hint, what we know is that the hint name was not registered in our Flink.

If the user writes the hint name correctly (i.e. PROPERTIES), we did can enforce the validation of the hint options though the pluggable HintOptionChecker.

For PROPERTIES Hint Option Format

For a key value style hint option, the key can be either a simple identifier or a string literal, which means that it’s compatible with our DDL syntax. We support simple identifier because many other hints do not have the component complex keys like the table properties, and we want to unify the parse block.

Best,
Danny Chan
在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:

> Hi Danny, thanks for the proposal. +1 for adding table hints, it is really
> a necessary feature for flink sql to integrate with a catalog.
>
> For error handling, I think it would be more natural to throw an
> exception when error table hint provided, because the properties in hint
> will be merged and used to find the table factory which would cause an
> exception when error properties provided, right? On the other hand, unlike
> other hints which just affect the way to execute the query, the property
> table hint actually affects the result of the query, we should never ignore
> the given property hints.
>
> For the format of property hints, currently, in sql client, we accept
> properties in format of string only in DDL: 'connector.type'='kafka', I
> think the format of properties in hint should be the same as the format we
> defined in ddl. What do you think?
>
> Bests,
> Wenlong Lyu
>
> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]> wrote:
>
> > To Weike: About the Error Handing
> >
> > To be consistent with other SQL vendors, the default is to log warnings
> > and if there is any error (invalid hint name or options), the hint is just
> > ignored. I have already addressed in the wiki.
> >
> > To Timo: About the PROPERTIES Table Hint
> >
> > • The properties hints is also optional, user can pass in an option to
> > override the table properties but this does not mean it is required.
> > • They should not include semantics: does the properties belong to
> > semantic ? I don't think so, the plan does not change right ? The result
> > set may be affected, but there are already some hints do so, for example,
> > MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> > compared to the hints way(which is included in comments)
> > • I actually didn't found any vendors to support such grammar, and there
> > is no way to override table level properties dynamically. For normal RDBMS,
> > I think there are no requests for such dynamic parameters because all the
> > table have the same storage and computation and they are almost all batch
> > tables.
> > • While Flink as a computation engine has many connectors, especially for
> > some message queue like Kafka, we would have a start_offset which is
> > different each time we start the query, such parameters can not be
> > persisted to catalog, because it’s not static, this is actually the
> > background we propose the table hints to indicate such properties
> > dynamically.
> >
> >
> > To Jark and Jinsong: I have removed the query hints part and change the
> > title.
> >
> > [1]
> > https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> >
> > Best,
> > Danny Chan
> > 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > > Hi Danny,
> > >
> > > thanks for the proposal. I agree with Jark and Jingsong. Planner hints
> > > and table hints are orthogonal topics that should be discussed
> > separately.
> > >
> > > I share Jingsong's opinion that we should not use planner hints for
> > > passing connector properties. Planner hints should be optional at any
> > > time. They should not include semantics but only affect execution time.
> > > Connector properties are an important part of the query itself.
> > >
> > > Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
> > > are other vendors deal with this problem?
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 09.03.20 10:37, Jingsong Li wrote:
> > > > Hi Danny, +1 for table hints, thanks for driving.
> > > >
> > > > I took a look to FLIP, most of content are talking about query hints.
> > It is
> > > > hard to discussion and voting. So +1 to split it as Jark said.
> > > >
> > > > Another thing is configuration that suitable to config with table
> > hints:
> > > > "connector.path" and "connector.topic", Are they really suitable for
> > table
> > > > hints? Looks weird to me. Because I think these properties are the
> > core of
> > > > table.
> > > >
> > > > Best,
> > > > Jingsong Lee
> > > >
> > > > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> > > >
> > > > > Thanks Danny for starting the discussion.
> > > > > +1 for this feature.
> > > > >
> > > > > If we just focus on the table hints not the query hints in this
> > release,
> > > > > could you split the FLIP into two FLIPs?
> > > > > Because it's hard to vote on partial part of a FLIP. You can keep
> > the table
> > > > > hints proposal in FLIP-113 and move query hints into another FLIP.
> > > > > So that we can focuse on the table hints in the FLIP.
> > > > >
> > > > > Thanks,
> > > > > Jark
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]>
> > wrote:
> > > > >
> > > > > > Hi Danny,
> > > > > >
> > > > > > This is a nice feature, +1.
> > > > > >
> > > > > > One thing I am interested in but not mentioned in the proposal is
> > the
> > > > > error
> > > > > > handling, as it is quite common for users to write inappropriate
> > hints in
> > > > > > SQL code, if illegal or "bad" hints are given, would the system
> > simply
> > > > > > ignore them or throw exceptions?
> > > > > >
> > > > > > Thanks : )
> > > > > >
> > > > > > Best,
> > > > > > Weike
> > > > > >
> > > > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
> > wrote:
> > > > > >
> > > > > > > Note:
> > > > > > > we only plan to support table hints in Flink release 1.11, so
> > please
> > > > > > focus
> > > > > > > mainly on the table hints part and just ignore the planner
> > hints, sorry
> > > > > > for
> > > > > > > that mistake ~
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > > > > > > Hi, fellows ~
> > > > > > > >
> > > > > > > > I would like to propose the supports for SQL hints for our
> > Flink SQL.
> > > > > > > >
> > > > > > > > We would support hints syntax as following:
> > > > > > > >
> > > > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > parallelism='24') */
> > > > > > > > from
> > > > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > > > join
> > > > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > > > on
> > > > > > > > emp.deptno = dept.deptno
> > > > > > > >
> > > > > > > > Basically we would support both query hints(after the SELECT
> > keyword)
> > > > > > > and table hints(after the referenced table name), for 1.11, we
> > plan to
> > > > > > only
> > > > > > > support table hints with a hint probably named PROPERTIES:
> > > > > > > >
> > > > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > > > >
> > > > > > > > I am looking forward to your comments.
> > > > > > > >
> > > > > > > > You can access the FLIP here:
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Danny Chan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > >
> >
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Timo Walther-2
Hi Danny,

shouldn't FLIP-110[1] solve most of the problems we have around defining
table properties more dynamically without manual schema work? Also
offset definition is easier with such a syntax. They must not be defined
in catalog but could be temporary tables that extend from the original
table.

In general, we should aim to keep the syntax concise and don't provide
too many ways of doing the same thing. Hints should give "hints" but not
affect the actual produced result.

Some connector properties might also change the plan or schema in the
future. E.g. they might also define whether a table source supports
certain push-downs (e.g. predicate push-down).

Dawid is currently working a draft that might makes it possible to
expose a Kafka offset via the schema such that `SELECT * FROM Topic
WHERE offset > 10` would become possible and could be pushed down. But
this is of course, not planned initially.

Regards,
Timo


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE



On 10.03.20 08:34, Danny Chan wrote:

> Thanks Wenlong ~
>
> For PROPERTIES Hint Error handling
>
> Actually we have no way to figure out whether a error prone hint is a PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do not know if this hint is a PROPERTIES hint, what we know is that the hint name was not registered in our Flink.
>
> If the user writes the hint name correctly (i.e. PROPERTIES), we did can enforce the validation of the hint options though the pluggable HintOptionChecker.
>
> For PROPERTIES Hint Option Format
>
> For a key value style hint option, the key can be either a simple identifier or a string literal, which means that it’s compatible with our DDL syntax. We support simple identifier because many other hints do not have the component complex keys like the table properties, and we want to unify the parse block.
>
> Best,
> Danny Chan
> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
>> Hi Danny, thanks for the proposal. +1 for adding table hints, it is really
>> a necessary feature for flink sql to integrate with a catalog.
>>
>> For error handling, I think it would be more natural to throw an
>> exception when error table hint provided, because the properties in hint
>> will be merged and used to find the table factory which would cause an
>> exception when error properties provided, right? On the other hand, unlike
>> other hints which just affect the way to execute the query, the property
>> table hint actually affects the result of the query, we should never ignore
>> the given property hints.
>>
>> For the format of property hints, currently, in sql client, we accept
>> properties in format of string only in DDL: 'connector.type'='kafka', I
>> think the format of properties in hint should be the same as the format we
>> defined in ddl. What do you think?
>>
>> Bests,
>> Wenlong Lyu
>>
>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]> wrote:
>>
>>> To Weike: About the Error Handing
>>>
>>> To be consistent with other SQL vendors, the default is to log warnings
>>> and if there is any error (invalid hint name or options), the hint is just
>>> ignored. I have already addressed in the wiki.
>>>
>>> To Timo: About the PROPERTIES Table Hint
>>>
>>> • The properties hints is also optional, user can pass in an option to
>>> override the table properties but this does not mean it is required.
>>> • They should not include semantics: does the properties belong to
>>> semantic ? I don't think so, the plan does not change right ? The result
>>> set may be affected, but there are already some hints do so, for example,
>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
>>> compared to the hints way(which is included in comments)
>>> • I actually didn't found any vendors to support such grammar, and there
>>> is no way to override table level properties dynamically. For normal RDBMS,
>>> I think there are no requests for such dynamic parameters because all the
>>> table have the same storage and computation and they are almost all batch
>>> tables.
>>> • While Flink as a computation engine has many connectors, especially for
>>> some message queue like Kafka, we would have a start_offset which is
>>> different each time we start the query, such parameters can not be
>>> persisted to catalog, because it’s not static, this is actually the
>>> background we propose the table hints to indicate such properties
>>> dynamically.
>>>
>>>
>>> To Jark and Jinsong: I have removed the query hints part and change the
>>> title.
>>>
>>> [1]
>>> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
>>>
>>> Best,
>>> Danny Chan
>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
>>>> Hi Danny,
>>>>
>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner hints
>>>> and table hints are orthogonal topics that should be discussed
>>> separately.
>>>>
>>>> I share Jingsong's opinion that we should not use planner hints for
>>>> passing connector properties. Planner hints should be optional at any
>>>> time. They should not include semantics but only affect execution time.
>>>> Connector properties are an important part of the query itself.
>>>>
>>>> Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
>>>> are other vendors deal with this problem?
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 09.03.20 10:37, Jingsong Li wrote:
>>>>> Hi Danny, +1 for table hints, thanks for driving.
>>>>>
>>>>> I took a look to FLIP, most of content are talking about query hints.
>>> It is
>>>>> hard to discussion and voting. So +1 to split it as Jark said.
>>>>>
>>>>> Another thing is configuration that suitable to config with table
>>> hints:
>>>>> "connector.path" and "connector.topic", Are they really suitable for
>>> table
>>>>> hints? Looks weird to me. Because I think these properties are the
>>> core of
>>>>> table.
>>>>>
>>>>> Best,
>>>>> Jingsong Lee
>>>>>
>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
>>>>>
>>>>>> Thanks Danny for starting the discussion.
>>>>>> +1 for this feature.
>>>>>>
>>>>>> If we just focus on the table hints not the query hints in this
>>> release,
>>>>>> could you split the FLIP into two FLIPs?
>>>>>> Because it's hard to vote on partial part of a FLIP. You can keep
>>> the table
>>>>>> hints proposal in FLIP-113 and move query hints into another FLIP.
>>>>>> So that we can focuse on the table hints in the FLIP.
>>>>>>
>>>>>> Thanks,
>>>>>> Jark
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]>
>>> wrote:
>>>>>>
>>>>>>> Hi Danny,
>>>>>>>
>>>>>>> This is a nice feature, +1.
>>>>>>>
>>>>>>> One thing I am interested in but not mentioned in the proposal is
>>> the
>>>>>> error
>>>>>>> handling, as it is quite common for users to write inappropriate
>>> hints in
>>>>>>> SQL code, if illegal or "bad" hints are given, would the system
>>> simply
>>>>>>> ignore them or throw exceptions?
>>>>>>>
>>>>>>> Thanks : )
>>>>>>>
>>>>>>> Best,
>>>>>>> Weike
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
>>> wrote:
>>>>>>>
>>>>>>>> Note:
>>>>>>>> we only plan to support table hints in Flink release 1.11, so
>>> please
>>>>>>> focus
>>>>>>>> mainly on the table hints part and just ignore the planner
>>> hints, sorry
>>>>>>> for
>>>>>>>> that mistake ~
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Danny Chan
>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
>>>>>>>>> Hi, fellows ~
>>>>>>>>>
>>>>>>>>> I would like to propose the supports for SQL hints for our
>>> Flink SQL.
>>>>>>>>>
>>>>>>>>> We would support hints syntax as following:
>>>>>>>>>
>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
>>> parallelism='24') */
>>>>>>>>> from
>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
>>>>>>>>> join
>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
>>>>>>>>> on
>>>>>>>>> emp.deptno = dept.deptno
>>>>>>>>>
>>>>>>>>> Basically we would support both query hints(after the SELECT
>>> keyword)
>>>>>>>> and table hints(after the referenced table name), for 1.11, we
>>> plan to
>>>>>>> only
>>>>>>>> support table hints with a hint probably named PROPERTIES:
>>>>>>>>>
>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
>>>>>>>>>
>>>>>>>>> I am looking forward to your comments.
>>>>>>>>>
>>>>>>>>> You can access the FLIP here:
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Danny Chan
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
Thanks Timo ~

Personally I would say that offset > 0 and start offset = 10 does not have the same semantic, so from the SQL aspect, we can not implement a “starting offset” hint for query with such a syntax.

And the CREATE TABLE LIKE syntax is a DDL which is just verbose for defining such dynamic parameters even if it could do that, shall we force users to define a temporal table for each query with dynamic params, I would say it’s an awkward solution.

"Hints should give "hints" but not affect the actual produced result.” You mentioned that multiple times and could we give a reason, what’s the problem there if we user the table hints to support “start offset” ? From my side I saw some benefits for that:


• It’s very convent to set up these parameters, the syntax is very much like the DDL definition
• It’s scope is very clear, right on the table it attathed
• It does not affect the table schema, which means in order to specify the offset, there is no need to define an offset column which is weird actually, offset should never be a column, it’s more like a metadata or a start option.

So in total, FLIP-110 uses the offset more like a Hive partition prune, we can do that if we have an offset column, but most of the case we do not define that, so there is actually no conflict or overlap.

Best,
Danny Chan
在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:

> Hi Danny,
>
> shouldn't FLIP-110[1] solve most of the problems we have around defining
> table properties more dynamically without manual schema work? Also
> offset definition is easier with such a syntax. They must not be defined
> in catalog but could be temporary tables that extend from the original
> table.
>
> In general, we should aim to keep the syntax concise and don't provide
> too many ways of doing the same thing. Hints should give "hints" but not
> affect the actual produced result.
>
> Some connector properties might also change the plan or schema in the
> future. E.g. they might also define whether a table source supports
> certain push-downs (e.g. predicate push-down).
>
> Dawid is currently working a draft that might makes it possible to
> expose a Kafka offset via the schema such that `SELECT * FROM Topic
> WHERE offset > 10` would become possible and could be pushed down. But
> this is of course, not planned initially.
>
> Regards,
> Timo
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
>
>
>
> On 10.03.20 08:34, Danny Chan wrote:
> > Thanks Wenlong ~
> >
> > For PROPERTIES Hint Error handling
> >
> > Actually we have no way to figure out whether a error prone hint is a PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do not know if this hint is a PROPERTIES hint, what we know is that the hint name was not registered in our Flink.
> >
> > If the user writes the hint name correctly (i.e. PROPERTIES), we did can enforce the validation of the hint options though the pluggable HintOptionChecker.
> >
> > For PROPERTIES Hint Option Format
> >
> > For a key value style hint option, the key can be either a simple identifier or a string literal, which means that it’s compatible with our DDL syntax. We support simple identifier because many other hints do not have the component complex keys like the table properties, and we want to unify the parse block.
> >
> > Best,
> > Danny Chan
> > 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
> > > Hi Danny, thanks for the proposal. +1 for adding table hints, it is really
> > > a necessary feature for flink sql to integrate with a catalog.
> > >
> > > For error handling, I think it would be more natural to throw an
> > > exception when error table hint provided, because the properties in hint
> > > will be merged and used to find the table factory which would cause an
> > > exception when error properties provided, right? On the other hand, unlike
> > > other hints which just affect the way to execute the query, the property
> > > table hint actually affects the result of the query, we should never ignore
> > > the given property hints.
> > >
> > > For the format of property hints, currently, in sql client, we accept
> > > properties in format of string only in DDL: 'connector.type'='kafka', I
> > > think the format of properties in hint should be the same as the format we
> > > defined in ddl. What do you think?
> > >
> > > Bests,
> > > Wenlong Lyu
> > >
> > > On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]> wrote:
> > >
> > > > To Weike: About the Error Handing
> > > >
> > > > To be consistent with other SQL vendors, the default is to log warnings
> > > > and if there is any error (invalid hint name or options), the hint is just
> > > > ignored. I have already addressed in the wiki.
> > > >
> > > > To Timo: About the PROPERTIES Table Hint
> > > >
> > > > • The properties hints is also optional, user can pass in an option to
> > > > override the table properties but this does not mean it is required.
> > > > • They should not include semantics: does the properties belong to
> > > > semantic ? I don't think so, the plan does not change right ? The result
> > > > set may be affected, but there are already some hints do so, for example,
> > > > MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > > • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> > > > compared to the hints way(which is included in comments)
> > > > • I actually didn't found any vendors to support such grammar, and there
> > > > is no way to override table level properties dynamically. For normal RDBMS,
> > > > I think there are no requests for such dynamic parameters because all the
> > > > table have the same storage and computation and they are almost all batch
> > > > tables.
> > > > • While Flink as a computation engine has many connectors, especially for
> > > > some message queue like Kafka, we would have a start_offset which is
> > > > different each time we start the query, such parameters can not be
> > > > persisted to catalog, because it’s not static, this is actually the
> > > > background we propose the table hints to indicate such properties
> > > > dynamically.
> > > >
> > > >
> > > > To Jark and Jinsong: I have removed the query hints part and change the
> > > > title.
> > > >
> > > > [1]
> > > > https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > > > > Hi Danny,
> > > > >
> > > > > thanks for the proposal. I agree with Jark and Jingsong. Planner hints
> > > > > and table hints are orthogonal topics that should be discussed
> > > > separately.
> > > > >
> > > > > I share Jingsong's opinion that we should not use planner hints for
> > > > > passing connector properties. Planner hints should be optional at any
> > > > > time. They should not include semantics but only affect execution time.
> > > > > Connector properties are an important part of the query itself.
> > > > >
> > > > > Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
> > > > > are other vendors deal with this problem?
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 09.03.20 10:37, Jingsong Li wrote:
> > > > > > Hi Danny, +1 for table hints, thanks for driving.
> > > > > >
> > > > > > I took a look to FLIP, most of content are talking about query hints.
> > > > It is
> > > > > > hard to discussion and voting. So +1 to split it as Jark said.
> > > > > >
> > > > > > Another thing is configuration that suitable to config with table
> > > > hints:
> > > > > > "connector.path" and "connector.topic", Are they really suitable for
> > > > table
> > > > > > hints? Looks weird to me. Because I think these properties are the
> > > > core of
> > > > > > table.
> > > > > >
> > > > > > Best,
> > > > > > Jingsong Lee
> > > > > >
> > > > > > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> > > > > >
> > > > > > > Thanks Danny for starting the discussion.
> > > > > > > +1 for this feature.
> > > > > > >
> > > > > > > If we just focus on the table hints not the query hints in this
> > > > release,
> > > > > > > could you split the FLIP into two FLIPs?
> > > > > > > Because it's hard to vote on partial part of a FLIP. You can keep
> > > > the table
> > > > > > > hints proposal in FLIP-113 and move query hints into another FLIP.
> > > > > > > So that we can focuse on the table hints in the FLIP.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jark
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]>
> > > > wrote:
> > > > > > >
> > > > > > > > Hi Danny,
> > > > > > > >
> > > > > > > > This is a nice feature, +1.
> > > > > > > >
> > > > > > > > One thing I am interested in but not mentioned in the proposal is
> > > > the
> > > > > > > error
> > > > > > > > handling, as it is quite common for users to write inappropriate
> > > > hints in
> > > > > > > > SQL code, if illegal or "bad" hints are given, would the system
> > > > simply
> > > > > > > > ignore them or throw exceptions?
> > > > > > > >
> > > > > > > > Thanks : )
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Weike
> > > > > > > >
> > > > > > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
> > > > wrote:
> > > > > > > >
> > > > > > > > > Note:
> > > > > > > > > we only plan to support table hints in Flink release 1.11, so
> > > > please
> > > > > > > > focus
> > > > > > > > > mainly on the table hints part and just ignore the planner
> > > > hints, sorry
> > > > > > > > for
> > > > > > > > > that mistake ~
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> > > > > > > > > > Hi, fellows ~
> > > > > > > > > >
> > > > > > > > > > I would like to propose the supports for SQL hints for our
> > > > Flink SQL.
> > > > > > > > > >
> > > > > > > > > > We would support hints syntax as following:
> > > > > > > > > >
> > > > > > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > > parallelism='24') */
> > > > > > > > > > from
> > > > > > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > > > > > join
> > > > > > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > > > > > on
> > > > > > > > > > emp.deptno = dept.deptno
> > > > > > > > > >
> > > > > > > > > > Basically we would support both query hints(after the SELECT
> > > > keyword)
> > > > > > > > > and table hints(after the referenced table name), for 1.11, we
> > > > plan to
> > > > > > > > only
> > > > > > > > > support table hints with a hint probably named PROPERTIES:
> > > > > > > > > >
> > > > > > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > > > > > >
> > > > > > > > > > I am looking forward to your comments.
> > > > > > > > > >
> > > > > > > > > > You can access the FLIP here:
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Danny Chan
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Fabian Hueske-2
Hi,

RE error handling:
I don't think it is a good idea to simply log invalid hints. This makes it
very hard for systems that integrate Flink to consume these errors.
I'm not saying we should throw an exception. We could also execute a query
with invalid hints but should have a mechanism to provide information about
invalid hints to systems that integrates Flink.

RE changing query semantics with hints:
I agree with Timo on this one, especially if queries with invalid hints are
executed.
Otherwise, it will be hard for users to validate that the result is what
they expected it to be.
If hints only affect execution strategies, users can be sure that the query
result is correct even if the execution plan was not fixed as hinted.

Best,
Fabian


Am Di., 10. März 2020 um 10:34 Uhr schrieb Danny Chan <[hidden email]
>:

> Thanks Timo ~
>
> Personally I would say that offset > 0 and start offset = 10 does not have
> the same semantic, so from the SQL aspect, we can not implement a “starting
> offset” hint for query with such a syntax.
>
> And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> defining such dynamic parameters even if it could do that, shall we force
> users to define a temporal table for each query with dynamic params, I
> would say it’s an awkward solution.
>
> "Hints should give "hints" but not affect the actual produced result.” You
> mentioned that multiple times and could we give a reason, what’s the
> problem there if we user the table hints to support “start offset” ? From
> my side I saw some benefits for that:
>
>
> • It’s very convent to set up these parameters, the syntax is very much
> like the DDL definition
> • It’s scope is very clear, right on the table it attathed
> • It does not affect the table schema, which means in order to specify the
> offset, there is no need to define an offset column which is weird
> actually, offset should never be a column, it’s more like a metadata or a
> start option.
>
> So in total, FLIP-110 uses the offset more like a Hive partition prune, we
> can do that if we have an offset column, but most of the case we do not
> define that, so there is actually no conflict or overlap.
>
> Best,
> Danny Chan
> 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > Hi Danny,
> >
> > shouldn't FLIP-110[1] solve most of the problems we have around defining
> > table properties more dynamically without manual schema work? Also
> > offset definition is easier with such a syntax. They must not be defined
> > in catalog but could be temporary tables that extend from the original
> > table.
> >
> > In general, we should aim to keep the syntax concise and don't provide
> > too many ways of doing the same thing. Hints should give "hints" but not
> > affect the actual produced result.
> >
> > Some connector properties might also change the plan or schema in the
> > future. E.g. they might also define whether a table source supports
> > certain push-downs (e.g. predicate push-down).
> >
> > Dawid is currently working a draft that might makes it possible to
> > expose a Kafka offset via the schema such that `SELECT * FROM Topic
> > WHERE offset > 10` would become possible and could be pushed down. But
> > this is of course, not planned initially.
> >
> > Regards,
> > Timo
> >
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> >
> >
> >
> > On 10.03.20 08:34, Danny Chan wrote:
> > > Thanks Wenlong ~
> > >
> > > For PROPERTIES Hint Error handling
> > >
> > > Actually we have no way to figure out whether a error prone hint is a
> PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do
> not know if this hint is a PROPERTIES hint, what we know is that the hint
> name was not registered in our Flink.
> > >
> > > If the user writes the hint name correctly (i.e. PROPERTIES), we did
> can enforce the validation of the hint options though the pluggable
> HintOptionChecker.
> > >
> > > For PROPERTIES Hint Option Format
> > >
> > > For a key value style hint option, the key can be either a simple
> identifier or a string literal, which means that it’s compatible with our
> DDL syntax. We support simple identifier because many other hints do not
> have the component complex keys like the table properties, and we want to
> unify the parse block.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
> > > > Hi Danny, thanks for the proposal. +1 for adding table hints, it is
> really
> > > > a necessary feature for flink sql to integrate with a catalog.
> > > >
> > > > For error handling, I think it would be more natural to throw an
> > > > exception when error table hint provided, because the properties in
> hint
> > > > will be merged and used to find the table factory which would cause
> an
> > > > exception when error properties provided, right? On the other hand,
> unlike
> > > > other hints which just affect the way to execute the query, the
> property
> > > > table hint actually affects the result of the query, we should never
> ignore
> > > > the given property hints.
> > > >
> > > > For the format of property hints, currently, in sql client, we accept
> > > > properties in format of string only in DDL:
> 'connector.type'='kafka', I
> > > > think the format of properties in hint should be the same as the
> format we
> > > > defined in ddl. What do you think?
> > > >
> > > > Bests,
> > > > Wenlong Lyu
> > > >
> > > > On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> wrote:
> > > >
> > > > > To Weike: About the Error Handing
> > > > >
> > > > > To be consistent with other SQL vendors, the default is to log
> warnings
> > > > > and if there is any error (invalid hint name or options), the hint
> is just
> > > > > ignored. I have already addressed in the wiki.
> > > > >
> > > > > To Timo: About the PROPERTIES Table Hint
> > > > >
> > > > > • The properties hints is also optional, user can pass in an
> option to
> > > > > override the table properties but this does not mean it is
> required.
> > > > > • They should not include semantics: does the properties belong to
> > > > > semantic ? I don't think so, the plan does not change right ? The
> result
> > > > > set may be affected, but there are already some hints do so, for
> example,
> > > > > MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > > > • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> > > > > compared to the hints way(which is included in comments)
> > > > > • I actually didn't found any vendors to support such grammar, and
> there
> > > > > is no way to override table level properties dynamically. For
> normal RDBMS,
> > > > > I think there are no requests for such dynamic parameters because
> all the
> > > > > table have the same storage and computation and they are almost
> all batch
> > > > > tables.
> > > > > • While Flink as a computation engine has many connectors,
> especially for
> > > > > some message queue like Kafka, we would have a start_offset which
> is
> > > > > different each time we start the query, such parameters can not be
> > > > > persisted to catalog, because it’s not static, this is actually the
> > > > > background we propose the table hints to indicate such properties
> > > > > dynamically.
> > > > >
> > > > >
> > > > > To Jark and Jinsong: I have removed the query hints part and
> change the
> > > > > title.
> > > > >
> > > > > [1]
> > > > >
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > > > > > Hi Danny,
> > > > > >
> > > > > > thanks for the proposal. I agree with Jark and Jingsong. Planner
> hints
> > > > > > and table hints are orthogonal topics that should be discussed
> > > > > separately.
> > > > > >
> > > > > > I share Jingsong's opinion that we should not use planner hints
> for
> > > > > > passing connector properties. Planner hints should be optional
> at any
> > > > > > time. They should not include semantics but only affect
> execution time.
> > > > > > Connector properties are an important part of the query itself.
> > > > > >
> > > > > > Have you thought about options such as `SELECT * FROM t(k=v,
> k=v)`? How
> > > > > > are other vendors deal with this problem?
> > > > > >
> > > > > > Regards,
> > > > > > Timo
> > > > > >
> > > > > >
> > > > > > On 09.03.20 10:37, Jingsong Li wrote:
> > > > > > > Hi Danny, +1 for table hints, thanks for driving.
> > > > > > >
> > > > > > > I took a look to FLIP, most of content are talking about query
> hints.
> > > > > It is
> > > > > > > hard to discussion and voting. So +1 to split it as Jark said.
> > > > > > >
> > > > > > > Another thing is configuration that suitable to config with
> table
> > > > > hints:
> > > > > > > "connector.path" and "connector.topic", Are they really
> suitable for
> > > > > table
> > > > > > > hints? Looks weird to me. Because I think these properties are
> the
> > > > > core of
> > > > > > > table.
> > > > > > >
> > > > > > > Best,
> > > > > > > Jingsong Lee
> > > > > > >
> > > > > > > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]>
> wrote:
> > > > > > >
> > > > > > > > Thanks Danny for starting the discussion.
> > > > > > > > +1 for this feature.
> > > > > > > >
> > > > > > > > If we just focus on the table hints not the query hints in
> this
> > > > > release,
> > > > > > > > could you split the FLIP into two FLIPs?
> > > > > > > > Because it's hard to vote on partial part of a FLIP. You can
> keep
> > > > > the table
> > > > > > > > hints proposal in FLIP-113 and move query hints into another
> FLIP.
> > > > > > > > So that we can focuse on the table hints in the FLIP.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Jark
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> [hidden email]>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi Danny,
> > > > > > > > >
> > > > > > > > > This is a nice feature, +1.
> > > > > > > > >
> > > > > > > > > One thing I am interested in but not mentioned in the
> proposal is
> > > > > the
> > > > > > > > error
> > > > > > > > > handling, as it is quite common for users to write
> inappropriate
> > > > > hints in
> > > > > > > > > SQL code, if illegal or "bad" hints are given, would the
> system
> > > > > simply
> > > > > > > > > ignore them or throw exceptions?
> > > > > > > > >
> > > > > > > > > Thanks : )
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Weike
> > > > > > > > >
> > > > > > > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> [hidden email]>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Note:
> > > > > > > > > > we only plan to support table hints in Flink release
> 1.11, so
> > > > > please
> > > > > > > > > focus
> > > > > > > > > > mainly on the table hints part and just ignore the
> planner
> > > > > hints, sorry
> > > > > > > > > for
> > > > > > > > > > that mistake ~
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Danny Chan
> > > > > > > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <
> [hidden email]>,写道:
> > > > > > > > > > > Hi, fellows ~
> > > > > > > > > > >
> > > > > > > > > > > I would like to propose the supports for SQL hints for
> our
> > > > > Flink SQL.
> > > > > > > > > > >
> > > > > > > > > > > We would support hints syntax as following:
> > > > > > > > > > >
> > > > > > > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > > > parallelism='24') */
> > > > > > > > > > > from
> > > > > > > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > > > > > > join
> > > > > > > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > > > > > > on
> > > > > > > > > > > emp.deptno = dept.deptno
> > > > > > > > > > >
> > > > > > > > > > > Basically we would support both query hints(after the
> SELECT
> > > > > keyword)
> > > > > > > > > > and table hints(after the referenced table name), for
> 1.11, we
> > > > > plan to
> > > > > > > > > only
> > > > > > > > > > support table hints with a hint probably named
> PROPERTIES:
> > > > > > > > > > >
> > > > > > > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > > > > > > >
> > > > > > > > > > > I am looking forward to your comments.
> > > > > > > > > > >
> > > > > > > > > > > You can access the FLIP here:
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Danny Chan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Timo Walther-2
In reply to this post by Danny Chan
Hi Danny,

compared to the hints, FLIP-110 is fully compliant to the SQL standard.

I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is
too verbose or awkward for the power of basically changing the entire
connector. Usually, this statement would just precede the query in a
multiline file. So it can be change "in-place" like the hints you proposed.

Many companies have a well-defined set of tables that should be used. It
would be dangerous if users can change the path or topic in a hint. The
catalog/catalog manager should be the entity that controls which tables
exist and how they can be accessed.

 > what’s the problem there if we user the table hints to support “start
offset”?

IMHO it violates the meaning of a hint. According to the dictionary, a
hint is "a statement that expresses indirectly what one prefers not to
say explicitly". But offsets are a property that are very explicit.

If we go with the hint approach, it should be expressible in the
TableSourceFactory which properties are supported for hinting. Or do you
plan to offer those hints in a separate Map<String, String> that cannot
overwrite existing properties? I think this would be a different story...

Regards,
Timo


On 10.03.20 10:34, Danny Chan wrote:

> Thanks Timo ~
>
> Personally I would say that offset > 0 and start offset = 10 does not have the same semantic, so from the SQL aspect, we can not implement a “starting offset” hint for query with such a syntax.
>
> And the CREATE TABLE LIKE syntax is a DDL which is just verbose for defining such dynamic parameters even if it could do that, shall we force users to define a temporal table for each query with dynamic params, I would say it’s an awkward solution.
>
> "Hints should give "hints" but not affect the actual produced result.” You mentioned that multiple times and could we give a reason, what’s the problem there if we user the table hints to support “start offset” ? From my side I saw some benefits for that:
>
>
> • It’s very convent to set up these parameters, the syntax is very much like the DDL definition
> • It’s scope is very clear, right on the table it attathed
> • It does not affect the table schema, which means in order to specify the offset, there is no need to define an offset column which is weird actually, offset should never be a column, it’s more like a metadata or a start option.
>
> So in total, FLIP-110 uses the offset more like a Hive partition prune, we can do that if we have an offset column, but most of the case we do not define that, so there is actually no conflict or overlap.
>
> Best,
> Danny Chan
> 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
>> Hi Danny,
>>
>> shouldn't FLIP-110[1] solve most of the problems we have around defining
>> table properties more dynamically without manual schema work? Also
>> offset definition is easier with such a syntax. They must not be defined
>> in catalog but could be temporary tables that extend from the original
>> table.
>>
>> In general, we should aim to keep the syntax concise and don't provide
>> too many ways of doing the same thing. Hints should give "hints" but not
>> affect the actual produced result.
>>
>> Some connector properties might also change the plan or schema in the
>> future. E.g. they might also define whether a table source supports
>> certain push-downs (e.g. predicate push-down).
>>
>> Dawid is currently working a draft that might makes it possible to
>> expose a Kafka offset via the schema such that `SELECT * FROM Topic
>> WHERE offset > 10` would become possible and could be pushed down. But
>> this is of course, not planned initially.
>>
>> Regards,
>> Timo
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
>>
>>
>>
>> On 10.03.20 08:34, Danny Chan wrote:
>>> Thanks Wenlong ~
>>>
>>> For PROPERTIES Hint Error handling
>>>
>>> Actually we have no way to figure out whether a error prone hint is a PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do not know if this hint is a PROPERTIES hint, what we know is that the hint name was not registered in our Flink.
>>>
>>> If the user writes the hint name correctly (i.e. PROPERTIES), we did can enforce the validation of the hint options though the pluggable HintOptionChecker.
>>>
>>> For PROPERTIES Hint Option Format
>>>
>>> For a key value style hint option, the key can be either a simple identifier or a string literal, which means that it’s compatible with our DDL syntax. We support simple identifier because many other hints do not have the component complex keys like the table properties, and we want to unify the parse block.
>>>
>>> Best,
>>> Danny Chan
>>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
>>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it is really
>>>> a necessary feature for flink sql to integrate with a catalog.
>>>>
>>>> For error handling, I think it would be more natural to throw an
>>>> exception when error table hint provided, because the properties in hint
>>>> will be merged and used to find the table factory which would cause an
>>>> exception when error properties provided, right? On the other hand, unlike
>>>> other hints which just affect the way to execute the query, the property
>>>> table hint actually affects the result of the query, we should never ignore
>>>> the given property hints.
>>>>
>>>> For the format of property hints, currently, in sql client, we accept
>>>> properties in format of string only in DDL: 'connector.type'='kafka', I
>>>> think the format of properties in hint should be the same as the format we
>>>> defined in ddl. What do you think?
>>>>
>>>> Bests,
>>>> Wenlong Lyu
>>>>
>>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]> wrote:
>>>>
>>>>> To Weike: About the Error Handing
>>>>>
>>>>> To be consistent with other SQL vendors, the default is to log warnings
>>>>> and if there is any error (invalid hint name or options), the hint is just
>>>>> ignored. I have already addressed in the wiki.
>>>>>
>>>>> To Timo: About the PROPERTIES Table Hint
>>>>>
>>>>> • The properties hints is also optional, user can pass in an option to
>>>>> override the table properties but this does not mean it is required.
>>>>> • They should not include semantics: does the properties belong to
>>>>> semantic ? I don't think so, the plan does not change right ? The result
>>>>> set may be affected, but there are already some hints do so, for example,
>>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
>>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
>>>>> compared to the hints way(which is included in comments)
>>>>> • I actually didn't found any vendors to support such grammar, and there
>>>>> is no way to override table level properties dynamically. For normal RDBMS,
>>>>> I think there are no requests for such dynamic parameters because all the
>>>>> table have the same storage and computation and they are almost all batch
>>>>> tables.
>>>>> • While Flink as a computation engine has many connectors, especially for
>>>>> some message queue like Kafka, we would have a start_offset which is
>>>>> different each time we start the query, such parameters can not be
>>>>> persisted to catalog, because it’s not static, this is actually the
>>>>> background we propose the table hints to indicate such properties
>>>>> dynamically.
>>>>>
>>>>>
>>>>> To Jark and Jinsong: I have removed the query hints part and change the
>>>>> title.
>>>>>
>>>>> [1]
>>>>> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
>>>>>
>>>>> Best,
>>>>> Danny Chan
>>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
>>>>>> Hi Danny,
>>>>>>
>>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner hints
>>>>>> and table hints are orthogonal topics that should be discussed
>>>>> separately.
>>>>>>
>>>>>> I share Jingsong's opinion that we should not use planner hints for
>>>>>> passing connector properties. Planner hints should be optional at any
>>>>>> time. They should not include semantics but only affect execution time.
>>>>>> Connector properties are an important part of the query itself.
>>>>>>
>>>>>> Have you thought about options such as `SELECT * FROM t(k=v, k=v)`? How
>>>>>> are other vendors deal with this problem?
>>>>>>
>>>>>> Regards,
>>>>>> Timo
>>>>>>
>>>>>>
>>>>>> On 09.03.20 10:37, Jingsong Li wrote:
>>>>>>> Hi Danny, +1 for table hints, thanks for driving.
>>>>>>>
>>>>>>> I took a look to FLIP, most of content are talking about query hints.
>>>>> It is
>>>>>>> hard to discussion and voting. So +1 to split it as Jark said.
>>>>>>>
>>>>>>> Another thing is configuration that suitable to config with table
>>>>> hints:
>>>>>>> "connector.path" and "connector.topic", Are they really suitable for
>>>>> table
>>>>>>> hints? Looks weird to me. Because I think these properties are the
>>>>> core of
>>>>>>> table.
>>>>>>>
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>>>>
>>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Thanks Danny for starting the discussion.
>>>>>>>> +1 for this feature.
>>>>>>>>
>>>>>>>> If we just focus on the table hints not the query hints in this
>>>>> release,
>>>>>>>> could you split the FLIP into two FLIPs?
>>>>>>>> Because it's hard to vote on partial part of a FLIP. You can keep
>>>>> the table
>>>>>>>> hints proposal in FLIP-113 and move query hints into another FLIP.
>>>>>>>> So that we can focuse on the table hints in the FLIP.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Jark
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Danny,
>>>>>>>>>
>>>>>>>>> This is a nice feature, +1.
>>>>>>>>>
>>>>>>>>> One thing I am interested in but not mentioned in the proposal is
>>>>> the
>>>>>>>> error
>>>>>>>>> handling, as it is quite common for users to write inappropriate
>>>>> hints in
>>>>>>>>> SQL code, if illegal or "bad" hints are given, would the system
>>>>> simply
>>>>>>>>> ignore them or throw exceptions?
>>>>>>>>>
>>>>>>>>> Thanks : )
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Weike
>>>>>>>>>
>>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Note:
>>>>>>>>>> we only plan to support table hints in Flink release 1.11, so
>>>>> please
>>>>>>>>> focus
>>>>>>>>>> mainly on the table hints part and just ignore the planner
>>>>> hints, sorry
>>>>>>>>> for
>>>>>>>>>> that mistake ~
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Danny Chan
>>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
>>>>>>>>>>> Hi, fellows ~
>>>>>>>>>>>
>>>>>>>>>>> I would like to propose the supports for SQL hints for our
>>>>> Flink SQL.
>>>>>>>>>>>
>>>>>>>>>>> We would support hints syntax as following:
>>>>>>>>>>>
>>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
>>>>> parallelism='24') */
>>>>>>>>>>> from
>>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
>>>>>>>>>>> join
>>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
>>>>>>>>>>> on
>>>>>>>>>>> emp.deptno = dept.deptno
>>>>>>>>>>>
>>>>>>>>>>> Basically we would support both query hints(after the SELECT
>>>>> keyword)
>>>>>>>>>> and table hints(after the referenced table name), for 1.11, we
>>>>> plan to
>>>>>>>>> only
>>>>>>>>>> support table hints with a hint probably named PROPERTIES:
>>>>>>>>>>>
>>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
>>>>>>>>>>>
>>>>>>>>>>> I am looking forward to your comments.
>>>>>>>>>>>
>>>>>>>>>>> You can access the FLIP here:
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Danny Chan
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Jark Wu-2
Hi everyone,

I want to jump in the discussion about the "dynamic start offset" problem.
First of all, I share the same concern with Timo and Fabian, that the
"start offset" affects the query semantics, i.e. the query result.
But "hints" is just used for optimization which should affect the result?

I think the "dynamic start offset" is an very important usability problem
which will be faced by many streaming platforms.
I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
('connector.startup-timestamp-millis' = '1578538374471')" is verbose, what
if we have 10 tables to join?

However, what I want to propose (should be another thread) is a global
configuration to reset start offsets of all the source connectors
in the query session, e.g. "table.sources.start-offset". This is possible
now because `TableSourceFactory.Context` has `getConfiguration`
method to get the session configuration, and use it to create an adapted
TableSource.
Then we can also expose to SQL CLI via SET command, e.g. `SET
'table.sources.start-offset'='earliest';`, which is pretty simple and
straightforward.

This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` which
is very helpful IMO.

Best,
Jark


On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]> wrote:

> Hi Danny,
>
> compared to the hints, FLIP-110 is fully compliant to the SQL standard.
>
> I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is
> too verbose or awkward for the power of basically changing the entire
> connector. Usually, this statement would just precede the query in a
> multiline file. So it can be change "in-place" like the hints you proposed.
>
> Many companies have a well-defined set of tables that should be used. It
> would be dangerous if users can change the path or topic in a hint. The
> catalog/catalog manager should be the entity that controls which tables
> exist and how they can be accessed.
>
>  > what’s the problem there if we user the table hints to support “start
> offset”?
>
> IMHO it violates the meaning of a hint. According to the dictionary, a
> hint is "a statement that expresses indirectly what one prefers not to
> say explicitly". But offsets are a property that are very explicit.
>
> If we go with the hint approach, it should be expressible in the
> TableSourceFactory which properties are supported for hinting. Or do you
> plan to offer those hints in a separate Map<String, String> that cannot
> overwrite existing properties? I think this would be a different story...
>
> Regards,
> Timo
>
>
> On 10.03.20 10:34, Danny Chan wrote:
> > Thanks Timo ~
> >
> > Personally I would say that offset > 0 and start offset = 10 does not
> have the same semantic, so from the SQL aspect, we can not implement a
> “starting offset” hint for query with such a syntax.
> >
> > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> defining such dynamic parameters even if it could do that, shall we force
> users to define a temporal table for each query with dynamic params, I
> would say it’s an awkward solution.
> >
> > "Hints should give "hints" but not affect the actual produced result.”
> You mentioned that multiple times and could we give a reason, what’s the
> problem there if we user the table hints to support “start offset” ? From
> my side I saw some benefits for that:
> >
> >
> > • It’s very convent to set up these parameters, the syntax is very much
> like the DDL definition
> > • It’s scope is very clear, right on the table it attathed
> > • It does not affect the table schema, which means in order to specify
> the offset, there is no need to define an offset column which is weird
> actually, offset should never be a column, it’s more like a metadata or a
> start option.
> >
> > So in total, FLIP-110 uses the offset more like a Hive partition prune,
> we can do that if we have an offset column, but most of the case we do not
> define that, so there is actually no conflict or overlap.
> >
> > Best,
> > Danny Chan
> > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> >> Hi Danny,
> >>
> >> shouldn't FLIP-110[1] solve most of the problems we have around defining
> >> table properties more dynamically without manual schema work? Also
> >> offset definition is easier with such a syntax. They must not be defined
> >> in catalog but could be temporary tables that extend from the original
> >> table.
> >>
> >> In general, we should aim to keep the syntax concise and don't provide
> >> too many ways of doing the same thing. Hints should give "hints" but not
> >> affect the actual produced result.
> >>
> >> Some connector properties might also change the plan or schema in the
> >> future. E.g. they might also define whether a table source supports
> >> certain push-downs (e.g. predicate push-down).
> >>
> >> Dawid is currently working a draft that might makes it possible to
> >> expose a Kafka offset via the schema such that `SELECT * FROM Topic
> >> WHERE offset > 10` would become possible and could be pushed down. But
> >> this is of course, not planned initially.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> >>
> >>
> >>
> >> On 10.03.20 08:34, Danny Chan wrote:
> >>> Thanks Wenlong ~
> >>>
> >>> For PROPERTIES Hint Error handling
> >>>
> >>> Actually we have no way to figure out whether a error prone hint is a
> PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do
> not know if this hint is a PROPERTIES hint, what we know is that the hint
> name was not registered in our Flink.
> >>>
> >>> If the user writes the hint name correctly (i.e. PROPERTIES), we did
> can enforce the validation of the hint options though the pluggable
> HintOptionChecker.
> >>>
> >>> For PROPERTIES Hint Option Format
> >>>
> >>> For a key value style hint option, the key can be either a simple
> identifier or a string literal, which means that it’s compatible with our
> DDL syntax. We support simple identifier because many other hints do not
> have the component complex keys like the table properties, and we want to
> unify the parse block.
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
> >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it is
> really
> >>>> a necessary feature for flink sql to integrate with a catalog.
> >>>>
> >>>> For error handling, I think it would be more natural to throw an
> >>>> exception when error table hint provided, because the properties in
> hint
> >>>> will be merged and used to find the table factory which would cause an
> >>>> exception when error properties provided, right? On the other hand,
> unlike
> >>>> other hints which just affect the way to execute the query, the
> property
> >>>> table hint actually affects the result of the query, we should never
> ignore
> >>>> the given property hints.
> >>>>
> >>>> For the format of property hints, currently, in sql client, we accept
> >>>> properties in format of string only in DDL: 'connector.type'='kafka',
> I
> >>>> think the format of properties in hint should be the same as the
> format we
> >>>> defined in ddl. What do you think?
> >>>>
> >>>> Bests,
> >>>> Wenlong Lyu
> >>>>
> >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> wrote:
> >>>>
> >>>>> To Weike: About the Error Handing
> >>>>>
> >>>>> To be consistent with other SQL vendors, the default is to log
> warnings
> >>>>> and if there is any error (invalid hint name or options), the hint
> is just
> >>>>> ignored. I have already addressed in the wiki.
> >>>>>
> >>>>> To Timo: About the PROPERTIES Table Hint
> >>>>>
> >>>>> • The properties hints is also optional, user can pass in an option
> to
> >>>>> override the table properties but this does not mean it is required.
> >>>>> • They should not include semantics: does the properties belong to
> >>>>> semantic ? I don't think so, the plan does not change right ? The
> result
> >>>>> set may be affected, but there are already some hints do so, for
> example,
> >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> >>>>> compared to the hints way(which is included in comments)
> >>>>> • I actually didn't found any vendors to support such grammar, and
> there
> >>>>> is no way to override table level properties dynamically. For normal
> RDBMS,
> >>>>> I think there are no requests for such dynamic parameters because
> all the
> >>>>> table have the same storage and computation and they are almost all
> batch
> >>>>> tables.
> >>>>> • While Flink as a computation engine has many connectors,
> especially for
> >>>>> some message queue like Kafka, we would have a start_offset which is
> >>>>> different each time we start the query, such parameters can not be
> >>>>> persisted to catalog, because it’s not static, this is actually the
> >>>>> background we propose the table hints to indicate such properties
> >>>>> dynamically.
> >>>>>
> >>>>>
> >>>>> To Jark and Jinsong: I have removed the query hints part and change
> the
> >>>>> title.
> >>>>>
> >>>>> [1]
> >>>>>
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> >>>>>
> >>>>> Best,
> >>>>> Danny Chan
> >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> >>>>>> Hi Danny,
> >>>>>>
> >>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner
> hints
> >>>>>> and table hints are orthogonal topics that should be discussed
> >>>>> separately.
> >>>>>>
> >>>>>> I share Jingsong's opinion that we should not use planner hints for
> >>>>>> passing connector properties. Planner hints should be optional at
> any
> >>>>>> time. They should not include semantics but only affect execution
> time.
> >>>>>> Connector properties are an important part of the query itself.
> >>>>>>
> >>>>>> Have you thought about options such as `SELECT * FROM t(k=v, k=v)`?
> How
> >>>>>> are other vendors deal with this problem?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>>
> >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> >>>>>>>
> >>>>>>> I took a look to FLIP, most of content are talking about query
> hints.
> >>>>> It is
> >>>>>>> hard to discussion and voting. So +1 to split it as Jark said.
> >>>>>>>
> >>>>>>> Another thing is configuration that suitable to config with table
> >>>>> hints:
> >>>>>>> "connector.path" and "connector.topic", Are they really suitable
> for
> >>>>> table
> >>>>>>> hints? Looks weird to me. Because I think these properties are the
> >>>>> core of
> >>>>>>> table.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong Lee
> >>>>>>>
> >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> >>>>>>>
> >>>>>>>> Thanks Danny for starting the discussion.
> >>>>>>>> +1 for this feature.
> >>>>>>>>
> >>>>>>>> If we just focus on the table hints not the query hints in this
> >>>>> release,
> >>>>>>>> could you split the FLIP into two FLIPs?
> >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can keep
> >>>>> the table
> >>>>>>>> hints proposal in FLIP-113 and move query hints into another FLIP.
> >>>>>>>> So that we can focuse on the table hints in the FLIP.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]
> >
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Danny,
> >>>>>>>>>
> >>>>>>>>> This is a nice feature, +1.
> >>>>>>>>>
> >>>>>>>>> One thing I am interested in but not mentioned in the proposal is
> >>>>> the
> >>>>>>>> error
> >>>>>>>>> handling, as it is quite common for users to write inappropriate
> >>>>> hints in
> >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the system
> >>>>> simply
> >>>>>>>>> ignore them or throw exceptions?
> >>>>>>>>>
> >>>>>>>>> Thanks : )
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Weike
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Note:
> >>>>>>>>>> we only plan to support table hints in Flink release 1.11, so
> >>>>> please
> >>>>>>>>> focus
> >>>>>>>>>> mainly on the table hints part and just ignore the planner
> >>>>> hints, sorry
> >>>>>>>>> for
> >>>>>>>>>> that mistake ~
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Danny Chan
> >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> >>>>>>>>>>> Hi, fellows ~
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to propose the supports for SQL hints for our
> >>>>> Flink SQL.
> >>>>>>>>>>>
> >>>>>>>>>>> We would support hints syntax as following:
> >>>>>>>>>>>
> >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> >>>>> parallelism='24') */
> >>>>>>>>>>> from
> >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> >>>>>>>>>>> join
> >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>> on
> >>>>>>>>>>> emp.deptno = dept.deptno
> >>>>>>>>>>>
> >>>>>>>>>>> Basically we would support both query hints(after the SELECT
> >>>>> keyword)
> >>>>>>>>>> and table hints(after the referenced table name), for 1.11, we
> >>>>> plan to
> >>>>>>>>> only
> >>>>>>>>>> support table hints with a hint probably named PROPERTIES:
> >>>>>>>>>>>
> >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> >>>>>>>>>>>
> >>>>>>>>>>> I am looking forward to your comments.
> >>>>>>>>>>>
> >>>>>>>>>>> You can access the FLIP here:
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Danny Chan
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan-2
In reply to this post by Timo Walther-2
Thanks Timo and Fabian ~

compared to the hints, FLIP-110 is fully compliant to the SQL standard.


I don't think so, here is the syntax of CREATE TABLE in SQL-2011 IWD
9075-2:201?(E) 11.3 <table definition>:

<table definition> ::=
  CREATE [ <table scope> ] TABLE <table name> <table contents source>
      [ WITH <system versioning clause> ]
      [ ON COMMIT <table commit action> ROWS ]
<table contents source> ::=
    <table element list>
  | <typed table clause>
  | <as subquery clause>

<table element list> ::=
  <left paren> <table element> [ { <comma> <table element> }... ] <right
paren>
<table element> ::=
    <column definition>
  | <table period definition>
  | <table constraint definition>
  | <like clause>

<like clause> ::=
  LIKE <table name> [ <like options> ]
<like options> ::=
  <like option>...
<like option> ::=
    <identity option>
  | <column default option>
  | <generation option>
<identity option> ::=
    INCLUDING IDENTITY
  | EXCLUDING IDENTITY
<column default option> ::=
    INCLUDING DEFAULTS
  | EXCLUDING DEFAULTS
<generation option> ::=
    INCLUDING GENERATED
  | EXCLUDING GENERATED

So, according to the standard SQL, we can have syntax as "CREATE TABLE t2
like t1 [like options]";
MySQL supports "CREATE TABLE new_tbl LIKE orig_tbl" but there is no WITH
options[1]. Basically the syntax "CREATE TABLE t like with (k1=v1, k2=v2)"
syntax has is totally not compatible with the SQL standard.

I agree with Fabian that the connector properties is important to the query
execution, we can change the error handling to throws if:

   - The hints was not recognized for Flink
   - The hint options are in invalid format

Both of these are pluggable.

If we go with the hint approach, it should be expressible in the

TableSourceFactory which properties are supported for hinting. Or do you

plan to offer those hints in a separate


We can limit the hint options of PROPERTIES to only support those we want
to support, if you think that is necessary, there is no need to distinguish
in TableSourceFactory for these attributes. And like you said, this is
totally about properties validation which is another story, we have this
problem even if we use the DDL way.

[1] https://dev.mysql.com/doc/refman/5.7/en/create-table-like.html

Timo Walther <[hidden email]> 于2020年3月10日周二 下午10:29写道:

> Hi Danny,
>
> compared to the hints, FLIP-110 is fully compliant to the SQL standard.
>
> I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is
> too verbose or awkward for the power of basically changing the entire
> connector. Usually, this statement would just precede the query in a
> multiline file. So it can be change "in-place" like the hints you proposed.
>
> Many companies have a well-defined set of tables that should be used. It
> would be dangerous if users can change the path or topic in a hint. The
> catalog/catalog manager should be the entity that controls which tables
> exist and how they can be accessed.
>
>  > what’s the problem there if we user the table hints to support “start
> offset”?
>
> IMHO it violates the meaning of a hint. According to the dictionary, a
> hint is "a statement that expresses indirectly what one prefers not to
> say explicitly". But offsets are a property that are very explicit.
>
> If we go with the hint approach, it should be expressible in the
> TableSourceFactory which properties are supported for hinting. Or do you
> plan to offer those hints in a separate Map<String, String> that cannot
> overwrite existing properties? I think this would be a different story...
>
> Regards,
> Timo
>
>
> On 10.03.20 10:34, Danny Chan wrote:
> > Thanks Timo ~
> >
> > Personally I would say that offset > 0 and start offset = 10 does not
> have the same semantic, so from the SQL aspect, we can not implement a
> “starting offset” hint for query with such a syntax.
> >
> > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> defining such dynamic parameters even if it could do that, shall we force
> users to define a temporal table for each query with dynamic params, I
> would say it’s an awkward solution.
> >
> > "Hints should give "hints" but not affect the actual produced result.”
> You mentioned that multiple times and could we give a reason, what’s the
> problem there if we user the table hints to support “start offset” ? From
> my side I saw some benefits for that:
> >
> >
> > • It’s very convent to set up these parameters, the syntax is very much
> like the DDL definition
> > • It’s scope is very clear, right on the table it attathed
> > • It does not affect the table schema, which means in order to specify
> the offset, there is no need to define an offset column which is weird
> actually, offset should never be a column, it’s more like a metadata or a
> start option.
> >
> > So in total, FLIP-110 uses the offset more like a Hive partition prune,
> we can do that if we have an offset column, but most of the case we do not
> define that, so there is actually no conflict or overlap.
> >
> > Best,
> > Danny Chan
> > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> >> Hi Danny,
> >>
> >> shouldn't FLIP-110[1] solve most of the problems we have around defining
> >> table properties more dynamically without manual schema work? Also
> >> offset definition is easier with such a syntax. They must not be defined
> >> in catalog but could be temporary tables that extend from the original
> >> table.
> >>
> >> In general, we should aim to keep the syntax concise and don't provide
> >> too many ways of doing the same thing. Hints should give "hints" but not
> >> affect the actual produced result.
> >>
> >> Some connector properties might also change the plan or schema in the
> >> future. E.g. they might also define whether a table source supports
> >> certain push-downs (e.g. predicate push-down).
> >>
> >> Dawid is currently working a draft that might makes it possible to
> >> expose a Kafka offset via the schema such that `SELECT * FROM Topic
> >> WHERE offset > 10` would become possible and could be pushed down. But
> >> this is of course, not planned initially.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> [1]
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> >>
> >>
> >>
> >> On 10.03.20 08:34, Danny Chan wrote:
> >>> Thanks Wenlong ~
> >>>
> >>> For PROPERTIES Hint Error handling
> >>>
> >>> Actually we have no way to figure out whether a error prone hint is a
> PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we do
> not know if this hint is a PROPERTIES hint, what we know is that the hint
> name was not registered in our Flink.
> >>>
> >>> If the user writes the hint name correctly (i.e. PROPERTIES), we did
> can enforce the validation of the hint options though the pluggable
> HintOptionChecker.
> >>>
> >>> For PROPERTIES Hint Option Format
> >>>
> >>> For a key value style hint option, the key can be either a simple
> identifier or a string literal, which means that it’s compatible with our
> DDL syntax. We support simple identifier because many other hints do not
> have the component complex keys like the table properties, and we want to
> unify the parse block.
> >>>
> >>> Best,
> >>> Danny Chan
> >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
> >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it is
> really
> >>>> a necessary feature for flink sql to integrate with a catalog.
> >>>>
> >>>> For error handling, I think it would be more natural to throw an
> >>>> exception when error table hint provided, because the properties in
> hint
> >>>> will be merged and used to find the table factory which would cause an
> >>>> exception when error properties provided, right? On the other hand,
> unlike
> >>>> other hints which just affect the way to execute the query, the
> property
> >>>> table hint actually affects the result of the query, we should never
> ignore
> >>>> the given property hints.
> >>>>
> >>>> For the format of property hints, currently, in sql client, we accept
> >>>> properties in format of string only in DDL: 'connector.type'='kafka',
> I
> >>>> think the format of properties in hint should be the same as the
> format we
> >>>> defined in ddl. What do you think?
> >>>>
> >>>> Bests,
> >>>> Wenlong Lyu
> >>>>
> >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> wrote:
> >>>>
> >>>>> To Weike: About the Error Handing
> >>>>>
> >>>>> To be consistent with other SQL vendors, the default is to log
> warnings
> >>>>> and if there is any error (invalid hint name or options), the hint
> is just
> >>>>> ignored. I have already addressed in the wiki.
> >>>>>
> >>>>> To Timo: About the PROPERTIES Table Hint
> >>>>>
> >>>>> • The properties hints is also optional, user can pass in an option
> to
> >>>>> override the table properties but this does not mean it is required.
> >>>>> • They should not include semantics: does the properties belong to
> >>>>> semantic ? I don't think so, the plan does not change right ? The
> result
> >>>>> set may be affected, but there are already some hints do so, for
> example,
> >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> >>>>> compared to the hints way(which is included in comments)
> >>>>> • I actually didn't found any vendors to support such grammar, and
> there
> >>>>> is no way to override table level properties dynamically. For normal
> RDBMS,
> >>>>> I think there are no requests for such dynamic parameters because
> all the
> >>>>> table have the same storage and computation and they are almost all
> batch
> >>>>> tables.
> >>>>> • While Flink as a computation engine has many connectors,
> especially for
> >>>>> some message queue like Kafka, we would have a start_offset which is
> >>>>> different each time we start the query, such parameters can not be
> >>>>> persisted to catalog, because it’s not static, this is actually the
> >>>>> background we propose the table hints to indicate such properties
> >>>>> dynamically.
> >>>>>
> >>>>>
> >>>>> To Jark and Jinsong: I have removed the query hints part and change
> the
> >>>>> title.
> >>>>>
> >>>>> [1]
> >>>>>
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> >>>>>
> >>>>> Best,
> >>>>> Danny Chan
> >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> >>>>>> Hi Danny,
> >>>>>>
> >>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner
> hints
> >>>>>> and table hints are orthogonal topics that should be discussed
> >>>>> separately.
> >>>>>>
> >>>>>> I share Jingsong's opinion that we should not use planner hints for
> >>>>>> passing connector properties. Planner hints should be optional at
> any
> >>>>>> time. They should not include semantics but only affect execution
> time.
> >>>>>> Connector properties are an important part of the query itself.
> >>>>>>
> >>>>>> Have you thought about options such as `SELECT * FROM t(k=v, k=v)`?
> How
> >>>>>> are other vendors deal with this problem?
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>>
> >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> >>>>>>>
> >>>>>>> I took a look to FLIP, most of content are talking about query
> hints.
> >>>>> It is
> >>>>>>> hard to discussion and voting. So +1 to split it as Jark said.
> >>>>>>>
> >>>>>>> Another thing is configuration that suitable to config with table
> >>>>> hints:
> >>>>>>> "connector.path" and "connector.topic", Are they really suitable
> for
> >>>>> table
> >>>>>>> hints? Looks weird to me. Because I think these properties are the
> >>>>> core of
> >>>>>>> table.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Jingsong Lee
> >>>>>>>
> >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> >>>>>>>
> >>>>>>>> Thanks Danny for starting the discussion.
> >>>>>>>> +1 for this feature.
> >>>>>>>>
> >>>>>>>> If we just focus on the table hints not the query hints in this
> >>>>> release,
> >>>>>>>> could you split the FLIP into two FLIPs?
> >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can keep
> >>>>> the table
> >>>>>>>> hints proposal in FLIP-113 and move query hints into another FLIP.
> >>>>>>>> So that we can focuse on the table hints in the FLIP.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Jark
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <[hidden email]
> >
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Danny,
> >>>>>>>>>
> >>>>>>>>> This is a nice feature, +1.
> >>>>>>>>>
> >>>>>>>>> One thing I am interested in but not mentioned in the proposal is
> >>>>> the
> >>>>>>>> error
> >>>>>>>>> handling, as it is quite common for users to write inappropriate
> >>>>> hints in
> >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the system
> >>>>> simply
> >>>>>>>>> ignore them or throw exceptions?
> >>>>>>>>>
> >>>>>>>>> Thanks : )
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>> Weike
> >>>>>>>>>
> >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <[hidden email]>
> >>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Note:
> >>>>>>>>>> we only plan to support table hints in Flink release 1.11, so
> >>>>> please
> >>>>>>>>> focus
> >>>>>>>>>> mainly on the table hints part and just ignore the planner
> >>>>> hints, sorry
> >>>>>>>>> for
> >>>>>>>>>> that mistake ~
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Danny Chan
> >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]>,写道:
> >>>>>>>>>>> Hi, fellows ~
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to propose the supports for SQL hints for our
> >>>>> Flink SQL.
> >>>>>>>>>>>
> >>>>>>>>>>> We would support hints syntax as following:
> >>>>>>>>>>>
> >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> >>>>> parallelism='24') */
> >>>>>>>>>>> from
> >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> >>>>>>>>>>> join
> >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> >>>>>>>>>>> on
> >>>>>>>>>>> emp.deptno = dept.deptno
> >>>>>>>>>>>
> >>>>>>>>>>> Basically we would support both query hints(after the SELECT
> >>>>> keyword)
> >>>>>>>>>> and table hints(after the referenced table name), for 1.11, we
> >>>>> plan to
> >>>>>>>>> only
> >>>>>>>>>> support table hints with a hint probably named PROPERTIES:
> >>>>>>>>>>>
> >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> >>>>>>>>>>>
> >>>>>>>>>>> I am looking forward to your comments.
> >>>>>>>>>>>
> >>>>>>>>>>> You can access the FLIP here:
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>> Danny Chan
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>
> >
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Kurt Young
In reply to this post by Jark Wu-2
Good to have such lovely discussions. I also want to share some of my
opinions.

#1 Regarding to error handling: I also think ignore invalid hints would be
dangerous, maybe
the simplest solution is just throw an exception.

#2 Regarding to property replacement: I don't think we should constraint
ourself to
the meaning of the word "hint", and forbidden it modifying any properties
which can effect
query results. IMO `PROPERTIES` is one of the table hints, and a powerful
one. It can
modify properties located in DDL's WITH block. But I also see the harm that
if we make it
too flexible like change the kafka topic name with a hint. Such use case is
not common and
sounds very dangerous to me. I would propose we have a map of hintable
properties for each
connector, and should validate all passed in properties are actually
hintable. And combining with
#1 error handling, we can throw an exception once received invalid property.

#3 Regarding to global offset: I'm not sure it's feasible. Different
connectors will have totally
different properties to represent offset, some might be timestamps, some
might be string literals
like "earliest", and others might be just integers.

Best,
Kurt


On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <[hidden email]> wrote:

> Hi everyone,
>
> I want to jump in the discussion about the "dynamic start offset" problem.
> First of all, I share the same concern with Timo and Fabian, that the
> "start offset" affects the query semantics, i.e. the query result.
> But "hints" is just used for optimization which should affect the result?
>
> I think the "dynamic start offset" is an very important usability problem
> which will be faced by many streaming platforms.
> I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> ('connector.startup-timestamp-millis' = '1578538374471')" is verbose, what
> if we have 10 tables to join?
>
> However, what I want to propose (should be another thread) is a global
> configuration to reset start offsets of all the source connectors
> in the query session, e.g. "table.sources.start-offset". This is possible
> now because `TableSourceFactory.Context` has `getConfiguration`
> method to get the session configuration, and use it to create an adapted
> TableSource.
> Then we can also expose to SQL CLI via SET command, e.g. `SET
> 'table.sources.start-offset'='earliest';`, which is pretty simple and
> straightforward.
>
> This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` which
> is very helpful IMO.
>
> Best,
> Jark
>
>
> On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]> wrote:
>
> > Hi Danny,
> >
> > compared to the hints, FLIP-110 is fully compliant to the SQL standard.
> >
> > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is
> > too verbose or awkward for the power of basically changing the entire
> > connector. Usually, this statement would just precede the query in a
> > multiline file. So it can be change "in-place" like the hints you
> proposed.
> >
> > Many companies have a well-defined set of tables that should be used. It
> > would be dangerous if users can change the path or topic in a hint. The
> > catalog/catalog manager should be the entity that controls which tables
> > exist and how they can be accessed.
> >
> >  > what’s the problem there if we user the table hints to support “start
> > offset”?
> >
> > IMHO it violates the meaning of a hint. According to the dictionary, a
> > hint is "a statement that expresses indirectly what one prefers not to
> > say explicitly". But offsets are a property that are very explicit.
> >
> > If we go with the hint approach, it should be expressible in the
> > TableSourceFactory which properties are supported for hinting. Or do you
> > plan to offer those hints in a separate Map<String, String> that cannot
> > overwrite existing properties? I think this would be a different story...
> >
> > Regards,
> > Timo
> >
> >
> > On 10.03.20 10:34, Danny Chan wrote:
> > > Thanks Timo ~
> > >
> > > Personally I would say that offset > 0 and start offset = 10 does not
> > have the same semantic, so from the SQL aspect, we can not implement a
> > “starting offset” hint for query with such a syntax.
> > >
> > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> > defining such dynamic parameters even if it could do that, shall we force
> > users to define a temporal table for each query with dynamic params, I
> > would say it’s an awkward solution.
> > >
> > > "Hints should give "hints" but not affect the actual produced result.”
> > You mentioned that multiple times and could we give a reason, what’s the
> > problem there if we user the table hints to support “start offset” ? From
> > my side I saw some benefits for that:
> > >
> > >
> > > • It’s very convent to set up these parameters, the syntax is very much
> > like the DDL definition
> > > • It’s scope is very clear, right on the table it attathed
> > > • It does not affect the table schema, which means in order to specify
> > the offset, there is no need to define an offset column which is weird
> > actually, offset should never be a column, it’s more like a metadata or a
> > start option.
> > >
> > > So in total, FLIP-110 uses the offset more like a Hive partition prune,
> > we can do that if we have an offset column, but most of the case we do
> not
> > define that, so there is actually no conflict or overlap.
> > >
> > > Best,
> > > Danny Chan
> > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > >> Hi Danny,
> > >>
> > >> shouldn't FLIP-110[1] solve most of the problems we have around
> defining
> > >> table properties more dynamically without manual schema work? Also
> > >> offset definition is easier with such a syntax. They must not be
> defined
> > >> in catalog but could be temporary tables that extend from the original
> > >> table.
> > >>
> > >> In general, we should aim to keep the syntax concise and don't provide
> > >> too many ways of doing the same thing. Hints should give "hints" but
> not
> > >> affect the actual produced result.
> > >>
> > >> Some connector properties might also change the plan or schema in the
> > >> future. E.g. they might also define whether a table source supports
> > >> certain push-downs (e.g. predicate push-down).
> > >>
> > >> Dawid is currently working a draft that might makes it possible to
> > >> expose a Kafka offset via the schema such that `SELECT * FROM Topic
> > >> WHERE offset > 10` would become possible and could be pushed down. But
> > >> this is of course, not planned initially.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> [1]
> > >>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> > >>
> > >>
> > >>
> > >> On 10.03.20 08:34, Danny Chan wrote:
> > >>> Thanks Wenlong ~
> > >>>
> > >>> For PROPERTIES Hint Error handling
> > >>>
> > >>> Actually we have no way to figure out whether a error prone hint is a
> > PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’, we
> do
> > not know if this hint is a PROPERTIES hint, what we know is that the hint
> > name was not registered in our Flink.
> > >>>
> > >>> If the user writes the hint name correctly (i.e. PROPERTIES), we did
> > can enforce the validation of the hint options though the pluggable
> > HintOptionChecker.
> > >>>
> > >>> For PROPERTIES Hint Option Format
> > >>>
> > >>> For a key value style hint option, the key can be either a simple
> > identifier or a string literal, which means that it’s compatible with our
> > DDL syntax. We support simple identifier because many other hints do not
> > have the component complex keys like the table properties, and we want to
> > unify the parse block.
> > >>>
> > >>> Best,
> > >>> Danny Chan
> > >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]>,写道:
> > >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it is
> > really
> > >>>> a necessary feature for flink sql to integrate with a catalog.
> > >>>>
> > >>>> For error handling, I think it would be more natural to throw an
> > >>>> exception when error table hint provided, because the properties in
> > hint
> > >>>> will be merged and used to find the table factory which would cause
> an
> > >>>> exception when error properties provided, right? On the other hand,
> > unlike
> > >>>> other hints which just affect the way to execute the query, the
> > property
> > >>>> table hint actually affects the result of the query, we should never
> > ignore
> > >>>> the given property hints.
> > >>>>
> > >>>> For the format of property hints, currently, in sql client, we
> accept
> > >>>> properties in format of string only in DDL:
> 'connector.type'='kafka',
> > I
> > >>>> think the format of properties in hint should be the same as the
> > format we
> > >>>> defined in ddl. What do you think?
> > >>>>
> > >>>> Bests,
> > >>>> Wenlong Lyu
> > >>>>
> > >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> > wrote:
> > >>>>
> > >>>>> To Weike: About the Error Handing
> > >>>>>
> > >>>>> To be consistent with other SQL vendors, the default is to log
> > warnings
> > >>>>> and if there is any error (invalid hint name or options), the hint
> > is just
> > >>>>> ignored. I have already addressed in the wiki.
> > >>>>>
> > >>>>> To Timo: About the PROPERTIES Table Hint
> > >>>>>
> > >>>>> • The properties hints is also optional, user can pass in an option
> > to
> > >>>>> override the table properties but this does not mean it is
> required.
> > >>>>> • They should not include semantics: does the properties belong to
> > >>>>> semantic ? I don't think so, the plan does not change right ? The
> > result
> > >>>>> set may be affected, but there are already some hints do so, for
> > example,
> > >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL standard
> > >>>>> compared to the hints way(which is included in comments)
> > >>>>> • I actually didn't found any vendors to support such grammar, and
> > there
> > >>>>> is no way to override table level properties dynamically. For
> normal
> > RDBMS,
> > >>>>> I think there are no requests for such dynamic parameters because
> > all the
> > >>>>> table have the same storage and computation and they are almost all
> > batch
> > >>>>> tables.
> > >>>>> • While Flink as a computation engine has many connectors,
> > especially for
> > >>>>> some message queue like Kafka, we would have a start_offset which
> is
> > >>>>> different each time we start the query, such parameters can not be
> > >>>>> persisted to catalog, because it’s not static, this is actually the
> > >>>>> background we propose the table hints to indicate such properties
> > >>>>> dynamically.
> > >>>>>
> > >>>>>
> > >>>>> To Jark and Jinsong: I have removed the query hints part and change
> > the
> > >>>>> title.
> > >>>>>
> > >>>>> [1]
> > >>>>>
> >
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > >>>>>
> > >>>>> Best,
> > >>>>> Danny Chan
> > >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > >>>>>> Hi Danny,
> > >>>>>>
> > >>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner
> > hints
> > >>>>>> and table hints are orthogonal topics that should be discussed
> > >>>>> separately.
> > >>>>>>
> > >>>>>> I share Jingsong's opinion that we should not use planner hints
> for
> > >>>>>> passing connector properties. Planner hints should be optional at
> > any
> > >>>>>> time. They should not include semantics but only affect execution
> > time.
> > >>>>>> Connector properties are an important part of the query itself.
> > >>>>>>
> > >>>>>> Have you thought about options such as `SELECT * FROM t(k=v,
> k=v)`?
> > How
> > >>>>>> are other vendors deal with this problem?
> > >>>>>>
> > >>>>>> Regards,
> > >>>>>> Timo
> > >>>>>>
> > >>>>>>
> > >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> > >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> > >>>>>>>
> > >>>>>>> I took a look to FLIP, most of content are talking about query
> > hints.
> > >>>>> It is
> > >>>>>>> hard to discussion and voting. So +1 to split it as Jark said.
> > >>>>>>>
> > >>>>>>> Another thing is configuration that suitable to config with table
> > >>>>> hints:
> > >>>>>>> "connector.path" and "connector.topic", Are they really suitable
> > for
> > >>>>> table
> > >>>>>>> hints? Looks weird to me. Because I think these properties are
> the
> > >>>>> core of
> > >>>>>>> table.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Jingsong Lee
> > >>>>>>>
> > >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]> wrote:
> > >>>>>>>
> > >>>>>>>> Thanks Danny for starting the discussion.
> > >>>>>>>> +1 for this feature.
> > >>>>>>>>
> > >>>>>>>> If we just focus on the table hints not the query hints in this
> > >>>>> release,
> > >>>>>>>> could you split the FLIP into two FLIPs?
> > >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can
> keep
> > >>>>> the table
> > >>>>>>>> hints proposal in FLIP-113 and move query hints into another
> FLIP.
> > >>>>>>>> So that we can focuse on the table hints in the FLIP.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Jark
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> [hidden email]
> > >
> > >>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Danny,
> > >>>>>>>>>
> > >>>>>>>>> This is a nice feature, +1.
> > >>>>>>>>>
> > >>>>>>>>> One thing I am interested in but not mentioned in the proposal
> is
> > >>>>> the
> > >>>>>>>> error
> > >>>>>>>>> handling, as it is quite common for users to write
> inappropriate
> > >>>>> hints in
> > >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the system
> > >>>>> simply
> > >>>>>>>>> ignore them or throw exceptions?
> > >>>>>>>>>
> > >>>>>>>>> Thanks : )
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Weike
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> [hidden email]>
> > >>>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> Note:
> > >>>>>>>>>> we only plan to support table hints in Flink release 1.11, so
> > >>>>> please
> > >>>>>>>>> focus
> > >>>>>>>>>> mainly on the table hints part and just ignore the planner
> > >>>>> hints, sorry
> > >>>>>>>>> for
> > >>>>>>>>>> that mistake ~
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Danny Chan
> > >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]
> >,写道:
> > >>>>>>>>>>> Hi, fellows ~
> > >>>>>>>>>>>
> > >>>>>>>>>>> I would like to propose the supports for SQL hints for our
> > >>>>> Flink SQL.
> > >>>>>>>>>>>
> > >>>>>>>>>>> We would support hints syntax as following:
> > >>>>>>>>>>>
> > >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > >>>>> parallelism='24') */
> > >>>>>>>>>>> from
> > >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> > >>>>>>>>>>> join
> > >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > >>>>>>>>>>> on
> > >>>>>>>>>>> emp.deptno = dept.deptno
> > >>>>>>>>>>>
> > >>>>>>>>>>> Basically we would support both query hints(after the SELECT
> > >>>>> keyword)
> > >>>>>>>>>> and table hints(after the referenced table name), for 1.11, we
> > >>>>> plan to
> > >>>>>>>>> only
> > >>>>>>>>>> support table hints with a hint probably named PROPERTIES:
> > >>>>>>>>>>>
> > >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > >>>>>>>>>>>
> > >>>>>>>>>>> I am looking forward to your comments.
> > >>>>>>>>>>>
> > >>>>>>>>>>> You can access the FLIP here:
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > >>>>>>>>>>>
> > >>>>>>>>>>> Best,
> > >>>>>>>>>>> Danny Chan
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> > >
> >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Jark Wu-2
Hi Kurt,

#3 Regarding to global offset:
I'm not saying to use the global configuration to override connector
properties by the planner.
But the connector should take this configuration and translate into their
client API.
AFAIK, almost all the message queues support eariliest and latest and a
timestamp value as start point.
So we can support 3 options for this configuration: "eariliest", "latest"
and a timestamp string value.
Of course, this can't solve 100% cases, but I guess can sovle 80% or 90%
cases.
And the remaining cases can be resolved by LIKE syntax which I guess is not
very common cases.

Best,
Jark


On Wed, 11 Mar 2020 at 10:33, Kurt Young <[hidden email]> wrote:

> Good to have such lovely discussions. I also want to share some of my
> opinions.
>
> #1 Regarding to error handling: I also think ignore invalid hints would be
> dangerous, maybe
> the simplest solution is just throw an exception.
>
> #2 Regarding to property replacement: I don't think we should constraint
> ourself to
> the meaning of the word "hint", and forbidden it modifying any properties
> which can effect
> query results. IMO `PROPERTIES` is one of the table hints, and a powerful
> one. It can
> modify properties located in DDL's WITH block. But I also see the harm that
> if we make it
> too flexible like change the kafka topic name with a hint. Such use case is
> not common and
> sounds very dangerous to me. I would propose we have a map of hintable
> properties for each
> connector, and should validate all passed in properties are actually
> hintable. And combining with
> #1 error handling, we can throw an exception once received invalid
> property.
>
> #3 Regarding to global offset: I'm not sure it's feasible. Different
> connectors will have totally
> different properties to represent offset, some might be timestamps, some
> might be string literals
> like "earliest", and others might be just integers.
>
> Best,
> Kurt
>
>
> On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <[hidden email]> wrote:
>
> > Hi everyone,
> >
> > I want to jump in the discussion about the "dynamic start offset"
> problem.
> > First of all, I share the same concern with Timo and Fabian, that the
> > "start offset" affects the query semantics, i.e. the query result.
> > But "hints" is just used for optimization which should affect the result?
> >
> > I think the "dynamic start offset" is an very important usability problem
> > which will be faced by many streaming platforms.
> > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> > ('connector.startup-timestamp-millis' = '1578538374471')" is verbose,
> what
> > if we have 10 tables to join?
> >
> > However, what I want to propose (should be another thread) is a global
> > configuration to reset start offsets of all the source connectors
> > in the query session, e.g. "table.sources.start-offset". This is possible
> > now because `TableSourceFactory.Context` has `getConfiguration`
> > method to get the session configuration, and use it to create an adapted
> > TableSource.
> > Then we can also expose to SQL CLI via SET command, e.g. `SET
> > 'table.sources.start-offset'='earliest';`, which is pretty simple and
> > straightforward.
> >
> > This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'` which
> > is very helpful IMO.
> >
> > Best,
> > Jark
> >
> >
> > On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]> wrote:
> >
> > > Hi Danny,
> > >
> > > compared to the hints, FLIP-110 is fully compliant to the SQL standard.
> > >
> > > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)` is
> > > too verbose or awkward for the power of basically changing the entire
> > > connector. Usually, this statement would just precede the query in a
> > > multiline file. So it can be change "in-place" like the hints you
> > proposed.
> > >
> > > Many companies have a well-defined set of tables that should be used.
> It
> > > would be dangerous if users can change the path or topic in a hint. The
> > > catalog/catalog manager should be the entity that controls which tables
> > > exist and how they can be accessed.
> > >
> > >  > what’s the problem there if we user the table hints to support
> “start
> > > offset”?
> > >
> > > IMHO it violates the meaning of a hint. According to the dictionary, a
> > > hint is "a statement that expresses indirectly what one prefers not to
> > > say explicitly". But offsets are a property that are very explicit.
> > >
> > > If we go with the hint approach, it should be expressible in the
> > > TableSourceFactory which properties are supported for hinting. Or do
> you
> > > plan to offer those hints in a separate Map<String, String> that cannot
> > > overwrite existing properties? I think this would be a different
> story...
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 10.03.20 10:34, Danny Chan wrote:
> > > > Thanks Timo ~
> > > >
> > > > Personally I would say that offset > 0 and start offset = 10 does not
> > > have the same semantic, so from the SQL aspect, we can not implement a
> > > “starting offset” hint for query with such a syntax.
> > > >
> > > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> > > defining such dynamic parameters even if it could do that, shall we
> force
> > > users to define a temporal table for each query with dynamic params, I
> > > would say it’s an awkward solution.
> > > >
> > > > "Hints should give "hints" but not affect the actual produced
> result.”
> > > You mentioned that multiple times and could we give a reason, what’s
> the
> > > problem there if we user the table hints to support “start offset” ?
> From
> > > my side I saw some benefits for that:
> > > >
> > > >
> > > > • It’s very convent to set up these parameters, the syntax is very
> much
> > > like the DDL definition
> > > > • It’s scope is very clear, right on the table it attathed
> > > > • It does not affect the table schema, which means in order to
> specify
> > > the offset, there is no need to define an offset column which is weird
> > > actually, offset should never be a column, it’s more like a metadata
> or a
> > > start option.
> > > >
> > > > So in total, FLIP-110 uses the offset more like a Hive partition
> prune,
> > > we can do that if we have an offset column, but most of the case we do
> > not
> > > define that, so there is actually no conflict or overlap.
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > > >> Hi Danny,
> > > >>
> > > >> shouldn't FLIP-110[1] solve most of the problems we have around
> > defining
> > > >> table properties more dynamically without manual schema work? Also
> > > >> offset definition is easier with such a syntax. They must not be
> > defined
> > > >> in catalog but could be temporary tables that extend from the
> original
> > > >> table.
> > > >>
> > > >> In general, we should aim to keep the syntax concise and don't
> provide
> > > >> too many ways of doing the same thing. Hints should give "hints" but
> > not
> > > >> affect the actual produced result.
> > > >>
> > > >> Some connector properties might also change the plan or schema in
> the
> > > >> future. E.g. they might also define whether a table source supports
> > > >> certain push-downs (e.g. predicate push-down).
> > > >>
> > > >> Dawid is currently working a draft that might makes it possible to
> > > >> expose a Kafka offset via the schema such that `SELECT * FROM Topic
> > > >> WHERE offset > 10` would become possible and could be pushed down.
> But
> > > >> this is of course, not planned initially.
> > > >>
> > > >> Regards,
> > > >> Timo
> > > >>
> > > >>
> > > >> [1]
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> > > >>
> > > >>
> > > >>
> > > >> On 10.03.20 08:34, Danny Chan wrote:
> > > >>> Thanks Wenlong ~
> > > >>>
> > > >>> For PROPERTIES Hint Error handling
> > > >>>
> > > >>> Actually we have no way to figure out whether a error prone hint
> is a
> > > PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’,
> we
> > do
> > > not know if this hint is a PROPERTIES hint, what we know is that the
> hint
> > > name was not registered in our Flink.
> > > >>>
> > > >>> If the user writes the hint name correctly (i.e. PROPERTIES), we
> did
> > > can enforce the validation of the hint options though the pluggable
> > > HintOptionChecker.
> > > >>>
> > > >>> For PROPERTIES Hint Option Format
> > > >>>
> > > >>> For a key value style hint option, the key can be either a simple
> > > identifier or a string literal, which means that it’s compatible with
> our
> > > DDL syntax. We support simple identifier because many other hints do
> not
> > > have the component complex keys like the table properties, and we want
> to
> > > unify the parse block.
> > > >>>
> > > >>> Best,
> > > >>> Danny Chan
> > > >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]
> >,写道:
> > > >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it
> is
> > > really
> > > >>>> a necessary feature for flink sql to integrate with a catalog.
> > > >>>>
> > > >>>> For error handling, I think it would be more natural to throw an
> > > >>>> exception when error table hint provided, because the properties
> in
> > > hint
> > > >>>> will be merged and used to find the table factory which would
> cause
> > an
> > > >>>> exception when error properties provided, right? On the other
> hand,
> > > unlike
> > > >>>> other hints which just affect the way to execute the query, the
> > > property
> > > >>>> table hint actually affects the result of the query, we should
> never
> > > ignore
> > > >>>> the given property hints.
> > > >>>>
> > > >>>> For the format of property hints, currently, in sql client, we
> > accept
> > > >>>> properties in format of string only in DDL:
> > 'connector.type'='kafka',
> > > I
> > > >>>> think the format of properties in hint should be the same as the
> > > format we
> > > >>>> defined in ddl. What do you think?
> > > >>>>
> > > >>>> Bests,
> > > >>>> Wenlong Lyu
> > > >>>>
> > > >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> > > wrote:
> > > >>>>
> > > >>>>> To Weike: About the Error Handing
> > > >>>>>
> > > >>>>> To be consistent with other SQL vendors, the default is to log
> > > warnings
> > > >>>>> and if there is any error (invalid hint name or options), the
> hint
> > > is just
> > > >>>>> ignored. I have already addressed in the wiki.
> > > >>>>>
> > > >>>>> To Timo: About the PROPERTIES Table Hint
> > > >>>>>
> > > >>>>> • The properties hints is also optional, user can pass in an
> option
> > > to
> > > >>>>> override the table properties but this does not mean it is
> > required.
> > > >>>>> • They should not include semantics: does the properties belong
> to
> > > >>>>> semantic ? I don't think so, the plan does not change right ? The
> > > result
> > > >>>>> set may be affected, but there are already some hints do so, for
> > > example,
> > > >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL
> standard
> > > >>>>> compared to the hints way(which is included in comments)
> > > >>>>> • I actually didn't found any vendors to support such grammar,
> and
> > > there
> > > >>>>> is no way to override table level properties dynamically. For
> > normal
> > > RDBMS,
> > > >>>>> I think there are no requests for such dynamic parameters because
> > > all the
> > > >>>>> table have the same storage and computation and they are almost
> all
> > > batch
> > > >>>>> tables.
> > > >>>>> • While Flink as a computation engine has many connectors,
> > > especially for
> > > >>>>> some message queue like Kafka, we would have a start_offset which
> > is
> > > >>>>> different each time we start the query, such parameters can not
> be
> > > >>>>> persisted to catalog, because it’s not static, this is actually
> the
> > > >>>>> background we propose the table hints to indicate such properties
> > > >>>>> dynamically.
> > > >>>>>
> > > >>>>>
> > > >>>>> To Jark and Jinsong: I have removed the query hints part and
> change
> > > the
> > > >>>>> title.
> > > >>>>>
> > > >>>>> [1]
> > > >>>>>
> > >
> >
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Danny Chan
> > > >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > > >>>>>> Hi Danny,
> > > >>>>>>
> > > >>>>>> thanks for the proposal. I agree with Jark and Jingsong. Planner
> > > hints
> > > >>>>>> and table hints are orthogonal topics that should be discussed
> > > >>>>> separately.
> > > >>>>>>
> > > >>>>>> I share Jingsong's opinion that we should not use planner hints
> > for
> > > >>>>>> passing connector properties. Planner hints should be optional
> at
> > > any
> > > >>>>>> time. They should not include semantics but only affect
> execution
> > > time.
> > > >>>>>> Connector properties are an important part of the query itself.
> > > >>>>>>
> > > >>>>>> Have you thought about options such as `SELECT * FROM t(k=v,
> > k=v)`?
> > > How
> > > >>>>>> are other vendors deal with this problem?
> > > >>>>>>
> > > >>>>>> Regards,
> > > >>>>>> Timo
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> > > >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> > > >>>>>>>
> > > >>>>>>> I took a look to FLIP, most of content are talking about query
> > > hints.
> > > >>>>> It is
> > > >>>>>>> hard to discussion and voting. So +1 to split it as Jark said.
> > > >>>>>>>
> > > >>>>>>> Another thing is configuration that suitable to config with
> table
> > > >>>>> hints:
> > > >>>>>>> "connector.path" and "connector.topic", Are they really
> suitable
> > > for
> > > >>>>> table
> > > >>>>>>> hints? Looks weird to me. Because I think these properties are
> > the
> > > >>>>> core of
> > > >>>>>>> table.
> > > >>>>>>>
> > > >>>>>>> Best,
> > > >>>>>>> Jingsong Lee
> > > >>>>>>>
> > > >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]>
> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks Danny for starting the discussion.
> > > >>>>>>>> +1 for this feature.
> > > >>>>>>>>
> > > >>>>>>>> If we just focus on the table hints not the query hints in
> this
> > > >>>>> release,
> > > >>>>>>>> could you split the FLIP into two FLIPs?
> > > >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can
> > keep
> > > >>>>> the table
> > > >>>>>>>> hints proposal in FLIP-113 and move query hints into another
> > FLIP.
> > > >>>>>>>> So that we can focuse on the table hints in the FLIP.
> > > >>>>>>>>
> > > >>>>>>>> Thanks,
> > > >>>>>>>> Jark
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> > [hidden email]
> > > >
> > > >>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Danny,
> > > >>>>>>>>>
> > > >>>>>>>>> This is a nice feature, +1.
> > > >>>>>>>>>
> > > >>>>>>>>> One thing I am interested in but not mentioned in the
> proposal
> > is
> > > >>>>> the
> > > >>>>>>>> error
> > > >>>>>>>>> handling, as it is quite common for users to write
> > inappropriate
> > > >>>>> hints in
> > > >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the
> system
> > > >>>>> simply
> > > >>>>>>>>> ignore them or throw exceptions?
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks : )
> > > >>>>>>>>>
> > > >>>>>>>>> Best,
> > > >>>>>>>>> Weike
> > > >>>>>>>>>
> > > >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> > [hidden email]>
> > > >>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Note:
> > > >>>>>>>>>> we only plan to support table hints in Flink release 1.11,
> so
> > > >>>>> please
> > > >>>>>>>>> focus
> > > >>>>>>>>>> mainly on the table hints part and just ignore the planner
> > > >>>>> hints, sorry
> > > >>>>>>>>> for
> > > >>>>>>>>>> that mistake ~
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Danny Chan
> > > >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]
> > >,写道:
> > > >>>>>>>>>>> Hi, fellows ~
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I would like to propose the supports for SQL hints for our
> > > >>>>> Flink SQL.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> We would support hints syntax as following:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > >>>>> parallelism='24') */
> > > >>>>>>>>>>> from
> > > >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> > > >>>>>>>>>>> join
> > > >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > >>>>>>>>>>> on
> > > >>>>>>>>>>> emp.deptno = dept.deptno
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Basically we would support both query hints(after the
> SELECT
> > > >>>>> keyword)
> > > >>>>>>>>>> and table hints(after the referenced table name), for 1.11,
> we
> > > >>>>> plan to
> > > >>>>>>>>> only
> > > >>>>>>>>>> support table hints with a hint probably named PROPERTIES:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I am looking forward to your comments.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> You can access the FLIP here:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Best,
> > > >>>>>>>>>>> Danny Chan
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >
> > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Kurt Young
If a specific connector want to have such parameter and read if out of
configuration, then that's fine.
If we are talking about a configuration for all kinds of sources, I would
be super careful about that.
It's true it can solve maybe 80% cases, but it will also make the left 20%
feels weird.

Best,
Kurt


On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <[hidden email]> wrote:

> Hi Kurt,
>
> #3 Regarding to global offset:
> I'm not saying to use the global configuration to override connector
> properties by the planner.
> But the connector should take this configuration and translate into their
> client API.
> AFAIK, almost all the message queues support eariliest and latest and a
> timestamp value as start point.
> So we can support 3 options for this configuration: "eariliest", "latest"
> and a timestamp string value.
> Of course, this can't solve 100% cases, but I guess can sovle 80% or 90%
> cases.
> And the remaining cases can be resolved by LIKE syntax which I guess is not
> very common cases.
>
> Best,
> Jark
>
>
> On Wed, 11 Mar 2020 at 10:33, Kurt Young <[hidden email]> wrote:
>
> > Good to have such lovely discussions. I also want to share some of my
> > opinions.
> >
> > #1 Regarding to error handling: I also think ignore invalid hints would
> be
> > dangerous, maybe
> > the simplest solution is just throw an exception.
> >
> > #2 Regarding to property replacement: I don't think we should constraint
> > ourself to
> > the meaning of the word "hint", and forbidden it modifying any properties
> > which can effect
> > query results. IMO `PROPERTIES` is one of the table hints, and a powerful
> > one. It can
> > modify properties located in DDL's WITH block. But I also see the harm
> that
> > if we make it
> > too flexible like change the kafka topic name with a hint. Such use case
> is
> > not common and
> > sounds very dangerous to me. I would propose we have a map of hintable
> > properties for each
> > connector, and should validate all passed in properties are actually
> > hintable. And combining with
> > #1 error handling, we can throw an exception once received invalid
> > property.
> >
> > #3 Regarding to global offset: I'm not sure it's feasible. Different
> > connectors will have totally
> > different properties to represent offset, some might be timestamps, some
> > might be string literals
> > like "earliest", and others might be just integers.
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <[hidden email]> wrote:
> >
> > > Hi everyone,
> > >
> > > I want to jump in the discussion about the "dynamic start offset"
> > problem.
> > > First of all, I share the same concern with Timo and Fabian, that the
> > > "start offset" affects the query semantics, i.e. the query result.
> > > But "hints" is just used for optimization which should affect the
> result?
> > >
> > > I think the "dynamic start offset" is an very important usability
> problem
> > > which will be faced by many streaming platforms.
> > > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> > > ('connector.startup-timestamp-millis' = '1578538374471')" is verbose,
> > what
> > > if we have 10 tables to join?
> > >
> > > However, what I want to propose (should be another thread) is a global
> > > configuration to reset start offsets of all the source connectors
> > > in the query session, e.g. "table.sources.start-offset". This is
> possible
> > > now because `TableSourceFactory.Context` has `getConfiguration`
> > > method to get the session configuration, and use it to create an
> adapted
> > > TableSource.
> > > Then we can also expose to SQL CLI via SET command, e.g. `SET
> > > 'table.sources.start-offset'='earliest';`, which is pretty simple and
> > > straightforward.
> > >
> > > This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'`
> which
> > > is very helpful IMO.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]> wrote:
> > >
> > > > Hi Danny,
> > > >
> > > > compared to the hints, FLIP-110 is fully compliant to the SQL
> standard.
> > > >
> > > > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH (k=v)`
> is
> > > > too verbose or awkward for the power of basically changing the entire
> > > > connector. Usually, this statement would just precede the query in a
> > > > multiline file. So it can be change "in-place" like the hints you
> > > proposed.
> > > >
> > > > Many companies have a well-defined set of tables that should be used.
> > It
> > > > would be dangerous if users can change the path or topic in a hint.
> The
> > > > catalog/catalog manager should be the entity that controls which
> tables
> > > > exist and how they can be accessed.
> > > >
> > > >  > what’s the problem there if we user the table hints to support
> > “start
> > > > offset”?
> > > >
> > > > IMHO it violates the meaning of a hint. According to the dictionary,
> a
> > > > hint is "a statement that expresses indirectly what one prefers not
> to
> > > > say explicitly". But offsets are a property that are very explicit.
> > > >
> > > > If we go with the hint approach, it should be expressible in the
> > > > TableSourceFactory which properties are supported for hinting. Or do
> > you
> > > > plan to offer those hints in a separate Map<String, String> that
> cannot
> > > > overwrite existing properties? I think this would be a different
> > story...
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 10.03.20 10:34, Danny Chan wrote:
> > > > > Thanks Timo ~
> > > > >
> > > > > Personally I would say that offset > 0 and start offset = 10 does
> not
> > > > have the same semantic, so from the SQL aspect, we can not implement
> a
> > > > “starting offset” hint for query with such a syntax.
> > > > >
> > > > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose for
> > > > defining such dynamic parameters even if it could do that, shall we
> > force
> > > > users to define a temporal table for each query with dynamic params,
> I
> > > > would say it’s an awkward solution.
> > > > >
> > > > > "Hints should give "hints" but not affect the actual produced
> > result.”
> > > > You mentioned that multiple times and could we give a reason, what’s
> > the
> > > > problem there if we user the table hints to support “start offset” ?
> > From
> > > > my side I saw some benefits for that:
> > > > >
> > > > >
> > > > > • It’s very convent to set up these parameters, the syntax is very
> > much
> > > > like the DDL definition
> > > > > • It’s scope is very clear, right on the table it attathed
> > > > > • It does not affect the table schema, which means in order to
> > specify
> > > > the offset, there is no need to define an offset column which is
> weird
> > > > actually, offset should never be a column, it’s more like a metadata
> > or a
> > > > start option.
> > > > >
> > > > > So in total, FLIP-110 uses the offset more like a Hive partition
> > prune,
> > > > we can do that if we have an offset column, but most of the case we
> do
> > > not
> > > > define that, so there is actually no conflict or overlap.
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > > > >> Hi Danny,
> > > > >>
> > > > >> shouldn't FLIP-110[1] solve most of the problems we have around
> > > defining
> > > > >> table properties more dynamically without manual schema work? Also
> > > > >> offset definition is easier with such a syntax. They must not be
> > > defined
> > > > >> in catalog but could be temporary tables that extend from the
> > original
> > > > >> table.
> > > > >>
> > > > >> In general, we should aim to keep the syntax concise and don't
> > provide
> > > > >> too many ways of doing the same thing. Hints should give "hints"
> but
> > > not
> > > > >> affect the actual produced result.
> > > > >>
> > > > >> Some connector properties might also change the plan or schema in
> > the
> > > > >> future. E.g. they might also define whether a table source
> supports
> > > > >> certain push-downs (e.g. predicate push-down).
> > > > >>
> > > > >> Dawid is currently working a draft that might makes it possible to
> > > > >> expose a Kafka offset via the schema such that `SELECT * FROM
> Topic
> > > > >> WHERE offset > 10` would become possible and could be pushed down.
> > But
> > > > >> this is of course, not planned initially.
> > > > >>
> > > > >> Regards,
> > > > >> Timo
> > > > >>
> > > > >>
> > > > >> [1]
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> > > > >>
> > > > >>
> > > > >>
> > > > >> On 10.03.20 08:34, Danny Chan wrote:
> > > > >>> Thanks Wenlong ~
> > > > >>>
> > > > >>> For PROPERTIES Hint Error handling
> > > > >>>
> > > > >>> Actually we have no way to figure out whether a error prone hint
> > is a
> > > > PROPERTIES hint, for example, if use writes a hint like ‘PROPERTIAS’,
> > we
> > > do
> > > > not know if this hint is a PROPERTIES hint, what we know is that the
> > hint
> > > > name was not registered in our Flink.
> > > > >>>
> > > > >>> If the user writes the hint name correctly (i.e. PROPERTIES), we
> > did
> > > > can enforce the validation of the hint options though the pluggable
> > > > HintOptionChecker.
> > > > >>>
> > > > >>> For PROPERTIES Hint Option Format
> > > > >>>
> > > > >>> For a key value style hint option, the key can be either a simple
> > > > identifier or a string literal, which means that it’s compatible with
> > our
> > > > DDL syntax. We support simple identifier because many other hints do
> > not
> > > > have the component complex keys like the table properties, and we
> want
> > to
> > > > unify the parse block.
> > > > >>>
> > > > >>> Best,
> > > > >>> Danny Chan
> > > > >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]
> > >,写道:
> > > > >>>> Hi Danny, thanks for the proposal. +1 for adding table hints, it
> > is
> > > > really
> > > > >>>> a necessary feature for flink sql to integrate with a catalog.
> > > > >>>>
> > > > >>>> For error handling, I think it would be more natural to throw an
> > > > >>>> exception when error table hint provided, because the properties
> > in
> > > > hint
> > > > >>>> will be merged and used to find the table factory which would
> > cause
> > > an
> > > > >>>> exception when error properties provided, right? On the other
> > hand,
> > > > unlike
> > > > >>>> other hints which just affect the way to execute the query, the
> > > > property
> > > > >>>> table hint actually affects the result of the query, we should
> > never
> > > > ignore
> > > > >>>> the given property hints.
> > > > >>>>
> > > > >>>> For the format of property hints, currently, in sql client, we
> > > accept
> > > > >>>> properties in format of string only in DDL:
> > > 'connector.type'='kafka',
> > > > I
> > > > >>>> think the format of properties in hint should be the same as the
> > > > format we
> > > > >>>> defined in ddl. What do you think?
> > > > >>>>
> > > > >>>> Bests,
> > > > >>>> Wenlong Lyu
> > > > >>>>
> > > > >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <[hidden email]>
> > > > wrote:
> > > > >>>>
> > > > >>>>> To Weike: About the Error Handing
> > > > >>>>>
> > > > >>>>> To be consistent with other SQL vendors, the default is to log
> > > > warnings
> > > > >>>>> and if there is any error (invalid hint name or options), the
> > hint
> > > > is just
> > > > >>>>> ignored. I have already addressed in the wiki.
> > > > >>>>>
> > > > >>>>> To Timo: About the PROPERTIES Table Hint
> > > > >>>>>
> > > > >>>>> • The properties hints is also optional, user can pass in an
> > option
> > > > to
> > > > >>>>> override the table properties but this does not mean it is
> > > required.
> > > > >>>>> • They should not include semantics: does the properties belong
> > to
> > > > >>>>> semantic ? I don't think so, the plan does not change right ?
> The
> > > > result
> > > > >>>>> set may be affected, but there are already some hints do so,
> for
> > > > example,
> > > > >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > > >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL
> > standard
> > > > >>>>> compared to the hints way(which is included in comments)
> > > > >>>>> • I actually didn't found any vendors to support such grammar,
> > and
> > > > there
> > > > >>>>> is no way to override table level properties dynamically. For
> > > normal
> > > > RDBMS,
> > > > >>>>> I think there are no requests for such dynamic parameters
> because
> > > > all the
> > > > >>>>> table have the same storage and computation and they are almost
> > all
> > > > batch
> > > > >>>>> tables.
> > > > >>>>> • While Flink as a computation engine has many connectors,
> > > > especially for
> > > > >>>>> some message queue like Kafka, we would have a start_offset
> which
> > > is
> > > > >>>>> different each time we start the query, such parameters can not
> > be
> > > > >>>>> persisted to catalog, because it’s not static, this is actually
> > the
> > > > >>>>> background we propose the table hints to indicate such
> properties
> > > > >>>>> dynamically.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> To Jark and Jinsong: I have removed the query hints part and
> > change
> > > > the
> > > > >>>>> title.
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>>>
> > > >
> > >
> >
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Danny Chan
> > > > >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]>,写道:
> > > > >>>>>> Hi Danny,
> > > > >>>>>>
> > > > >>>>>> thanks for the proposal. I agree with Jark and Jingsong.
> Planner
> > > > hints
> > > > >>>>>> and table hints are orthogonal topics that should be discussed
> > > > >>>>> separately.
> > > > >>>>>>
> > > > >>>>>> I share Jingsong's opinion that we should not use planner
> hints
> > > for
> > > > >>>>>> passing connector properties. Planner hints should be optional
> > at
> > > > any
> > > > >>>>>> time. They should not include semantics but only affect
> > execution
> > > > time.
> > > > >>>>>> Connector properties are an important part of the query
> itself.
> > > > >>>>>>
> > > > >>>>>> Have you thought about options such as `SELECT * FROM t(k=v,
> > > k=v)`?
> > > > How
> > > > >>>>>> are other vendors deal with this problem?
> > > > >>>>>>
> > > > >>>>>> Regards,
> > > > >>>>>> Timo
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> > > > >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> > > > >>>>>>>
> > > > >>>>>>> I took a look to FLIP, most of content are talking about
> query
> > > > hints.
> > > > >>>>> It is
> > > > >>>>>>> hard to discussion and voting. So +1 to split it as Jark
> said.
> > > > >>>>>>>
> > > > >>>>>>> Another thing is configuration that suitable to config with
> > table
> > > > >>>>> hints:
> > > > >>>>>>> "connector.path" and "connector.topic", Are they really
> > suitable
> > > > for
> > > > >>>>> table
> > > > >>>>>>> hints? Looks weird to me. Because I think these properties
> are
> > > the
> > > > >>>>> core of
> > > > >>>>>>> table.
> > > > >>>>>>>
> > > > >>>>>>> Best,
> > > > >>>>>>> Jingsong Lee
> > > > >>>>>>>
> > > > >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]>
> > wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Thanks Danny for starting the discussion.
> > > > >>>>>>>> +1 for this feature.
> > > > >>>>>>>>
> > > > >>>>>>>> If we just focus on the table hints not the query hints in
> > this
> > > > >>>>> release,
> > > > >>>>>>>> could you split the FLIP into two FLIPs?
> > > > >>>>>>>> Because it's hard to vote on partial part of a FLIP. You can
> > > keep
> > > > >>>>> the table
> > > > >>>>>>>> hints proposal in FLIP-113 and move query hints into another
> > > FLIP.
> > > > >>>>>>>> So that we can focuse on the table hints in the FLIP.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks,
> > > > >>>>>>>> Jark
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> > > [hidden email]
> > > > >
> > > > >>>>> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Danny,
> > > > >>>>>>>>>
> > > > >>>>>>>>> This is a nice feature, +1.
> > > > >>>>>>>>>
> > > > >>>>>>>>> One thing I am interested in but not mentioned in the
> > proposal
> > > is
> > > > >>>>> the
> > > > >>>>>>>> error
> > > > >>>>>>>>> handling, as it is quite common for users to write
> > > inappropriate
> > > > >>>>> hints in
> > > > >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the
> > system
> > > > >>>>> simply
> > > > >>>>>>>>> ignore them or throw exceptions?
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thanks : )
> > > > >>>>>>>>>
> > > > >>>>>>>>> Best,
> > > > >>>>>>>>> Weike
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> > > [hidden email]>
> > > > >>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Note:
> > > > >>>>>>>>>> we only plan to support table hints in Flink release 1.11,
> > so
> > > > >>>>> please
> > > > >>>>>>>>> focus
> > > > >>>>>>>>>> mainly on the table hints part and just ignore the planner
> > > > >>>>> hints, sorry
> > > > >>>>>>>>> for
> > > > >>>>>>>>>> that mistake ~
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Danny Chan
> > > > >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <[hidden email]
> > > >,写道:
> > > > >>>>>>>>>>> Hi, fellows ~
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I would like to propose the supports for SQL hints for
> our
> > > > >>>>> Flink SQL.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> We would support hints syntax as following:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > > >>>>> parallelism='24') */
> > > > >>>>>>>>>>> from
> > > > >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> > > > >>>>>>>>>>> join
> > > > >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > >>>>>>>>>>> on
> > > > >>>>>>>>>>> emp.deptno = dept.deptno
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Basically we would support both query hints(after the
> > SELECT
> > > > >>>>> keyword)
> > > > >>>>>>>>>> and table hints(after the referenced table name), for
> 1.11,
> > we
> > > > >>>>> plan to
> > > > >>>>>>>>> only
> > > > >>>>>>>>>> support table hints with a hint probably named PROPERTIES:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I am looking forward to your comments.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> You can access the FLIP here:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Best,
> > > > >>>>>>>>>>> Danny Chan
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

bowen.li
Thanks Danny for kicking off the effort

The root cause of too much manual work is Flink DDL has mixed 3 types of
params together and doesn't handle each of them very well. Below are how I
categorize them and corresponding solutions in my mind:

- type 1: Metadata of external data, like external endpoint/url,
username/pwd, schemas, formats.

Such metadata are mostly already accessible in external system as long as
endpoints and credentials are provided. Flink can get it thru catalogs, but
we haven't had many catalogs yet and thus Flink just hasn't been able to
leverage that. So the solution should be building more catalogs. Such
params should be part of a Flink table DDL/definition, and not overridable
in any means.


- type 2: Runtime params, like jdbc connector's fetch size, elasticsearch
connector's bulk flush size.

Such params don't affect query results, but affect how results are produced
(eg. fast or slow, aka performance) - they are essentially execution and
implementation details. They change often in exploration or development
stages, but not quite frequently in well-defined long-running pipelines.
They should always have default values and can be missing in query. They
can be part of a table DDL/definition, but should also be replaceable in a
query - *this is what table "hints" in FLIP-113 should cover*.


- type 3: Semantic params, like kafka connector's start offset.

Such params affect query results - the semantics. They'd better be as
filter conditions in WHERE clause that can be pushed down. They change
almost every time a query starts and have nothing to do with metadata, thus
should not be part of table definition/DDL, nor be persisted in catalogs.
If they will, users should create views to keep such params around (note
this is different from variable substitution).


Take Flink-Kafka as an example. Once we get these params right, here're the
steps users need to do to develop and run a Flink job:
- configure a Flink ConfluentSchemaRegistry with url, username, and password
- run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified
timestamp) in SQL CLI, Flink automatically retrieves all metadata of
schema, file format, etc and start the job
- users want to make the job read Kafka topic faster, so it goes as "SELECT
* FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
- done and satisfied, users submit it to production


Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think it's a
nice-to-have feature, but not a strategically critical, long-term solution,
because
1) It may seem promising at the current stage to solve the
too-much-manual-work problem, but that's only because Flink hasn't
leveraged catalogs well and handled the 3 types of params above properly.
Once we get the params types right, the LIKE syntax won't be that
important, and will be just an easier way to create tables without retyping
long fields like username and pwd.
2) Note that only some rare type of catalog can store k-v property pair, so
table created this way often cannot be persisted. In the foreseeable
future, such catalog will only be HiveCatalog, and not everyone has a Hive
metastore. To be honest, without persistence, recreating tables every time
this way is still a lot of keyboard typing.

Cheers,
Bowen

On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <[hidden email]> wrote:

> If a specific connector want to have such parameter and read if out of
> configuration, then that's fine.
> If we are talking about a configuration for all kinds of sources, I would
> be super careful about that.
> It's true it can solve maybe 80% cases, but it will also make the left 20%
> feels weird.
>
> Best,
> Kurt
>
>
> On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <[hidden email]> wrote:
>
> > Hi Kurt,
> >
> > #3 Regarding to global offset:
> > I'm not saying to use the global configuration to override connector
> > properties by the planner.
> > But the connector should take this configuration and translate into their
> > client API.
> > AFAIK, almost all the message queues support eariliest and latest and a
> > timestamp value as start point.
> > So we can support 3 options for this configuration: "eariliest", "latest"
> > and a timestamp string value.
> > Of course, this can't solve 100% cases, but I guess can sovle 80% or 90%
> > cases.
> > And the remaining cases can be resolved by LIKE syntax which I guess is
> not
> > very common cases.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 11 Mar 2020 at 10:33, Kurt Young <[hidden email]> wrote:
> >
> > > Good to have such lovely discussions. I also want to share some of my
> > > opinions.
> > >
> > > #1 Regarding to error handling: I also think ignore invalid hints would
> > be
> > > dangerous, maybe
> > > the simplest solution is just throw an exception.
> > >
> > > #2 Regarding to property replacement: I don't think we should
> constraint
> > > ourself to
> > > the meaning of the word "hint", and forbidden it modifying any
> properties
> > > which can effect
> > > query results. IMO `PROPERTIES` is one of the table hints, and a
> powerful
> > > one. It can
> > > modify properties located in DDL's WITH block. But I also see the harm
> > that
> > > if we make it
> > > too flexible like change the kafka topic name with a hint. Such use
> case
> > is
> > > not common and
> > > sounds very dangerous to me. I would propose we have a map of hintable
> > > properties for each
> > > connector, and should validate all passed in properties are actually
> > > hintable. And combining with
> > > #1 error handling, we can throw an exception once received invalid
> > > property.
> > >
> > > #3 Regarding to global offset: I'm not sure it's feasible. Different
> > > connectors will have totally
> > > different properties to represent offset, some might be timestamps,
> some
> > > might be string literals
> > > like "earliest", and others might be just integers.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <[hidden email]> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I want to jump in the discussion about the "dynamic start offset"
> > > problem.
> > > > First of all, I share the same concern with Timo and Fabian, that the
> > > > "start offset" affects the query semantics, i.e. the query result.
> > > > But "hints" is just used for optimization which should affect the
> > result?
> > > >
> > > > I think the "dynamic start offset" is an very important usability
> > problem
> > > > which will be faced by many streaming platforms.
> > > > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> > > > ('connector.startup-timestamp-millis' = '1578538374471')" is verbose,
> > > what
> > > > if we have 10 tables to join?
> > > >
> > > > However, what I want to propose (should be another thread) is a
> global
> > > > configuration to reset start offsets of all the source connectors
> > > > in the query session, e.g. "table.sources.start-offset". This is
> > possible
> > > > now because `TableSourceFactory.Context` has `getConfiguration`
> > > > method to get the session configuration, and use it to create an
> > adapted
> > > > TableSource.
> > > > Then we can also expose to SQL CLI via SET command, e.g. `SET
> > > > 'table.sources.start-offset'='earliest';`, which is pretty simple and
> > > > straightforward.
> > > >
> > > > This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'`
> > which
> > > > is very helpful IMO.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]>
> wrote:
> > > >
> > > > > Hi Danny,
> > > > >
> > > > > compared to the hints, FLIP-110 is fully compliant to the SQL
> > standard.
> > > > >
> > > > > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> (k=v)`
> > is
> > > > > too verbose or awkward for the power of basically changing the
> entire
> > > > > connector. Usually, this statement would just precede the query in
> a
> > > > > multiline file. So it can be change "in-place" like the hints you
> > > > proposed.
> > > > >
> > > > > Many companies have a well-defined set of tables that should be
> used.
> > > It
> > > > > would be dangerous if users can change the path or topic in a hint.
> > The
> > > > > catalog/catalog manager should be the entity that controls which
> > tables
> > > > > exist and how they can be accessed.
> > > > >
> > > > >  > what’s the problem there if we user the table hints to support
> > > “start
> > > > > offset”?
> > > > >
> > > > > IMHO it violates the meaning of a hint. According to the
> dictionary,
> > a
> > > > > hint is "a statement that expresses indirectly what one prefers not
> > to
> > > > > say explicitly". But offsets are a property that are very explicit.
> > > > >
> > > > > If we go with the hint approach, it should be expressible in the
> > > > > TableSourceFactory which properties are supported for hinting. Or
> do
> > > you
> > > > > plan to offer those hints in a separate Map<String, String> that
> > cannot
> > > > > overwrite existing properties? I think this would be a different
> > > story...
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 10.03.20 10:34, Danny Chan wrote:
> > > > > > Thanks Timo ~
> > > > > >
> > > > > > Personally I would say that offset > 0 and start offset = 10 does
> > not
> > > > > have the same semantic, so from the SQL aspect, we can not
> implement
> > a
> > > > > “starting offset” hint for query with such a syntax.
> > > > > >
> > > > > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose
> for
> > > > > defining such dynamic parameters even if it could do that, shall we
> > > force
> > > > > users to define a temporal table for each query with dynamic
> params,
> > I
> > > > > would say it’s an awkward solution.
> > > > > >
> > > > > > "Hints should give "hints" but not affect the actual produced
> > > result.”
> > > > > You mentioned that multiple times and could we give a reason,
> what’s
> > > the
> > > > > problem there if we user the table hints to support “start offset”
> ?
> > > From
> > > > > my side I saw some benefits for that:
> > > > > >
> > > > > >
> > > > > > • It’s very convent to set up these parameters, the syntax is
> very
> > > much
> > > > > like the DDL definition
> > > > > > • It’s scope is very clear, right on the table it attathed
> > > > > > • It does not affect the table schema, which means in order to
> > > specify
> > > > > the offset, there is no need to define an offset column which is
> > weird
> > > > > actually, offset should never be a column, it’s more like a
> metadata
> > > or a
> > > > > start option.
> > > > > >
> > > > > > So in total, FLIP-110 uses the offset more like a Hive partition
> > > prune,
> > > > > we can do that if we have an offset column, but most of the case we
> > do
> > > > not
> > > > > define that, so there is actually no conflict or overlap.
> > > > > >
> > > > > > Best,
> > > > > > Danny Chan
> > > > > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > > > > >> Hi Danny,
> > > > > >>
> > > > > >> shouldn't FLIP-110[1] solve most of the problems we have around
> > > > defining
> > > > > >> table properties more dynamically without manual schema work?
> Also
> > > > > >> offset definition is easier with such a syntax. They must not be
> > > > defined
> > > > > >> in catalog but could be temporary tables that extend from the
> > > original
> > > > > >> table.
> > > > > >>
> > > > > >> In general, we should aim to keep the syntax concise and don't
> > > provide
> > > > > >> too many ways of doing the same thing. Hints should give "hints"
> > but
> > > > not
> > > > > >> affect the actual produced result.
> > > > > >>
> > > > > >> Some connector properties might also change the plan or schema
> in
> > > the
> > > > > >> future. E.g. they might also define whether a table source
> > supports
> > > > > >> certain push-downs (e.g. predicate push-down).
> > > > > >>
> > > > > >> Dawid is currently working a draft that might makes it possible
> to
> > > > > >> expose a Kafka offset via the schema such that `SELECT * FROM
> > Topic
> > > > > >> WHERE offset > 10` would become possible and could be pushed
> down.
> > > But
> > > > > >> this is of course, not planned initially.
> > > > > >>
> > > > > >> Regards,
> > > > > >> Timo
> > > > > >>
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> On 10.03.20 08:34, Danny Chan wrote:
> > > > > >>> Thanks Wenlong ~
> > > > > >>>
> > > > > >>> For PROPERTIES Hint Error handling
> > > > > >>>
> > > > > >>> Actually we have no way to figure out whether a error prone
> hint
> > > is a
> > > > > PROPERTIES hint, for example, if use writes a hint like
> ‘PROPERTIAS’,
> > > we
> > > > do
> > > > > not know if this hint is a PROPERTIES hint, what we know is that
> the
> > > hint
> > > > > name was not registered in our Flink.
> > > > > >>>
> > > > > >>> If the user writes the hint name correctly (i.e. PROPERTIES),
> we
> > > did
> > > > > can enforce the validation of the hint options though the pluggable
> > > > > HintOptionChecker.
> > > > > >>>
> > > > > >>> For PROPERTIES Hint Option Format
> > > > > >>>
> > > > > >>> For a key value style hint option, the key can be either a
> simple
> > > > > identifier or a string literal, which means that it’s compatible
> with
> > > our
> > > > > DDL syntax. We support simple identifier because many other hints
> do
> > > not
> > > > > have the component complex keys like the table properties, and we
> > want
> > > to
> > > > > unify the parse block.
> > > > > >>>
> > > > > >>> Best,
> > > > > >>> Danny Chan
> > > > > >>> 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]
> > > >,写道:
> > > > > >>>> Hi Danny, thanks for the proposal. +1 for adding table hints,
> it
> > > is
> > > > > really
> > > > > >>>> a necessary feature for flink sql to integrate with a catalog.
> > > > > >>>>
> > > > > >>>> For error handling, I think it would be more natural to throw
> an
> > > > > >>>> exception when error table hint provided, because the
> properties
> > > in
> > > > > hint
> > > > > >>>> will be merged and used to find the table factory which would
> > > cause
> > > > an
> > > > > >>>> exception when error properties provided, right? On the other
> > > hand,
> > > > > unlike
> > > > > >>>> other hints which just affect the way to execute the query,
> the
> > > > > property
> > > > > >>>> table hint actually affects the result of the query, we should
> > > never
> > > > > ignore
> > > > > >>>> the given property hints.
> > > > > >>>>
> > > > > >>>> For the format of property hints, currently, in sql client, we
> > > > accept
> > > > > >>>> properties in format of string only in DDL:
> > > > 'connector.type'='kafka',
> > > > > I
> > > > > >>>> think the format of properties in hint should be the same as
> the
> > > > > format we
> > > > > >>>> defined in ddl. What do you think?
> > > > > >>>>
> > > > > >>>> Bests,
> > > > > >>>> Wenlong Lyu
> > > > > >>>>
> > > > > >>>> On Tue, 10 Mar 2020 at 14:22, Danny Chan <
> [hidden email]>
> > > > > wrote:
> > > > > >>>>
> > > > > >>>>> To Weike: About the Error Handing
> > > > > >>>>>
> > > > > >>>>> To be consistent with other SQL vendors, the default is to
> log
> > > > > warnings
> > > > > >>>>> and if there is any error (invalid hint name or options), the
> > > hint
> > > > > is just
> > > > > >>>>> ignored. I have already addressed in the wiki.
> > > > > >>>>>
> > > > > >>>>> To Timo: About the PROPERTIES Table Hint
> > > > > >>>>>
> > > > > >>>>> • The properties hints is also optional, user can pass in an
> > > option
> > > > > to
> > > > > >>>>> override the table properties but this does not mean it is
> > > > required.
> > > > > >>>>> • They should not include semantics: does the properties
> belong
> > > to
> > > > > >>>>> semantic ? I don't think so, the plan does not change right ?
> > The
> > > > > result
> > > > > >>>>> set may be affected, but there are already some hints do so,
> > for
> > > > > example,
> > > > > >>>>> MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > > > >>>>> • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL
> > > standard
> > > > > >>>>> compared to the hints way(which is included in comments)
> > > > > >>>>> • I actually didn't found any vendors to support such
> grammar,
> > > and
> > > > > there
> > > > > >>>>> is no way to override table level properties dynamically. For
> > > > normal
> > > > > RDBMS,
> > > > > >>>>> I think there are no requests for such dynamic parameters
> > because
> > > > > all the
> > > > > >>>>> table have the same storage and computation and they are
> almost
> > > all
> > > > > batch
> > > > > >>>>> tables.
> > > > > >>>>> • While Flink as a computation engine has many connectors,
> > > > > especially for
> > > > > >>>>> some message queue like Kafka, we would have a start_offset
> > which
> > > > is
> > > > > >>>>> different each time we start the query, such parameters can
> not
> > > be
> > > > > >>>>> persisted to catalog, because it’s not static, this is
> actually
> > > the
> > > > > >>>>> background we propose the table hints to indicate such
> > properties
> > > > > >>>>> dynamically.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> To Jark and Jinsong: I have removed the query hints part and
> > > change
> > > > > the
> > > > > >>>>> title.
> > > > > >>>>>
> > > > > >>>>> [1]
> > > > > >>>>>
> > > > >
> > > >
> > >
> >
> https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>> Danny Chan
> > > > > >>>>> 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]
> >,写道:
> > > > > >>>>>> Hi Danny,
> > > > > >>>>>>
> > > > > >>>>>> thanks for the proposal. I agree with Jark and Jingsong.
> > Planner
> > > > > hints
> > > > > >>>>>> and table hints are orthogonal topics that should be
> discussed
> > > > > >>>>> separately.
> > > > > >>>>>>
> > > > > >>>>>> I share Jingsong's opinion that we should not use planner
> > hints
> > > > for
> > > > > >>>>>> passing connector properties. Planner hints should be
> optional
> > > at
> > > > > any
> > > > > >>>>>> time. They should not include semantics but only affect
> > > execution
> > > > > time.
> > > > > >>>>>> Connector properties are an important part of the query
> > itself.
> > > > > >>>>>>
> > > > > >>>>>> Have you thought about options such as `SELECT * FROM t(k=v,
> > > > k=v)`?
> > > > > How
> > > > > >>>>>> are other vendors deal with this problem?
> > > > > >>>>>>
> > > > > >>>>>> Regards,
> > > > > >>>>>> Timo
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On 09.03.20 10:37, Jingsong Li wrote:
> > > > > >>>>>>> Hi Danny, +1 for table hints, thanks for driving.
> > > > > >>>>>>>
> > > > > >>>>>>> I took a look to FLIP, most of content are talking about
> > query
> > > > > hints.
> > > > > >>>>> It is
> > > > > >>>>>>> hard to discussion and voting. So +1 to split it as Jark
> > said.
> > > > > >>>>>>>
> > > > > >>>>>>> Another thing is configuration that suitable to config with
> > > table
> > > > > >>>>> hints:
> > > > > >>>>>>> "connector.path" and "connector.topic", Are they really
> > > suitable
> > > > > for
> > > > > >>>>> table
> > > > > >>>>>>> hints? Looks weird to me. Because I think these properties
> > are
> > > > the
> > > > > >>>>> core of
> > > > > >>>>>>> table.
> > > > > >>>>>>>
> > > > > >>>>>>> Best,
> > > > > >>>>>>> Jingsong Lee
> > > > > >>>>>>>
> > > > > >>>>>>> On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]>
> > > wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Thanks Danny for starting the discussion.
> > > > > >>>>>>>> +1 for this feature.
> > > > > >>>>>>>>
> > > > > >>>>>>>> If we just focus on the table hints not the query hints in
> > > this
> > > > > >>>>> release,
> > > > > >>>>>>>> could you split the FLIP into two FLIPs?
> > > > > >>>>>>>> Because it's hard to vote on partial part of a FLIP. You
> can
> > > > keep
> > > > > >>>>> the table
> > > > > >>>>>>>> hints proposal in FLIP-113 and move query hints into
> another
> > > > FLIP.
> > > > > >>>>>>>> So that we can focuse on the table hints in the FLIP.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Thanks,
> > > > > >>>>>>>> Jark
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> > > > [hidden email]
> > > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Danny,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> This is a nice feature, +1.
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> One thing I am interested in but not mentioned in the
> > > proposal
> > > > is
> > > > > >>>>> the
> > > > > >>>>>>>> error
> > > > > >>>>>>>>> handling, as it is quite common for users to write
> > > > inappropriate
> > > > > >>>>> hints in
> > > > > >>>>>>>>> SQL code, if illegal or "bad" hints are given, would the
> > > system
> > > > > >>>>> simply
> > > > > >>>>>>>>> ignore them or throw exceptions?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Thanks : )
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Best,
> > > > > >>>>>>>>> Weike
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> > > > [hidden email]>
> > > > > >>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Note:
> > > > > >>>>>>>>>> we only plan to support table hints in Flink release
> 1.11,
> > > so
> > > > > >>>>> please
> > > > > >>>>>>>>> focus
> > > > > >>>>>>>>>> mainly on the table hints part and just ignore the
> planner
> > > > > >>>>> hints, sorry
> > > > > >>>>>>>>> for
> > > > > >>>>>>>>>> that mistake ~
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Danny Chan
> > > > > >>>>>>>>>> 在 2020年3月9日 +0800 PM4:36,Danny Chan <
> [hidden email]
> > > > >,写道:
> > > > > >>>>>>>>>>> Hi, fellows ~
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I would like to propose the supports for SQL hints for
> > our
> > > > > >>>>> Flink SQL.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> We would support hints syntax as following:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > > > >>>>> parallelism='24') */
> > > > > >>>>>>>>>>> from
> > > > > >>>>>>>>>>> emp /*+ INDEX(idx1, idx2) */
> > > > > >>>>>>>>>>> join
> > > > > >>>>>>>>>>> dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > >>>>>>>>>>> on
> > > > > >>>>>>>>>>> emp.deptno = dept.deptno
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Basically we would support both query hints(after the
> > > SELECT
> > > > > >>>>> keyword)
> > > > > >>>>>>>>>> and table hints(after the referenced table name), for
> > 1.11,
> > > we
> > > > > >>>>> plan to
> > > > > >>>>>>>>> only
> > > > > >>>>>>>>>> support table hints with a hint probably named
> PROPERTIES:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I am looking forward to your comments.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> You can access the FLIP here:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Best,
> > > > > >>>>>>>>>>> Danny Chan
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

Danny Chan
Thanks Bowen ~

I agree we should somehow categorize our connector parameters.

For type1, I’m already preparing a solution like the Confluent schema registry + Avro schema inference thing, so this may not be a problem in the near future.

For type3, I have some questions:

> "SELECT * FROM mykafka WHERE offset > 12pm yesterday”

Where does the offset column come from, a virtual column from the table schema, you said that

>They change
almost every time a query starts and have nothing to do with metadata, thus
should not be part of table definition/DDL

But why you can reference it in the query, I’m confused for that, can you elaborate a little ?

Best,
Danny Chan
在 2020年3月11日 +0800 PM12:52,Bowen Li <[hidden email]>,写道:

> Thanks Danny for kicking off the effort
>
> The root cause of too much manual work is Flink DDL has mixed 3 types of
> params together and doesn't handle each of them very well. Below are how I
> categorize them and corresponding solutions in my mind:
>
> - type 1: Metadata of external data, like external endpoint/url,
> username/pwd, schemas, formats.
>
> Such metadata are mostly already accessible in external system as long as
> endpoints and credentials are provided. Flink can get it thru catalogs, but
> we haven't had many catalogs yet and thus Flink just hasn't been able to
> leverage that. So the solution should be building more catalogs. Such
> params should be part of a Flink table DDL/definition, and not overridable
> in any means.
>
>
> - type 2: Runtime params, like jdbc connector's fetch size, elasticsearch
> connector's bulk flush size.
>
> Such params don't affect query results, but affect how results are produced
> (eg. fast or slow, aka performance) - they are essentially execution and
> implementation details. They change often in exploration or development
> stages, but not quite frequently in well-defined long-running pipelines.
> They should always have default values and can be missing in query. They
> can be part of a table DDL/definition, but should also be replaceable in a
> query - *this is what table "hints" in FLIP-113 should cover*.
>
>
> - type 3: Semantic params, like kafka connector's start offset.
>
> Such params affect query results - the semantics. They'd better be as
> filter conditions in WHERE clause that can be pushed down. They change
> almost every time a query starts and have nothing to do with metadata, thus
> should not be part of table definition/DDL, nor be persisted in catalogs.
> If they will, users should create views to keep such params around (note
> this is different from variable substitution).
>
>
> Take Flink-Kafka as an example. Once we get these params right, here're the
> steps users need to do to develop and run a Flink job:
> - configure a Flink ConfluentSchemaRegistry with url, username, and password
> - run "SELECT * FROM mykafka WHERE offset > 12pm yesterday" (simplified
> timestamp) in SQL CLI, Flink automatically retrieves all metadata of
> schema, file format, etc and start the job
> - users want to make the job read Kafka topic faster, so it goes as "SELECT
> * FROM mykafka /* faster_read_key=value*/ WHERE offset > 12pm yesterday"
> - done and satisfied, users submit it to production
>
>
> Regarding "CREATE TABLE t LIKE with (k1=v1, k2=v2), I think it's a
> nice-to-have feature, but not a strategically critical, long-term solution,
> because
> 1) It may seem promising at the current stage to solve the
> too-much-manual-work problem, but that's only because Flink hasn't
> leveraged catalogs well and handled the 3 types of params above properly.
> Once we get the params types right, the LIKE syntax won't be that
> important, and will be just an easier way to create tables without retyping
> long fields like username and pwd.
> 2) Note that only some rare type of catalog can store k-v property pair, so
> table created this way often cannot be persisted. In the foreseeable
> future, such catalog will only be HiveCatalog, and not everyone has a Hive
> metastore. To be honest, without persistence, recreating tables every time
> this way is still a lot of keyboard typing.
>
> Cheers,
> Bowen
>
> On Tue, Mar 10, 2020 at 8:07 PM Kurt Young <[hidden email]> wrote:
>
> > If a specific connector want to have such parameter and read if out of
> > configuration, then that's fine.
> > If we are talking about a configuration for all kinds of sources, I would
> > be super careful about that.
> > It's true it can solve maybe 80% cases, but it will also make the left 20%
> > feels weird.
> >
> > Best,
> > Kurt
> >
> >
> > On Wed, Mar 11, 2020 at 11:00 AM Jark Wu <[hidden email]> wrote:
> >
> > > Hi Kurt,
> > >
> > > #3 Regarding to global offset:
> > > I'm not saying to use the global configuration to override connector
> > > properties by the planner.
> > > But the connector should take this configuration and translate into their
> > > client API.
> > > AFAIK, almost all the message queues support eariliest and latest and a
> > > timestamp value as start point.
> > > So we can support 3 options for this configuration: "eariliest", "latest"
> > > and a timestamp string value.
> > > Of course, this can't solve 100% cases, but I guess can sovle 80% or 90%
> > > cases.
> > > And the remaining cases can be resolved by LIKE syntax which I guess is
> > not
> > > very common cases.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > > On Wed, 11 Mar 2020 at 10:33, Kurt Young <[hidden email]> wrote:
> > >
> > > > Good to have such lovely discussions. I also want to share some of my
> > > > opinions.
> > > >
> > > > #1 Regarding to error handling: I also think ignore invalid hints would
> > > be
> > > > dangerous, maybe
> > > > the simplest solution is just throw an exception.
> > > >
> > > > #2 Regarding to property replacement: I don't think we should
> > constraint
> > > > ourself to
> > > > the meaning of the word "hint", and forbidden it modifying any
> > properties
> > > > which can effect
> > > > query results. IMO `PROPERTIES` is one of the table hints, and a
> > powerful
> > > > one. It can
> > > > modify properties located in DDL's WITH block. But I also see the harm
> > > that
> > > > if we make it
> > > > too flexible like change the kafka topic name with a hint. Such use
> > case
> > > is
> > > > not common and
> > > > sounds very dangerous to me. I would propose we have a map of hintable
> > > > properties for each
> > > > connector, and should validate all passed in properties are actually
> > > > hintable. And combining with
> > > > #1 error handling, we can throw an exception once received invalid
> > > > property.
> > > >
> > > > #3 Regarding to global offset: I'm not sure it's feasible. Different
> > > > connectors will have totally
> > > > different properties to represent offset, some might be timestamps,
> > some
> > > > might be string literals
> > > > like "earliest", and others might be just integers.
> > > >
> > > > Best,
> > > > Kurt
> > > >
> > > >
> > > > On Tue, Mar 10, 2020 at 11:46 PM Jark Wu <[hidden email]> wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I want to jump in the discussion about the "dynamic start offset"
> > > > problem.
> > > > > First of all, I share the same concern with Timo and Fabian, that the
> > > > > "start offset" affects the query semantics, i.e. the query result.
> > > > > But "hints" is just used for optimization which should affect the
> > > result?
> > > > >
> > > > > I think the "dynamic start offset" is an very important usability
> > > problem
> > > > > which will be faced by many streaming platforms.
> > > > > I also agree "CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> > > > > ('connector.startup-timestamp-millis' = '1578538374471')" is verbose,
> > > > what
> > > > > if we have 10 tables to join?
> > > > >
> > > > > However, what I want to propose (should be another thread) is a
> > global
> > > > > configuration to reset start offsets of all the source connectors
> > > > > in the query session, e.g. "table.sources.start-offset". This is
> > > possible
> > > > > now because `TableSourceFactory.Context` has `getConfiguration`
> > > > > method to get the session configuration, and use it to create an
> > > adapted
> > > > > TableSource.
> > > > > Then we can also expose to SQL CLI via SET command, e.g. `SET
> > > > > 'table.sources.start-offset'='earliest';`, which is pretty simple and
> > > > > straightforward.
> > > > >
> > > > > This is very similar to KSQL's `SET 'auto.offset.reset'='earliest'`
> > > which
> > > > > is very helpful IMO.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > > On Tue, 10 Mar 2020 at 22:29, Timo Walther <[hidden email]>
> > wrote:
> > > > >
> > > > > > Hi Danny,
> > > > > >
> > > > > > compared to the hints, FLIP-110 is fully compliant to the SQL
> > > standard.
> > > > > >
> > > > > > I don't think that `CREATE TEMPORARY TABLE Temp (LIKE t) WITH
> > (k=v)`
> > > is
> > > > > > too verbose or awkward for the power of basically changing the
> > entire
> > > > > > connector. Usually, this statement would just precede the query in
> > a
> > > > > > multiline file. So it can be change "in-place" like the hints you
> > > > > proposed.
> > > > > >
> > > > > > Many companies have a well-defined set of tables that should be
> > used.
> > > > It
> > > > > > would be dangerous if users can change the path or topic in a hint.
> > > The
> > > > > > catalog/catalog manager should be the entity that controls which
> > > tables
> > > > > > exist and how they can be accessed.
> > > > > >
> > > > > > > what’s the problem there if we user the table hints to support
> > > > “start
> > > > > > offset”?
> > > > > >
> > > > > > IMHO it violates the meaning of a hint. According to the
> > dictionary,
> > > a
> > > > > > hint is "a statement that expresses indirectly what one prefers not
> > > to
> > > > > > say explicitly". But offsets are a property that are very explicit.
> > > > > >
> > > > > > If we go with the hint approach, it should be expressible in the
> > > > > > TableSourceFactory which properties are supported for hinting. Or
> > do
> > > > you
> > > > > > plan to offer those hints in a separate Map<String, String> that
> > > cannot
> > > > > > overwrite existing properties? I think this would be a different
> > > > story...
> > > > > >
> > > > > > Regards,
> > > > > > Timo
> > > > > >
> > > > > >
> > > > > > On 10.03.20 10:34, Danny Chan wrote:
> > > > > > > Thanks Timo ~
> > > > > > >
> > > > > > > Personally I would say that offset > 0 and start offset = 10 does
> > > not
> > > > > > have the same semantic, so from the SQL aspect, we can not
> > implement
> > > a
> > > > > > “starting offset” hint for query with such a syntax.
> > > > > > >
> > > > > > > And the CREATE TABLE LIKE syntax is a DDL which is just verbose
> > for
> > > > > > defining such dynamic parameters even if it could do that, shall we
> > > > force
> > > > > > users to define a temporal table for each query with dynamic
> > params,
> > > I
> > > > > > would say it’s an awkward solution.
> > > > > > >
> > > > > > > "Hints should give "hints" but not affect the actual produced
> > > > result.”
> > > > > > You mentioned that multiple times and could we give a reason,
> > what’s
> > > > the
> > > > > > problem there if we user the table hints to support “start offset”
> > ?
> > > > From
> > > > > > my side I saw some benefits for that:
> > > > > > >
> > > > > > >
> > > > > > > • It’s very convent to set up these parameters, the syntax is
> > very
> > > > much
> > > > > > like the DDL definition
> > > > > > > • It’s scope is very clear, right on the table it attathed
> > > > > > > • It does not affect the table schema, which means in order to
> > > > specify
> > > > > > the offset, there is no need to define an offset column which is
> > > weird
> > > > > > actually, offset should never be a column, it’s more like a
> > metadata
> > > > or a
> > > > > > start option.
> > > > > > >
> > > > > > > So in total, FLIP-110 uses the offset more like a Hive partition
> > > > prune,
> > > > > > we can do that if we have an offset column, but most of the case we
> > > do
> > > > > not
> > > > > > define that, so there is actually no conflict or overlap.
> > > > > > >
> > > > > > > Best,
> > > > > > > Danny Chan
> > > > > > > 在 2020年3月10日 +0800 PM4:28,Timo Walther <[hidden email]>,写道:
> > > > > > > > Hi Danny,
> > > > > > > >
> > > > > > > > shouldn't FLIP-110[1] solve most of the problems we have around
> > > > > defining
> > > > > > > > table properties more dynamically without manual schema work?
> > Also
> > > > > > > > offset definition is easier with such a syntax. They must not be
> > > > > defined
> > > > > > > > in catalog but could be temporary tables that extend from the
> > > > original
> > > > > > > > table.
> > > > > > > >
> > > > > > > > In general, we should aim to keep the syntax concise and don't
> > > > provide
> > > > > > > > too many ways of doing the same thing. Hints should give "hints"
> > > but
> > > > > not
> > > > > > > > affect the actual produced result.
> > > > > > > >
> > > > > > > > Some connector properties might also change the plan or schema
> > in
> > > > the
> > > > > > > > future. E.g. they might also define whether a table source
> > > supports
> > > > > > > > certain push-downs (e.g. predicate push-down).
> > > > > > > >
> > > > > > > > Dawid is currently working a draft that might makes it possible
> > to
> > > > > > > > expose a Kafka offset via the schema such that `SELECT * FROM
> > > Topic
> > > > > > > > WHERE offset > 10` would become possible and could be pushed
> > down.
> > > > But
> > > > > > > > this is of course, not planned initially.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Timo
> > > > > > > >
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 10.03.20 08:34, Danny Chan wrote:
> > > > > > > > > Thanks Wenlong ~
> > > > > > > > >
> > > > > > > > > For PROPERTIES Hint Error handling
> > > > > > > > >
> > > > > > > > > Actually we have no way to figure out whether a error prone
> > hint
> > > > is a
> > > > > > PROPERTIES hint, for example, if use writes a hint like
> > ‘PROPERTIAS’,
> > > > we
> > > > > do
> > > > > > not know if this hint is a PROPERTIES hint, what we know is that
> > the
> > > > hint
> > > > > > name was not registered in our Flink.
> > > > > > > > >
> > > > > > > > > If the user writes the hint name correctly (i.e. PROPERTIES),
> > we
> > > > did
> > > > > > can enforce the validation of the hint options though the pluggable
> > > > > > HintOptionChecker.
> > > > > > > > >
> > > > > > > > > For PROPERTIES Hint Option Format
> > > > > > > > >
> > > > > > > > > For a key value style hint option, the key can be either a
> > simple
> > > > > > identifier or a string literal, which means that it’s compatible
> > with
> > > > our
> > > > > > DDL syntax. We support simple identifier because many other hints
> > do
> > > > not
> > > > > > have the component complex keys like the table properties, and we
> > > want
> > > > to
> > > > > > unify the parse block.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Danny Chan
> > > > > > > > > 在 2020年3月10日 +0800 PM3:19,wenlong.lwl <[hidden email]
> > > > > ,写道:
> > > > > > > > > > Hi Danny, thanks for the proposal. +1 for adding table hints,
> > it
> > > > is
> > > > > > really
> > > > > > > > > > a necessary feature for flink sql to integrate with a catalog.
> > > > > > > > > >
> > > > > > > > > > For error handling, I think it would be more natural to throw
> > an
> > > > > > > > > > exception when error table hint provided, because the
> > properties
> > > > in
> > > > > > hint
> > > > > > > > > > will be merged and used to find the table factory which would
> > > > cause
> > > > > an
> > > > > > > > > > exception when error properties provided, right? On the other
> > > > hand,
> > > > > > unlike
> > > > > > > > > > other hints which just affect the way to execute the query,
> > the
> > > > > > property
> > > > > > > > > > table hint actually affects the result of the query, we should
> > > > never
> > > > > > ignore
> > > > > > > > > > the given property hints.
> > > > > > > > > >
> > > > > > > > > > For the format of property hints, currently, in sql client, we
> > > > > accept
> > > > > > > > > > properties in format of string only in DDL:
> > > > > 'connector.type'='kafka',
> > > > > > I
> > > > > > > > > > think the format of properties in hint should be the same as
> > the
> > > > > > format we
> > > > > > > > > > defined in ddl. What do you think?
> > > > > > > > > >
> > > > > > > > > > Bests,
> > > > > > > > > > Wenlong Lyu
> > > > > > > > > >
> > > > > > > > > > On Tue, 10 Mar 2020 at 14:22, Danny Chan <
> > [hidden email]>
> > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > To Weike: About the Error Handing
> > > > > > > > > > >
> > > > > > > > > > > To be consistent with other SQL vendors, the default is to
> > log
> > > > > > warnings
> > > > > > > > > > > and if there is any error (invalid hint name or options), the
> > > > hint
> > > > > > is just
> > > > > > > > > > > ignored. I have already addressed in the wiki.
> > > > > > > > > > >
> > > > > > > > > > > To Timo: About the PROPERTIES Table Hint
> > > > > > > > > > >
> > > > > > > > > > > • The properties hints is also optional, user can pass in an
> > > > option
> > > > > > to
> > > > > > > > > > > override the table properties but this does not mean it is
> > > > > required.
> > > > > > > > > > > • They should not include semantics: does the properties
> > belong
> > > > to
> > > > > > > > > > > semantic ? I don't think so, the plan does not change right ?
> > > The
> > > > > > result
> > > > > > > > > > > set may be affected, but there are already some hints do so,
> > > for
> > > > > > example,
> > > > > > > > > > > MS-SQL MAXRECURSION and SNAPSHOT hint [1]
> > > > > > > > > > > • `SELECT * FROM t(k=v, k=v)`: this grammar breaks the SQL
> > > > standard
> > > > > > > > > > > compared to the hints way(which is included in comments)
> > > > > > > > > > > • I actually didn't found any vendors to support such
> > grammar,
> > > > and
> > > > > > there
> > > > > > > > > > > is no way to override table level properties dynamically. For
> > > > > normal
> > > > > > RDBMS,
> > > > > > > > > > > I think there are no requests for such dynamic parameters
> > > because
> > > > > > all the
> > > > > > > > > > > table have the same storage and computation and they are
> > almost
> > > > all
> > > > > > batch
> > > > > > > > > > > tables.
> > > > > > > > > > > • While Flink as a computation engine has many connectors,
> > > > > > especially for
> > > > > > > > > > > some message queue like Kafka, we would have a start_offset
> > > which
> > > > > is
> > > > > > > > > > > different each time we start the query, such parameters can
> > not
> > > > be
> > > > > > > > > > > persisted to catalog, because it’s not static, this is
> > actually
> > > > the
> > > > > > > > > > > background we propose the table hints to indicate such
> > > properties
> > > > > > > > > > > dynamically.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To Jark and Jinsong: I have removed the query hints part and
> > > > change
> > > > > > the
> > > > > > > > > > > title.
> > > > > > > > > > >
> > > > > > > > > > > [1]
> > > > > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://docs.microsoft.com/en-us/sql/t-sql/queries/hints-transact-sql-query?view=sql-server-ver15
> > > > > > > > > > >
> > > > > > > > > > > Best,
> > > > > > > > > > > Danny Chan
> > > > > > > > > > > 在 2020年3月9日 +0800 PM5:46,Timo Walther <[hidden email]
> > > ,写道:
> > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > >
> > > > > > > > > > > > thanks for the proposal. I agree with Jark and Jingsong.
> > > Planner
> > > > > > hints
> > > > > > > > > > > > and table hints are orthogonal topics that should be
> > discussed
> > > > > > > > > > > separately.
> > > > > > > > > > > >
> > > > > > > > > > > > I share Jingsong's opinion that we should not use planner
> > > hints
> > > > > for
> > > > > > > > > > > > passing connector properties. Planner hints should be
> > optional
> > > > at
> > > > > > any
> > > > > > > > > > > > time. They should not include semantics but only affect
> > > > execution
> > > > > > time.
> > > > > > > > > > > > Connector properties are an important part of the query
> > > itself.
> > > > > > > > > > > >
> > > > > > > > > > > > Have you thought about options such as `SELECT * FROM t(k=v,
> > > > > k=v)`?
> > > > > > How
> > > > > > > > > > > > are other vendors deal with this problem?
> > > > > > > > > > > >
> > > > > > > > > > > > Regards,
> > > > > > > > > > > > Timo
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On 09.03.20 10:37, Jingsong Li wrote:
> > > > > > > > > > > > > Hi Danny, +1 for table hints, thanks for driving.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I took a look to FLIP, most of content are talking about
> > > query
> > > > > > hints.
> > > > > > > > > > > It is
> > > > > > > > > > > > > hard to discussion and voting. So +1 to split it as Jark
> > > said.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Another thing is configuration that suitable to config with
> > > > table
> > > > > > > > > > > hints:
> > > > > > > > > > > > > "connector.path" and "connector.topic", Are they really
> > > > suitable
> > > > > > for
> > > > > > > > > > > table
> > > > > > > > > > > > > hints? Looks weird to me. Because I think these properties
> > > are
> > > > > the
> > > > > > > > > > > core of
> > > > > > > > > > > > > table.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Jingsong Lee
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 9, 2020 at 5:30 PM Jark Wu <[hidden email]>
> > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks Danny for starting the discussion.
> > > > > > > > > > > > > > +1 for this feature.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If we just focus on the table hints not the query hints in
> > > > this
> > > > > > > > > > > release,
> > > > > > > > > > > > > > could you split the FLIP into two FLIPs?
> > > > > > > > > > > > > > Because it's hard to vote on partial part of a FLIP. You
> > can
> > > > > keep
> > > > > > > > > > > the table
> > > > > > > > > > > > > > hints proposal in FLIP-113 and move query hints into
> > another
> > > > > FLIP.
> > > > > > > > > > > > > > So that we can focuse on the table hints in the FLIP.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Jark
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, 9 Mar 2020 at 17:14, DONG, Weike <
> > > > > [hidden email]
> > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Danny,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This is a nice feature, +1.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > One thing I am interested in but not mentioned in the
> > > > proposal
> > > > > is
> > > > > > > > > > > the
> > > > > > > > > > > > > > error
> > > > > > > > > > > > > > > handling, as it is quite common for users to write
> > > > > inappropriate
> > > > > > > > > > > hints in
> > > > > > > > > > > > > > > SQL code, if illegal or "bad" hints are given, would the
> > > > system
> > > > > > > > > > > simply
> > > > > > > > > > > > > > > ignore them or throw exceptions?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks : )
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > Weike
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Mon, Mar 9, 2020 at 5:02 PM Danny Chan <
> > > > > [hidden email]>
> > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Note:
> > > > > > > > > > > > > > > > we only plan to support table hints in Flink release
> > 1.11,
> > > > so
> > > > > > > > > > > please
> > > > > > > > > > > > > > > focus
> > > > > > > > > > > > > > > > mainly on the table hints part and just ignore the
> > planner
> > > > > > > > > > > hints, sorry
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > that mistake ~
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > > > > > 在 2020年3月9日 +0800 PM4:36,Danny Chan <
> > [hidden email]
> > > > > > ,写道:
> > > > > > > > > > > > > > > > > Hi, fellows ~
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would like to propose the supports for SQL hints for
> > > our
> > > > > > > > > > > Flink SQL.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We would support hints syntax as following:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > select /*+ NO_HASH_JOIN, RESOURCE(mem='128mb',
> > > > > > > > > > > parallelism='24') */
> > > > > > > > > > > > > > > > > from
> > > > > > > > > > > > > > > > > emp /*+ INDEX(idx1, idx2) */
> > > > > > > > > > > > > > > > > join
> > > > > > > > > > > > > > > > > dept /*+ PROPERTIES(k1='v1', k2='v2') */
> > > > > > > > > > > > > > > > > on
> > > > > > > > > > > > > > > > > emp.deptno = dept.deptno
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Basically we would support both query hints(after the
> > > > SELECT
> > > > > > > > > > > keyword)
> > > > > > > > > > > > > > > > and table hints(after the referenced table name), for
> > > 1.11,
> > > > we
> > > > > > > > > > > plan to
> > > > > > > > > > > > > > > only
> > > > > > > > > > > > > > > > support table hints with a hint probably named
> > PROPERTIES:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > table_name /*+ PROPERTIES(k1='v1', k2='v2') *+/
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I am looking forward to your comments.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > You can access the FLIP here:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+SQL+and+Planner+Hints
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Danny Chan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
1234