[DISCUSS]FLIP-163: SQL Client Improvements


[DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Hi devs,

Jark and I want to start a discussion about FLIP-163: SQL Client
Improvements.

Many users have complained about problems with the SQL client. For
example, users cannot register the tables proposed by FLIP-95.

The main changes in this FLIP:

- use the -i parameter to specify a SQL file that initializes the table
environment, deprecating the YAML file;
- add the -f parameter to submit a SQL file, deprecating the '-u' parameter;
- add more interactive commands, e.g. ADD JAR;
- support the statement set syntax.


For more detailed changes, please refer to FLIP-163[1].
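
To illustrate the proposed -i/-f parameters, an invocation might look like
the following (file names and contents are hypothetical examples, not taken
from the FLIP):

```sh
# init.sql replaces the deprecated YAML file: catalogs, DDL, settings
# job.sql contains the statements to execute

./bin/sql-client.sh -i init.sql -f job.sql
```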

Look forward to your feedback.


Best,
Shengkai

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Sebastian Liu
Hi Shengkai,

Glad to see this improvement. I have some additional suggestions:

#1. Unify the TableEnvironment in ExecutionContext to
StreamTableEnvironment for both streaming and batch SQL.
#2. Improve result retrieval: at present the SQL client collects the
results locally all at once using accumulators, which may cause memory
issues in the JM or locally for big query results. Accumulators are only
suitable for testing purposes. We may change to SelectTableSink, which is
based on CollectSinkOperatorCoordinator.
#3. Do we need to consider the Flink SQL gateway proposed in FLIP-91 [1]?
It seems that FLIP has not moved forward for a long time. Providing a
long-running service out of the box to facilitate SQL submission is
necessary.

What do you think of these?

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway




--

With kind regards
------------------------------------------------------------
Sebastian Liu 刘洋
Institute of Computing Technology, Chinese Academy of Sciences
Mobile/WeChat: +86-15201613655
E-mail: [hidden email]
QQ: 3239559

Fwd: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
---------- Forwarded message ---------
From: Shengkai Fang <[hidden email]>
Date: Fri, Jan 29, 2021, 2:46 PM
Subject: Re: [DISCUSS]FLIP-163: SQL Client Improvements
To: Sebastian Liu <[hidden email]>


Hi Sebastian,

Thanks for your suggestions.

For suggestions 1 and 2: Yes, that's what we want. By the way, Godfrey
already has a PR for suggestion 2. I think we can continue based on his
work [1].

For suggestion 3: We currently don't have any plans for the gateway in
Flink 1.13.

Best,
Shengkai

[1] https://github.com/apache/flink/pull/12872


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Rui Li
Thanks Shengkai for bringing up this discussion. I think it covers a lot of
useful features which will dramatically improve the usability of our SQL
Client. I have two questions regarding the FLIP.

1. Do you think we can let users set arbitrary configurations via the SET
command? A connector may have its own configurations, and we don't have a
way to dynamically change such configurations in the SQL Client. For example,
users may want to change the Hive conf when using the Hive connector [1].
2. Any reason why we have to forbid queries in SQL files specified with the
-f option? Hive supports a similar -f option but allows queries in the
file, and a common use case is to run a query and redirect the results
to a file. I think Flink users would like to do the same, especially in
batch scenarios.

[1] https://issues.apache.org/jira/browse/FLINK-20590



--
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Hi Rui,
Thanks for your feedback. I agree with your suggestions.

For suggestion 1: Yes, we plan to strengthen the SET command. In the
implementation, it will just put the key-value pair into the
`Configuration`, which will be used to generate the table config. If Hive
supports reading settings from the table config, users will be able to set
Hive-related settings.
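
As a sketch, the strengthened SET command might accept arbitrary keys like
the following (the keys shown are illustrative examples, not confirmed by
the FLIP):

```sql
-- a Flink table config option
SET table.sql-dialect=hive;
-- a connector-specific option that would be passed through to the Hive conf
SET hive.exec.dynamic.partition=true;
```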

For suggestion 2: The -f parameter will submit the job and exit. If the
queries never end, users have to cancel the jobs themselves, which is not
reliable (people may forget their jobs). In most cases, queries are used to
analyze data, so users should run queries in interactive mode.

Best,
Shengkai


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Rui Li
Hi Shengkai,

Regarding #2, maybe the -f options in Flink and Hive have different
implications, and we should clarify the behavior. For example, if the
client just submits the job and exits, what happens if the file contains
two INSERT statements? I don't think we should treat them as a statement
set, because users should explicitly write BEGIN STATEMENT SET in that
case. And the client shouldn't submit the two jobs asynchronously, because
the 2nd may depend on the 1st, right?
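
For reference, the explicit statement set syntax discussed here groups
multiple INSERTs into a single job (the table names are hypothetical):

```sql
BEGIN STATEMENT SET;
INSERT INTO sink_a SELECT id, amount FROM orders;
INSERT INTO sink_b SELECT id, COUNT(*) FROM orders GROUP BY id;
END;
```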


--
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Hi, Rui.
You are right. I have already modified the FLIP.

The main changes:

# The -f parameter has no restriction on the statement type.
Sometimes users pipe or redirect the result of queries to debug when
submitting a job with the -f parameter. That's much more convenient than
writing INSERT INTO statements.
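
For example, a hypothetical batch invocation that redirects query results
for debugging:

```sh
./bin/sql-client.sh -f query.sql > result.txt
```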

# Add a new SQL client option `sql-client.job.detach`.
Users prefer to execute jobs one by one in batch mode. Users can set
this option to false, and the client will not process the next job until
the current job finishes. The default value of this option is true, which
means the client will move on to the next job as soon as the current one
is submitted.
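
Sketched in a SQL script (the option name is as proposed above; table names
are hypothetical):

```sql
-- wait for each job to finish before submitting the next
SET sql-client.job.detach=false;

INSERT INTO sink_a SELECT id FROM src;  -- client blocks until this job finishes
INSERT INTO sink_b SELECT id FROM src;  -- then submits this one
```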

Best,
Shengkai




Re: [DISCUSS]FLIP-163: SQL Client Improvements

Rui Li
Thanks Shengkai for the update! The proposed changes look good to me.


--
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jingsong Li
Thanks for the proposal. Yes, the SQL client is quite outdated. +1 for
improving it.

About "SET" and "RESET": why not "SET" and "UNSET"?

Best,
Jingsong

On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]> wrote:

> Thanks Shengkai for the update! The proposed changes look good to me.
>
> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <[hidden email]> wrote:
>
> > Hi, Rui.
> > You are right. I have already modified the FLIP.
> >
> > The main changes:
> >
> > # -f parameter has no restriction about the statement type.
> > Sometimes, users use the pipe to redirect the result of queries to debug
> > when submitting job by -f parameter. It's much convenient comparing to
> > writing INSERT INTO statements.
> >
> > # Add a new sql client option `sql-client.job.detach` .
> > Users prefer to execute jobs one by one in the batch mode. Users can set
> > this option false and the client will process the next job until the
> > current job finishes. The default value of this option is false, which
> > means the client will execute the next job when the current job is
> > submitted.
> >
> > Best,
> > Shengkai
> >
> >
> >
> > Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> >
> >> Hi Shengkai,
> >>
> >> Regarding #2, maybe the -f options in flink and hive have different
> >> implications, and we should clarify the behavior. For example, if the
> >> client just submits the job and exits, what happens if the file contains
> >> two INSERT statements? I don't think we should treat them as a statement
> >> set, because users should explicitly write BEGIN STATEMENT SET in that
> >> case. And the client shouldn't asynchronously submit the two jobs,
> because
> >> the 2nd may depend on the 1st, right?
> >>
> >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <[hidden email]>
> wrote:
> >>
> >>> Hi Rui,
> >>> Thanks for your feedback. I agree with your suggestions.
> >>>
> >>> For the suggestion 1: Yes. we are plan to strengthen the set command.
> In
> >>> the implementation, it will just put the key-value into the
> >>> `Configuration`, which will be used to generate the table config. If
> hive
> >>> supports to read the setting from the table config, users are able to
> set
> >>> the hive-related settings.
> >>>
> >>> For the suggestion 2: The -f parameter will submit the job and exit. If
> >>> the queries never end, users have to cancel the job by themselves,
> which is
> >>> not reliable(people may forget their jobs). In most case, queries are
> used
> >>> to analyze the data. Users should use queries in the interactive mode.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> >>>
> >>>> Thanks Shengkai for bringing up this discussion. I think it covers a
> >>>> lot of useful features which will dramatically improve the usability
> >>>> of our SQL Client. I have two questions regarding the FLIP.
> >>>>
> >>>> 1. Do you think we can let users set arbitrary configurations via the
> >>>> SET command? A connector may have its own configurations and we don't
> >>>> have a way to dynamically change such configurations in SQL Client.
> >>>> For example, users may want to be able to change hive conf when using
> >>>> hive connector [1].
> >>>> 2. Any reason why we have to forbid queries in SQL files specified
> >>>> with the -f option? Hive supports a similar -f option but allows
> >>>> queries in the file. And a common use case is to run some query and
> >>>> redirect the results to a file. So I think maybe flink users would
> >>>> like to do the same, especially in batch scenarios.
> >>>>
> >>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best regards!
> >>>> Rui Li
> >>>>
> >>>
> >>
> >> --
> >> Best regards!
> >> Rui Li
> >>
> >
>
> --
> Best regards!
> Rui Li
>


--
Best, Jingsong Lee

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Hi, Jingsong.

Thanks for your reply. I think `UNSET` is much better.

1. We don't need to introduce another command `UNSET`. `RESET` is already
supported in the current sql client. Our proposal just extends its grammar
and allows users to reset specified keys.
2. Hive Beeline also uses `RESET` to restore a key to its default value[1].
I think it is friendlier for batch users.

Best,
Shengkai

[1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
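
To make the proposed grammar extension concrete, a small sketch (the option
key here is only an example):

```sql
-- set a session option
SET 'table.planner' = 'blink';

-- the proposed extension: reset only the specified key to its default
RESET 'table.planner';

-- reset all session options (the behavior that exists today)
RESET;
```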

Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:

> Thanks for the proposal, yes, sql-client is too outdated. +1 for improving
> it.
>
> About "SET" and "RESET": why not "SET" and "UNSET"?
>
> Best,
> Jingsong
>
>
> --
> Best, Jingsong Lee
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Sorry for the typo. I meant `RESET` is much better than `UNSET`.

Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:

> Hi, Jingsong.
>
> Thanks for your reply. I think `UNSET` is much better.
>
> 1. We don't need to introduce another command `UNSET`. `RESET` is
> supported in the current sql client now. Our proposal just extends its
> grammar and allow users to reset the specified keys.
> 2. Hive beeline also uses `RESET` to set the key to the default value[1].
> I think it is more friendly for batch users.
>
> Best,
> Shengkai
>
> [1] https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>
> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
>
>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>> improving it.
>>
>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>
>> Best,
>> Jingsong
>>
>>
>> --
>> Best, Jingsong Lee
>>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Timo Walther-2
Thanks for this great proposal Shengkai. This will give the SQL Client a
very good update and make it production ready.

Here is some feedback from my side:

1) SQL client specific options

I don't think that `sql-client.planner` and `sql-client.execution.mode`
are SQL Client specific. Similar to `StreamExecutionEnvironment` and
`ExecutionConfig#configure` that have been added recently, we should
offer a possibility for TableEnvironment. How about we offer
`TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
`table.execution-mode` to
`org.apache.flink.table.api.config.TableConfigOptions`?
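
As a pseudocode-style sketch, the suggested direction could look roughly
like this (note: `TableEnvironment.create(ReadableConfig)` and the
`table.planner` / `table.execution-mode` keys are proposals in this thread,
not existing API):

```java
// Hypothetical usage of the proposed options -- not existing API.
Configuration config = new Configuration();
config.setString("table.planner", "blink");
config.setString("table.execution-mode", "batch");

// Proposed factory method that takes a ReadableConfig.
TableEnvironment tEnv = TableEnvironment.create(config);
```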

2) Execution file

Did you have a look at the Appendix of FLIP-84 [1] including the mailing
list thread at that time? Could you further elaborate how the
multi-statement execution should work for a unified batch/streaming
story? According to our past discussions, each line in an execution file
should be executed blocking, which means a streaming query needs a
statement set to execute multiple INSERT INTO statements, correct? We
should also offer this functionality in
`TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
SQL Client specific needs to be determined, it could also be a general
`table.multi-sql-async` option?
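
For illustration, a streaming execution file under that blocking semantics
might look like the following (the table names are made up; the statement
set bundles both INSERTs into one job):

```sql
-- each statement below would execute blocking, top to bottom
CREATE TABLE pageviews (...) WITH (...);

BEGIN STATEMENT SET;
INSERT INTO sink_a SELECT ... FROM pageviews;
INSERT INTO sink_b SELECT ... FROM pageviews;
END;
-- the statement set is submitted as a single streaming job
```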

3) DELETE JAR

Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
actively deleting the JAR in the corresponding path.

4) LIST JAR

This should be `SHOW JARS` according to other SQL commands such as `SHOW
CATALOGS`, `SHOW TABLES`, etc. [2].
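
Under that naming, the jar commands would read, for example (a sketch of
the suggested syntax, not the final decision of this thread):

```sql
ADD JAR '/path/to/udf.jar';
SHOW JARS;
REMOVE JAR '/path/to/udf.jar';
```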

5) EXPLAIN [ExplainDetail[, ExplainDetail]*]

We should keep the details in sync with
`org.apache.flink.table.api.ExplainDetail` and avoid confusion about
differently named ExplainDetails. I would vote for `ESTIMATED_COST`
instead of `COST`. I'm sure the original author had a reason to call it
that way.

6) Implementation details

It would be nice to understand how we plan to implement the given
features. Most of the commands and config options should go into
TableEnvironment and SqlParser directly, correct? This way users have a
unified way of using Flink SQL. TableEnvironment would provide a similar
user experience in notebooks or interactive programs than the SQL Client.

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html

Regards,
Timo


On 02.02.21 10:13, Shengkai Fang wrote:

> Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Shengkai Fang
Hi, Timo.
Thanks for your detailed feedback. Here are my thoughts on each point.

*Regarding #1*: I think the main problem is whether the table environment
has the ability to update itself. Let's take a simple program as an example.


```
TableEnvironment tEnv = TableEnvironment.create(...);

// Change the planner after the environment has already been created.
tEnv.getConfig().getConfiguration().setString("table.planner", "old");

tEnv.executeSql("...");
```

If we regard this option as a table option, users don't have to create
another table environment manually. In that case, tEnv needs to check
whether the current mode and planner are the same as before whenever
executeSql or explainSql is called. I don't think that is easy work for the
table environment, especially if users have a StreamExecutionEnvironment
but set the old planner and batch mode. But if we make this option a SQL
Client option, users only need the SET command to change the setting, and
we can rebuild a new table environment when the SET succeeds.


*Regarding #2*: I think we need to discuss the implementation before
continuing this topic. In the SQL Client, we will maintain two parsers. The
first parser (the client parser) will only match the SQL Client commands.
If the client parser can't parse the statement, we will delegate the
statement to the table environment for execution. According to our
blueprint, TableEnvironment#executeSql is enough for the SQL Client.
Therefore, TableEnvironment#executeMultiSql is out of scope for this FLIP.
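
The two-parser fallback described above can be sketched in plain Java
(class and command names here are illustrative, not the actual
implementation):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Sketch of the "client parser first, table environment second" dispatch.
class ClientParser {
    private static final List<String> CLIENT_COMMANDS =
            Arrays.asList("QUIT", "CLEAR", "HELP");

    // Returns the recognized client command, or empty if the statement
    // must be handled by the TableEnvironment.
    Optional<String> parse(String statement) {
        String normalized = statement.trim().toUpperCase();
        return CLIENT_COMMANDS.contains(normalized)
                ? Optional.of(normalized)
                : Optional.empty();
    }
}

class SqlClientDispatcher {
    private final ClientParser clientParser = new ClientParser();

    String execute(String statement) {
        // Try the client parser first; otherwise delegate to the table
        // environment (stubbed here as a string marker instead of
        // calling tEnv.executeSql(statement)).
        return clientParser.parse(statement)
                .map(cmd -> "client:" + cmd)
                .orElse("tableEnv:" + statement.trim());
    }
}
```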

But if we need to introduce `TableEnvironment.executeMultiSql` in the
future, I think it's OK to use the option `table.multi-sql-async` rather
than `sql-client.job.detach`. However, we think that name is not suitable
because it is confusing: when the option is set to false, we just mean it
will block the execution of INSERT INTO statements, not DDL or other
statements (other SQL statements are always executed synchronously). So how
about `table.job.async`? It only works for the SQL Client and
executeMultiSql. If we set this value to false, the table environment will
not return the result until the job finishes.
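
For example, with the renamed option the behavior would be (a sketch; both
the option name and its final syntax are still under discussion):

```sql
SET 'table.job.async' = 'false';

-- the client now waits here until the first job finishes
INSERT INTO sink_a SELECT ... FROM source_table;

-- only submitted after the job above has finished
INSERT INTO sink_b SELECT ... FROM source_table;
```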


*Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
because Hive also uses these commands to add jars to the classpath or
delete them. If we use the same commands, it can reduce our work on Hive
compatibility.

For SHOW JAR, I think the main concern is that the jars are not maintained
by the Catalog. If we really need to stay consistent with the SQL grammar,
maybe we should use

`ADD JAR` -> `CREATE JAR`,
`DELETE JAR` -> `DROP JAR`,
`LIST JAR` -> `SHOW JAR`.

*Regarding #5*: I agree with you that we'd better keep consistent.

*Regarding #6*: Yes. Most of the commands should belong to the table
environment. In the Summary section, I use the <NOTE> tag to identify which
commands belong to the SQL Client and which belong to the table
environment. I have also added a new section about implementation details
to the FLIP.

Best,
Shengkai

Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:

> Thanks for this great proposal Shengkai. This will give the SQL Client a
> very good update and make it production ready.
>
> Here is some feedback from my side:
>
> 1) SQL client specific options
>
> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> `ExecutionConfig#configure` that have been added recently, we should
> offer a possibility for TableEnvironment. How about we offer
> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> `table.execution-mode` to
> `org.apache.flink.table.api.config.TableConfigOptions`?
>
> 2) Execution file
>
> Did you have a look at the Appendix of FLIP-84 [1] including the mailing
> list thread at that time? Could you further elaborate how the
> multi-statement execution should work for a unified batch/streaming
> story? According to our past discussions, each line in an execution file
> should be executed blocking which means a streaming query needs a
> statement set to execute multiple INSERT INTO statement, correct? We
> should also offer this functionality in
> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> SQL Client specific needs to be determined, it could also be a general
> `table.multi-sql-async` option?
>
> 3) DELETE JAR
>
> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> actively deleting the JAR in the corresponding path.
>
> 4) LIST JAR
>
> This should be `SHOW JARS` according to other SQL commands such as `SHOW
> CATALOGS`, `SHOW TABLES`, etc. [2].
>
> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>
> We should keep the details in sync with
> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> instead of `COST`. I'm sure the original author had a reason why to call
> it that way.
>
> 6) Implementation details
>
> It would be nice to understand how we plan to implement the given
> features. Most of the commands and config options should go into
> TableEnvironment and SqlParser directly, correct? This way users have a
> unified way of using Flink SQL. TableEnvironment would provide a similar
> user experience in notebooks or interactive programs as in the SQL Client.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> [2]
>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>
> Regards,
> Timo
>
>
> On 02.02.21 10:13, Shengkai Fang wrote:
> > Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> >
> > Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> >
> >> Hi, Jingsong.
> >>
> >> Thanks for your reply. I think `UNSET` is much better.
> >>
> >> 1. We don't need to introduce another command `UNSET`. `RESET` is
> >> supported in the current sql client now. Our proposal just extends its
> >> grammar and allow users to reset the specified keys.
> >> 2. Hive beeline also uses `RESET` to set the key to the default
> value[1].
> >> I think it is more friendly for batch users.
> >>
> >> Best,
> >> Shengkai
> >>
> >> [1]
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>
> >> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> >>
> >>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> >>> improving it.
> >>>
> >>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>
> >>> Best,
> >>> Jingsong
> >>>
> >>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]> wrote:
> >>>
> >>>> Thanks Shengkai for the update! The proposed changes look good to me.
> >>>>
> >>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <[hidden email]>
> wrote:
> >>>>
> >>>>> Hi, Rui.
> >>>>> You are right. I have already modified the FLIP.
> >>>>>
> >>>>> The main changes:
> >>>>>
> >>>>> # -f parameter has no restriction about the statement type.
> >>>>> Sometimes, users use the pipe to redirect the result of queries to
> >>>> debug
> >>>>> when submitting job by -f parameter. It's much convenient comparing
> to
> >>>>> writing INSERT INTO statements.
> >>>>>
> >>>>> # Add a new sql client option `sql-client.job.detach` .
> >>>>> Users prefer to execute jobs one by one in the batch mode. Users can
> >>>> set
> >>>>> this option false and the client will process the next job until the
> >>>>> current job finishes. The default value of this option is false,
> which
> >>>>> means the client will execute the next job when the current job is
> >>>>> submitted.
> >>>>>
> >>>>> Best,
> >>>>> Shengkai
> >>>>>
> >>>>>
> >>>>>
> >>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> >>>>>
> >>>>>> Hi Shengkai,
> >>>>>>
> >>>>>> Regarding #2, maybe the -f options in flink and hive have different
> >>>>>> implications, and we should clarify the behavior. For example, if
> the
> >>>>>> client just submits the job and exits, what happens if the file
> >>>> contains
> >>>>>> two INSERT statements? I don't think we should treat them as a
> >>>> statement
> >>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
> that
> >>>>>> case. And the client shouldn't asynchronously submit the two jobs,
> >>>> because
> >>>>>> the 2nd may depend on the 1st, right?
> >>>>>>
> >>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <[hidden email]>
> >>>> wrote:
> >>>>>>
> >>>>>>> Hi Rui,
> >>>>>>> Thanks for your feedback. I agree with your suggestions.
> >>>>>>>
> >>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set
> >>>> command. In
> >>>>>>> the implementation, it will just put the key-value into the
> >>>>>>> `Configuration`, which will be used to generate the table config.
> If
> >>>> hive
> >>>>>>> supports to read the setting from the table config, users are able
> >>>> to set
> >>>>>>> the hive-related settings.
> >>>>>>>
> >>>>>>> For the suggestion 2: The -f parameter will submit the job and
> exit.
> >>>> If
> >>>>>>> the queries never end, users have to cancel the job by themselves,
> >>>> which is
> >>>>>>> not reliable(people may forget their jobs). In most case, queries
> >>>> are used
> >>>>>>> to analyze the data. Users should use queries in the interactive
> >>>> mode.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Shengkai
> >>>>>>>
> >>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> >>>>>>>
> >>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> covers a
> >>>>>>>> lot of useful features which will dramatically improve the
> >>>> usability of our
> >>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>
> >>>>>>>> 1. Do you think we can let users set arbitrary configurations via
> >>>> the
> >>>>>>>> SET command? A connector may have its own configurations and we
> >>>> don't have
> >>>>>>>> a way to dynamically change such configurations in SQL Client. For
> >>>> example,
> >>>>>>>> users may want to be able to change hive conf when using hive
> >>>> connector [1].
> >>>>>>>> 2. Any reason why we have to forbid queries in SQL files specified
> >>>> with
> >>>>>>>> the -f option? Hive supports a similar -f option but allows
> queries
> >>>> in the
> >>>>>>>> file. And a common use case is to run some query and redirect the
> >>>> results
> >>>>>>>> to a file. So I think maybe flink users would like to do the same,
> >>>>>>>> especially in batch scenarios.
> >>>>>>>>
> >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>
> >>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>> [hidden email]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi Shengkai,
> >>>>>>>>>
> >>>>>>>>> Glad to see this improvement. And I have some additional
> >>>> suggestions:
> >>>>>>>>>
> >>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> >>>>>>>>> #2. Improve the way of results retrieval: sql client collect the
> >>>>>>>>> results
> >>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>        which may have memory issues in JM or Local for the big
> query
> >>>>>>>>> result.
> >>>>>>>>> Accumulator is only suitable for testing purpose.
> >>>>>>>>>        We may change to use SelectTableSink, which is based
> >>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in FLIP-91.
> >>>> Seems
> >>>>>>>>> that this FLIP has not moved forward for a long time.
> >>>>>>>>>        Provide a long running service out of the box to
> facilitate
> >>>> the
> >>>>>>>>> sql
> >>>>>>>>> submission is necessary.
> >>>>>>>>>
> >>>>>>>>> What do you think of these?
> >>>>>>>>>
> >>>>>>>>> [1]
> >>>>>>>>>
> >>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Shengkai Fang <[hidden email]> 于2021年1月28日周四 下午8:54写道:
> >>>>>>>>>
> >>>>>>>>>> Hi devs,
> >>>>>>>>>>
> >>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL Client
> >>>>>>>>>> Improvements.
> >>>>>>>>>>
> >>>>>>>>>> Many users have complained about the problems of the sql client.
> >>>> For
> >>>>>>>>>> example, users can not register the table proposed by FLIP-95.
> >>>>>>>>>>
> >>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>
> >>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> >>>> table
> >>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> >>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> >>>>>>>>>>
> >>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Shengkai
> >>>>>>>>>>
> >>>>>>>>>> [1]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>>
> >>>>>>>>> *With kind regards
> >>>>>>>>> ------------------------------------------------------------
> >>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> >>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>> E-mail: [hidden email] <[hidden email]>
> >>>>>>>>> QQ: 3239559*
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best regards!
> >>>>>>>> Rui Li
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best regards!
> >>>>>> Rui Li
> >>>>>>
> >>>>>
> >>>>
> >>>> --
> >>>> Best regards!
> >>>> Rui Li
> >>>>
> >>>
> >>>
> >>> --
> >>> Best, Jingsong Lee
> >>>
> >>
> >
>
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jark Wu-2
Hi Timo,

I will respond to some of the questions:

1) SQL client specific options

Whether it starts with "table" or "sql-client" depends on where the
configuration takes effect.
If it is a table configuration, we should make clear what the behavior is
when users change
the configuration during the lifecycle of the TableEnvironment.

I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode`
are something special
that can't be changed after TableEnvironment has been initialized. You can
see
`StreamExecutionEnvironment` provides `configure()`  method to override
configuration after
StreamExecutionEnvironment has been initialized.

Therefore, I think it would be better to still use  `sql-client.planner`
and `sql-client.execution.mode`.

2) Execution file

From my point of view, there is a big difference between
`sql-client.job.detach` and
`TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
affect every single DML statement
in the terminal, not only the statements in SQL files. I think the single
DML statement in the interactive
terminal is something like tEnv#executeSql() instead of
tEnv#executeMultiSql.
So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
I just found that the runtime provides a configuration called
"execution.attached" [1], which is false by default and
specifies whether the pipeline is submitted in attached or detached mode.
It provides exactly the same
functionality as `sql-client.job.detach`. What do you think about using
this option?

If we also want to support this config in TableEnvironment, I think it
should also affect the DML execution
 of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
Therefore, the behavior may look like this:

val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
tableResult.await()   ==> manually block until finish
tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need
to wait on the TableResult
tEnv.executeMultiSql(
"""
CREATE TABLE ....  ==> always sync
INSERT INTO ...  => sync, because we set configuration above
SET execution.attached = false;
INSERT INTO ...  => async
""")
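The attached/detached semantics sketched above could be modeled in a small, Flink-free toy (the option key mirrors `execution.attached`; the class and method names, and the use of a future to stand in for a running job, are illustrative assumptions):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Toy model of how an "execution.attached"-style flag changes DML
// submission semantics: detached returns as soon as the job is submitted,
// attached blocks until the job future completes.
class ToyExecutor {
    static String execute(Map<String, String> config, CompletableFuture<String> job) {
        boolean attached =
                Boolean.parseBoolean(config.getOrDefault("execution.attached", "false"));
        if (attached) {
            return "finished:" + job.join(); // block until the job finishes
        }
        return "submitted"; // detached: caller may still await the result later
    }
}
```

The point of the model: the flag only changes *when* control returns to the caller, not what is executed, which is why reusing the existing runtime option is attractive.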

On the other hand, I think `sql-client.job.detach`
and `TableEnvironment.executeMultiSql()` should be two separate topics,
as Shengkai mentioned above, SQL CLI only depends on
`TableEnvironment#executeSql()` to support multi-line statements.
I'm fine with making `executeMultiSql()` clear but don't want it to block
this FLIP, maybe we can discuss this in another thread.


Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached

On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]> wrote:

> Hi, Timo.
> Thanks for your detailed feedback. I have some thoughts about your
> feedback.
>
> *Regarding #1*: I think the main problem is whether the table environment
> has the ability to update itself. Let's take a simple program as an
> example.
>
>
> ```
> TableEnvironment tEnv = TableEnvironment.create(...);
>
> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>
>
> tEnv.executeSql("...");
>
> ```
>
> If we regard this option as a table option, users don't have to create
> another table environment manually. In that case, tEnv needs to check
> whether the current mode and planner are the same as before when executeSql
> or explainSql. I don't think it's easy work for the table environment,
> especially if users have a StreamExecutionEnvironment but set old planner
> and batch mode. But when we make this option as a sql client option, users
> only use the SET command to change the setting. We can rebuild a new table
> environment when the SET succeeds.
>
>
> *Regarding #2*: I think we need to discuss the implementation before
> continuing this topic. In the sql client, we will maintain two parsers. The
> first parser(client parser) will only match the sql client commands. If the
> client parser can't parse the statement, we will leverage the power of the
> table environment to execute. According to our blueprint,
> TableEnvironment#executeSql is enough for the sql client. Therefore,
> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
>
> But if we need to introduce the `TableEnvironment.executeMultiSql` in the
> future, I think it's OK to use the option `table.multi-sql-async` rather
> than option `sql-client.job.detach`. But we think the name is not suitable
> because the name is confusing for others. When setting the option false, we
> just mean it will block the execution of the INSERT INTO statement, not DDL
> or others(other sql statements are always executed synchronously). So how
> about `table.job.async`? It only works for the sql-client and the
> executeMultiSql. If we set this value false, the table environment will
> return the result until the job finishes.
>
>
> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> because HIVE also uses these commands to add the jar into the classpath or
> delete the jar. If we use  such commands, it can reduce our work for hive
> compatibility.
>
> For SHOW JAR, I think the main concern is the jars are not maintained by
> the Catalog. If we really needs to keep consistent with SQL grammar, maybe
> we should use
>
> `ADD JAR` -> `CREATE JAR`,
> `DELETE JAR` -> `DROP JAR`,
> `LIST JAR` -> `SHOW JAR`.
>
> *Regarding #5*: I agree with you that we'd better keep consistent.
>
> *Regarding #6*: Yes. Most of the commands should belong to the table
> environment. In the Summary section, I use the <NOTE> tag to identify which
> commands should belong to the sql client and which commands should belong
> to the table environment. I also add a new section about implementation
> details in the FLIP.
>
> Best,
> Shengkai
>
> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
>
> > Thanks for this great proposal Shengkai. This will give the SQL Client a
> > very good update and make it production ready.
> >
> > Here is some feedback from my side:
> >
> > 1) SQL client specific options
> >
> > I don't think that `sql-client.planner` and `sql-client.execution.mode`
> > are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > `ExecutionConfig#configure` that have been added recently, we should
> > offer a possibility for TableEnvironment. How about we offer
> > `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> > `table.execution-mode` to
> > `org.apache.flink.table.api.config.TableConfigOptions`?
> >
> > 2) Execution file
> >
> > Did you have a look at the Appendix of FLIP-84 [1] including the mailing
> > list thread at that time? Could you further elaborate how the
> > multi-statement execution should work for a unified batch/streaming
> > story? According to our past discussions, each line in an execution file
> > should be executed blocking which means a streaming query needs a
> > statement set to execute multiple INSERT INTO statement, correct? We
> > should also offer this functionality in
> > `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
> > SQL Client specific needs to be determined, it could also be a general
> > `table.multi-sql-async` option?
> >
> > 3) DELETE JAR
> >
> > Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
> > actively deleting the JAR in the corresponding path.
> >
> > 4) LIST JAR
> >
> > This should be `SHOW JARS` according to other SQL commands such as `SHOW
> > CATALOGS`, `SHOW TABLES`, etc. [2].
> >
> > 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >
> > We should keep the details in sync with
> > `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > instead of `COST`. I'm sure the original author had a reason why to call
> > it that way.
> >
> > 6) Implementation details
> >
> > It would be nice to understand how we plan to implement the given
> > features. Most of the commands and config options should go into
> > TableEnvironment and SqlParser directly, correct? This way users have a
> > unified way of using Flink SQL. TableEnvironment would provide a similar
> > user experience in notebooks or interactive programs as in the SQL Client.
> >
> > [1]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > [2]
> >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >
> > Regards,
> > Timo
> >
> >
> > On 02.02.21 10:13, Shengkai Fang wrote:
> > > Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> > >
> > > Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> > >
> > >> Hi, Jingsong.
> > >>
> > >> Thanks for your reply. I think `UNSET` is much better.
> > >>
> > >> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > >> supported in the current sql client now. Our proposal just extends its
> > >> grammar and allow users to reset the specified keys.
> > >> 2. Hive beeline also uses `RESET` to set the key to the default
> > value[1].
> > >> I think it is more friendly for batch users.
> > >>
> > >> Best,
> > >> Shengkai
> > >>
> > >> [1]
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > >>
> > >> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> > >>
> > >>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > >>> improving it.
> > >>>
> > >>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > >>>
> > >>> Best,
> > >>> Jingsong
> > >>>
> > >>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]> wrote:
> > >>>
> > >>>> Thanks Shengkai for the update! The proposed changes look good to
> me.
> > >>>>
> > >>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <[hidden email]>
> > wrote:
> > >>>>
> > >>>>> Hi, Rui.
> > >>>>> You are right. I have already modified the FLIP.
> > >>>>>
> > >>>>> The main changes:
> > >>>>>
> > >>>>> # -f parameter has no restriction about the statement type.
> > >>>>> Sometimes, users use the pipe to redirect the result of queries to
> > >>>> debug
> > >>>>> when submitting job by -f parameter. It's much convenient comparing
> > to
> > >>>>> writing INSERT INTO statements.
> > >>>>>
> > >>>>> # Add a new sql client option `sql-client.job.detach` .
> > >>>>> Users prefer to execute jobs one by one in the batch mode. Users
> can
> > >>>> set
> > >>>>> this option false and the client will process the next job until
> the
> > >>>>> current job finishes. The default value of this option is false,
> > which
> > >>>>> means the client will execute the next job when the current job is
> > >>>>> submitted.
> > >>>>>
> > >>>>> Best,
> > >>>>> Shengkai
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> > >>>>>
> > >>>>>> Hi Shengkai,
> > >>>>>>
> > >>>>>> Regarding #2, maybe the -f options in flink and hive have
> different
> > >>>>>> implications, and we should clarify the behavior. For example, if
> > the
> > >>>>>> client just submits the job and exits, what happens if the file
> > >>>> contains
> > >>>>>> two INSERT statements? I don't think we should treat them as a
> > >>>> statement
> > >>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
> > that
> > >>>>>> case. And the client shouldn't asynchronously submit the two jobs,
> > >>>> because
> > >>>>>> the 2nd may depend on the 1st, right?
> > >>>>>>
> > >>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <[hidden email]>
> > >>>> wrote:
> > >>>>>>
> > >>>>>>> Hi Rui,
> > >>>>>>> Thanks for your feedback. I agree with your suggestions.
> > >>>>>>>
> > >>>>>>> For the suggestion 1: Yes. we are plan to strengthen the set
> > >>>> command. In
> > >>>>>>> the implementation, it will just put the key-value into the
> > >>>>>>> `Configuration`, which will be used to generate the table config.
> > If
> > >>>> hive
> > >>>>>>> supports to read the setting from the table config, users are
> able
> > >>>> to set
> > >>>>>>> the hive-related settings.
> > >>>>>>>
> > >>>>>>> For the suggestion 2: The -f parameter will submit the job and
> > exit.
> > >>>> If
> > >>>>>>> the queries never end, users have to cancel the job by
> themselves,
> > >>>> which is
> > >>>>>>> not reliable(people may forget their jobs). In most case, queries
> > >>>> are used
> > >>>>>>> to analyze the data. Users should use queries in the interactive
> > >>>> mode.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Shengkai
> > >>>>>>>
> > >>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> > >>>>>>>
> > >>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > covers a
> > >>>>>>>> lot of useful features which will dramatically improve the
> > >>>> usability of our
> > >>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > >>>>>>>>
> > >>>>>>>> 1. Do you think we can let users set arbitrary configurations
> via
> > >>>> the
> > >>>>>>>> SET command? A connector may have its own configurations and we
> > >>>> don't have
> > >>>>>>>> a way to dynamically change such configurations in SQL Client.
> For
> > >>>> example,
> > >>>>>>>> users may want to be able to change hive conf when using hive
> > >>>> connector [1].
> > >>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> specified
> > >>>> with
> > >>>>>>>> the -f option? Hive supports a similar -f option but allows
> > queries
> > >>>> in the
> > >>>>>>>> file. And a common use case is to run some query and redirect
> the
> > >>>> results
> > >>>>>>>> to a file. So I think maybe flink users would like to do the
> same,
> > >>>>>>>> especially in batch scenarios.
> > >>>>>>>>
> > >>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > >>>>>>>>
> > >>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > >>>> [hidden email]>
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> Hi Shengkai,
> > >>>>>>>>>
> > >>>>>>>>> Glad to see this improvement. And I have some additional
> > >>>> suggestions:
> > >>>>>>>>>
> > >>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > >>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > >>>>>>>>> #2. Improve the way of results retrieval: sql client collect
> the
> > >>>>>>>>> results
> > >>>>>>>>> locally all at once using accumulators at present,
> > >>>>>>>>>        which may have memory issues in JM or Local for the big
> > query
> > >>>>>>>>> result.
> > >>>>>>>>> Accumulator is only suitable for testing purpose.
> > >>>>>>>>>        We may change to use SelectTableSink, which is based
> > >>>>>>>>> on CollectSinkOperatorCoordinator.
> > >>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> FLIP-91.
> > >>>> Seems
> > >>>>>>>>> that this FLIP has not moved forward for a long time.
> > >>>>>>>>>        Provide a long running service out of the box to
> > facilitate
> > >>>> the
> > >>>>>>>>> sql
> > >>>>>>>>> submission is necessary.
> > >>>>>>>>>
> > >>>>>>>>> What do you think of these?
> > >>>>>>>>>
> > >>>>>>>>> [1]
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Shengkai Fang <[hidden email]> 于2021年1月28日周四 下午8:54写道:
> > >>>>>>>>>
> > >>>>>>>>>> Hi devs,
> > >>>>>>>>>>
> > >>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> Client
> > >>>>>>>>>> Improvements.
> > >>>>>>>>>>
> > >>>>>>>>>> Many users have complained about the problems of the sql
> client.
> > >>>> For
> > >>>>>>>>>> example, users can not register the table proposed by FLIP-95.
> > >>>>>>>>>>
> > >>>>>>>>>> The main changes in this FLIP:
> > >>>>>>>>>>
> > >>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> > >>>> table
> > >>>>>>>>>> environment and deprecated YAML file;
> > >>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> > >>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> > >>>>>>>>>> - support statement set syntax;
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > >>>>>>>>>>
> > >>>>>>>>>> Look forward to your feedback.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Best,
> > >>>>>>>>>> Shengkai
> > >>>>>>>>>>
> > >>>>>>>>>> [1]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> --
> > >>>>>>>>>
> > >>>>>>>>> *With kind regards
> > >>>>>>>>> ------------------------------------------------------------
> > >>>>>>>>> Sebastian Liu 刘洋
> > >>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> > >>>>>>>>> Mobile\WeChat: +86—15201613655
> > >>>>>>>>> E-mail: [hidden email] <[hidden email]>
> > >>>>>>>>> QQ: 3239559*
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> --
> > >>>>>>>> Best regards!
> > >>>>>>>> Rui Li
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Best regards!
> > >>>>>> Rui Li
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>> --
> > >>>> Best regards!
> > >>>> Rui Li
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> Best, Jingsong Lee
> > >>>
> > >>
> > >
> >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Timo Walther-2
Hi everyone,

some feedback regarding the open questions. Maybe we can discuss the
`TableEnvironment.executeMultiSql` story offline to determine how we
proceed with this in the near future.

1) "whether the table environment has the ability to update itself"

Maybe there was some misunderstanding. I don't think that we should
support `tEnv.getConfig.getConfiguration.setString("table.planner",
"old")`. Instead I'm proposing to support
`TableEnvironment.create(Configuration)` where planner and execution
mode are read immediately and subsequent changes to these options will
have no effect. We do it similarly in `new
StreamExecutionEnvironment(Configuration)`. These two ConfigOptions
must not be SQL Client specific but can be part of the core table code
base. Many users would like to get a 100% preconfigured environment from
just Configuration. And this is not possible right now. We can solve
both use cases in one change.
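As a toy illustration of that idea (the option keys, defaults, and class name are placeholders, not the proposed API): planner and execution mode are snapshotted when the environment is created, so later mutations of the configuration have no effect on them.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch: a preconfigured environment that reads planner and execution
// mode once at creation time. Changing the original configuration map
// afterwards does not retroactively change the environment.
class ToyTableEnvironment {
    final String planner;
    final String executionMode;

    private ToyTableEnvironment(Map<String, String> config) {
        this.planner = config.getOrDefault("table.planner", "blink");
        this.executionMode = config.getOrDefault("table.execution-mode", "streaming");
    }

    static ToyTableEnvironment create(Map<String, String> config) {
        // Defensive copy: the snapshot is taken here, once.
        return new ToyTableEnvironment(new HashMap<>(config));
    }
}
```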

2) "the sql client, we will maintain two parsers"

I remember we had some discussion about this and decided that we would
like to maintain only one parser. In the end it is "One Flink SQL" where
commands influence each other also with respect to keywords. It should
be fine to include the SQL Client commands in the Flink parser. Of
course, the table environment would not be able to handle the resulting
`Operation` instance, but we can introduce hooks to handle those
`Operation`s. Or we introduce parser extensions.
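A minimal sketch of such a hook mechanism (purely illustrative; `Operation`s are modeled as plain strings and the names are assumptions):

```java
import java.util.function.Function;

// Toy sketch of the "Operation hook" idea: a single parser produces an
// operation; a client-registered hook gets the first chance to handle it,
// and unhandled operations fall through to the default executor.
class OperationRouter {
    private final Function<String, String> hook; // returns null if not handled

    OperationRouter(Function<String, String> hook) {
        this.hook = hook;
    }

    String execute(String operation) {
        String handled = hook.apply(operation);
        return handled != null ? handled : "default:" + operation;
    }
}
```

This keeps "One Flink SQL": one grammar, one parser, with client-specific behavior bolted on at execution time rather than at parse time.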

Can we skip `table.job.async` in the first version? We should further
discuss whether we introduce a special SQL clause for wrapping async
behavior or use a config option. Especially for streaming queries, we
need to be careful and should force users to either "one INSERT INTO" or
"one STATEMENT SET".

3) 4) "HIVE also uses these commands"

In general, Hive is not a good reference. Aligning the commands more
with the remaining commands should be our goal. We just had a MODULE
discussion where we selected SHOW instead of LIST. But it is true that
JARs are not part of the catalog which is why I would not use
CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
Take a look at the Java collection API as another example.

6) "Most of the commands should belong to the table environment"

Thanks for updating the FLIP this makes things easier to understand. It
is good to see that most commands will be available in TableEnvironment.
However, I would also support SET and RESET for consistency. Again, from
an architectural point of view, if we would allow some kind of
`Operation` hook in table environment, we could check for SQL Client
specific options and forward to regular `TableConfig.getConfiguration`
otherwise. What do you think?
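That forwarding idea could be sketched like this (a toy model; the prefix check and the two option stores are assumptions, not an agreed design):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of SET/RESET forwarding: a SET on a "sql-client.*" key is kept
// by the client session, anything else is forwarded to the table
// configuration; RESET restores the default by removing the key.
class SessionConfig {
    final Map<String, String> clientOptions = new HashMap<>();
    final Map<String, String> tableOptions = new HashMap<>();

    void set(String key, String value) {
        (key.startsWith("sql-client.") ? clientOptions : tableOptions).put(key, value);
    }

    void reset(String key) {
        clientOptions.remove(key);
        tableOptions.remove(key);
    }
}
```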

Regards,
Timo


On 03.02.21 08:58, Jark Wu wrote:

> Hi Timo,
>
> I will respond to some of the questions:
>
> 1) SQL client specific options
>
> Whether it starts with "table" or "sql-client" depends on where the
> configuration takes effect.
> If it is a table configuration, we should make clear what's the behavior
> when users change
> the configuration in the lifecycle of TableEnvironment.
>
> I agree with Shengkai `sql-client.planner` and `sql-client.execution.mode`
> are something special
> that can't be changed after TableEnvironment has been initialized. You can
> see
> `StreamExecutionEnvironment` provides `configure()`  method to override
> configuration after
> StreamExecutionEnvironment has been initialized.
>
> Therefore, I think it would be better to still use  `sql-client.planner`
> and `sql-client.execution.mode`.
>
> 2) Execution file
>
> > From my point of view, there is a big difference between
> `sql-client.job.detach` and
> `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> affect every single DML statement
> in the terminal, not only the statements in SQL files. I think the single
> DML statement in the interactive
> terminal is something like tEnv#executeSql() instead of
> tEnv#executeMultiSql.
> So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
> I just find that runtime provides a configuration called
> "execution.attached" [1] which is false by default
> which specifies if the pipeline is submitted in attached or detached mode.
> It provides exactly the same
> functionality of `sql-client.job.detach`. What do you think about using
> this option?
>
> If we also want to support this config in TableEnvironment, I think it
> should also affect the DML execution
>   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> Therefore, the behavior may look like this:
>
> val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by default
> tableResult.await()   ==> manually block until finish
> tEnv.getConfig().getConfiguration().setString("execution.attached", "true")
> val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't need
> to wait on the TableResult
> tEnv.executeMultiSql(
> """
> CREATE TABLE ....  ==> always sync
> INSERT INTO ...  => sync, because we set configuration above
> SET execution.attached = false;
> INSERT INTO ...  => async
> """)
>
> On the other hand, I think `sql-client.job.detach`
> and `TableEnvironment.executeMultiSql()` should be two separate topics,
> as Shengkai mentioned above, SQL CLI only depends on
> `TableEnvironment#executeSql()` to support multi-line statements.
> I'm fine with making `executeMultiSql()` clear but don't want it to block
> this FLIP, maybe we can discuss this in another thread.
>
>
> Best,
> Jark
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
>
> On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]> wrote:
>
>> Hi, Timo.
>> Thanks for your detailed feedback. I have some thoughts about your
>> feedback.
>>
>> *Regarding #1*: I think the main problem is whether the table environment
>> has the ability to update itself. Let's take a simple program as an
>> example.
>>
>>
>> ```
>> TableEnvironment tEnv = TableEnvironment.create(...);
>>
>> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
>>
>>
>> tEnv.executeSql("...");
>>
>> ```
>>
>> If we regard this option as a table option, users don't have to create
>> another table environment manually. In that case, tEnv needs to check
>> whether the current mode and planner are the same as before whenever
>> executeSql or explainSql is called. I don't think that is easy work for
>> the table environment, especially if users have a
>> StreamExecutionEnvironment but set the old planner and batch mode. But if
>> we make this option a sql client option, users only use the SET command
>> to change the setting, and we can rebuild a new table environment when
>> the SET succeeds.
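A toy sketch of this "rebuild on SET" idea (all class names are hypothetical, not Flink API): the client holds one environment and replaces it when an option that cannot be changed in place, such as the planner, is set.

```java
// Sketch (all class names hypothetical) of the idea above: the client keeps
// one environment and rebuilds it when a SET succeeds for an option that
// cannot be changed in place, such as the planner.
public class Main {
    static class Environment {
        final String planner;
        Environment(String planner) { this.planner = planner; }
    }

    static Environment env = new Environment("blink");

    static Environment set(String key, String value) {
        if (key.equals("sql-client.planner") && !value.equals(env.planner)) {
            env = new Environment(value); // rebuild rather than mutate
        }
        return env;
    }

    public static void main(String[] args) {
        Environment before = env;
        Environment after = set("sql-client.planner", "old");
        System.out.println(before == after); // false: a new environment
        System.out.println(after.planner);
    }
}
```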
>>
>>
>> *Regarding #2*: I think we need to discuss the implementation before
>> continuing this topic. In the sql client, we will maintain two parsers. The
>> first parser(client parser) will only match the sql client commands. If the
>> client parser can't parse the statement, we will leverage the power of the
>> table environment to execute. According to our blueprint,
>> TableEnvironment#executeSql is enough for the sql client. Therefore,
>> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
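The delegation between the two parsers could look roughly like this (the recognized keywords here are illustrative, not the final command set): the client parser matches only client commands, and any statement it cannot parse falls through to the table environment.

```java
// A minimal sketch of the two-parser setup described above (keywords are
// illustrative): the client parser matches only client commands, and any
// statement it cannot parse falls through to the table environment.
public class Main {
    static String route(String statement) {
        String s = statement.trim().toUpperCase();
        if (s.startsWith("SET") || s.startsWith("RESET")
                || s.startsWith("ADD JAR") || s.startsWith("QUIT")) {
            return "client parser";
        }
        return "TableEnvironment#executeSql"; // fallback for everything else
    }

    public static void main(String[] args) {
        System.out.println(route("ADD JAR '/tmp/udf.jar'"));
        System.out.println(route("INSERT INTO sink SELECT * FROM source"));
    }
}
```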
>>
>> But if we need to introduce `TableEnvironment.executeMultiSql` in the
>> future, I think it's OK to use the option `table.multi-sql-async` rather
>> than `sql-client.job.detach`. However, we think the name is not suitable
>> because it is confusing for others. When setting the option to false, we
>> just mean it will block the execution of the INSERT INTO statement, not
>> DDL or other statements (other sql statements are always executed
>> synchronously). So how about `table.job.async`? It only works for the
>> sql client and executeMultiSql. If we set this value to false, the table
>> environment will not return the result until the job finishes.
>>
>>
>> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
>> because Hive also uses these commands to add the jar into the classpath
>> or delete the jar. If we use such commands, it can reduce our work for
>> Hive compatibility.
>>
>> For SHOW JAR, I think the main concern is that the jars are not
>> maintained by the Catalog. If we really need to keep consistent with SQL
>> grammar, maybe we should use
>>
>> `ADD JAR` -> `CREATE JAR`,
>> `DELETE JAR` -> `DROP JAR`,
>> `LIST JAR` -> `SHOW JAR`.
>>
>> *Regarding #5*: I agree with you that we'd better keep consistent.
>>
>> *Regarding #6*: Yes. Most of the commands should belong to the table
>> environment. In the Summary section, I use the <NOTE> tag to identify which
>> commands should belong to the sql client and which commands should belong
>> to the table environment. I also add a new section about implementation
>> details in the FLIP.
>>
>> Best,
>> Shengkai
>>
>> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
>>
>>> Thanks for this great proposal Shengkai. This will give the SQL Client a
>>> very good update and make it production ready.
>>>
>>> Here is some feedback from my side:
>>>
>>> 1) SQL client specific options
>>>
>>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
>>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
>>> `ExecutionConfig#configure` that have been added recently, we should
>>> offer a possibility for TableEnvironment. How about we offer
>>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
>>> `table.execution-mode` to
>>> `org.apache.flink.table.api.config.TableConfigOptions`?
>>>
>>> 2) Execution file
>>>
>>> Did you have a look at the Appendix of FLIP-84 [1] including the mailing
>>> list thread at that time? Could you further elaborate how the
>>> multi-statement execution should work for a unified batch/streaming
>>> story? According to our past discussions, each line in an execution file
>>> should be executed blocking which means a streaming query needs a
>>> statement set to execute multiple INSERT INTO statement, correct? We
>>> should also offer this functionality in
>>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach` is
>>> SQL Client specific needs to be determined, it could also be a general
>>> `table.multi-sql-async` option?
>>>
>>> 3) DELETE JAR
>>>
>>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one is
>>> actively deleting the JAR in the corresponding path.
>>>
>>> 4) LIST JAR
>>>
>>> This should be `SHOW JARS` according to other SQL commands such as `SHOW
>>> CATALOGS`, `SHOW TABLES`, etc. [2].
>>>
>>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
>>>
>>> We should keep the details in sync with
>>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
>>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
>>> instead of `COST`. I'm sure the original author had a reason to call
>>> it that way.
>>>
>>> 6) Implementation details
>>>
>>> It would be nice to understand how we plan to implement the given
>>> features. Most of the commands and config options should go into
>>> TableEnvironment and SqlParser directly, correct? This way users have a
>>> unified way of using Flink SQL. TableEnvironment would provide a user
>>> experience in notebooks or interactive programs similar to the SQL Client.
>>>
>>> [1]
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>> [2]
>>>
>>>
>> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 02.02.21 10:13, Shengkai Fang wrote:
>>>> Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
>>>>
>>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
>>>>
>>>>> Hi, Jingsong.
>>>>>
>>>>> Thanks for your reply. I think `UNSET` is much better.
>>>>>
>>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
>>>>> supported in the current sql client. Our proposal just extends its
>>>>> grammar and allows users to reset the specified keys.
>>>>> 2. Hive beeline also uses `RESET` to set the key to the default
>>> value[1].
>>>>> I think it is more friendly for batch users.
>>>>>
>>>>> Best,
>>>>> Shengkai
>>>>>
>>>>> [1]
>>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
>>>>>
>>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
>>>>>
>>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
>>>>>> improving it.
>>>>>>
>>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
>>>>>>
>>>>>> Best,
>>>>>> Jingsong
>>>>>>
>>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]> wrote:
>>>>>>
>>>>>>> Thanks Shengkai for the update! The proposed changes look good to
>> me.
>>>>>>>
>>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <[hidden email]>
>>> wrote:
>>>>>>>
>>>>>>>> Hi, Rui.
>>>>>>>> You are right. I have already modified the FLIP.
>>>>>>>>
>>>>>>>> The main changes:
>>>>>>>>
>>>>>>>> # The -f parameter has no restriction on the statement type.
>>>>>>>> Sometimes, users use a pipe to redirect the result of queries to
>>>>>>>> debug when submitting a job with the -f parameter. It's much more
>>>>>>>> convenient compared to writing INSERT INTO statements.
>>>>>>>>
>>>>>>>> # Add a new sql client option `sql-client.job.detach`.
>>>>>>>> Users prefer to execute jobs one by one in batch mode. Users can
>>>>>>>> set this option to false and the client will not process the next
>>>>>>>> job until the current job finishes. The default value of this
>>>>>>>> option is true, which means the client will execute the next job
>>>>>>>> as soon as the current job is submitted.
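The intended processing loop can be modeled with a toy sketch (this is not the real client, just an illustration of the option's semantics): with detach=false the client awaits each job before submitting the next one; with detach=true it moves on right after submission.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how `sql-client.job.detach` would drive job processing:
// detach=false awaits each job before the next; detach=true does not.
public class Main {
    static List<String> run(List<String> jobs, boolean detach) {
        List<String> log = new ArrayList<>();
        for (String job : jobs) {
            log.add("submit " + job);
            if (!detach) {
                log.add("await " + job); // block until the job finishes
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("job1", "job2"), false));
        System.out.println(run(List.of("job1", "job2"), true));
    }
}
```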
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Shengkai
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
>>>>>>>>
>>>>>>>>> Hi Shengkai,
>>>>>>>>>
>>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
>> different
>>>>>>>>> implications, and we should clarify the behavior. For example, if
>>> the
>>>>>>>>> client just submits the job and exits, what happens if the file
>>>>>>> contains
>>>>>>>>> two INSERT statements? I don't think we should treat them as a
>>>>>>> statement
>>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
>>> that
>>>>>>>>> case. And the client shouldn't asynchronously submit the two jobs,
>>>>>>> because
>>>>>>>>> the 2nd may depend on the 1st, right?
>>>>>>>>>
>>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Rui,
>>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
>>>>>>>>>>
>>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET
>>>>>>>>>> command. In the implementation, it will just put the key-value
>>>>>>>>>> pair into the `Configuration`, which will be used to generate
>>>>>>>>>> the table config. If hive supports reading the setting from the
>>>>>>>>>> table config, users are able to set the hive-related settings.
>>>>>>>>>>
>>>>>>>>>> For the suggestion 2: The -f parameter will submit the job and
>>>>>>>>>> exit. If the queries never end, users have to cancel the jobs by
>>>>>>>>>> themselves, which is not reliable (people may forget their
>>>>>>>>>> jobs). In most cases, queries are used to analyze the data.
>>>>>>>>>> Users should use queries in the interactive mode.
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Shengkai
>>>>>>>>>>
>>>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
>>>>>>>>>>
>>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
>>> covers a
>>>>>>>>>>> lot of useful features which will dramatically improve the
>>>>>>> usability of our
>>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
>>>>>>>>>>>
>>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations
>> via
>>>>>>> the
>>>>>>>>>>> SET command? A connector may have its own configurations and we
>>>>>>> don't have
>>>>>>>>>>> a way to dynamically change such configurations in SQL Client.
>> For
>>>>>>> example,
>>>>>>>>>>> users may want to be able to change hive conf when using hive
>>>>>>> connector [1].
>>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
>> specified
>>>>>>> with
>>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
>>> queries
>>>>>>> in the
>>>>>>>>>>> file. And a common use case is to run some query and redirect
>> the
>>>>>>> results
>>>>>>>>>>> to a file. So I think maybe flink users would like to do the
>> same,
>>>>>>>>>>> especially in batch scenarios.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
>>>>>>> [hidden email]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards!
>>>>>>>>>>> Rui Li
>>>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best, Jingsong Lee
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>


Re: [DISCUSS]FLIP-163: SQL Client Improvements

Rui Li
Hi guys,

Regarding #3 and #4, I agree SHOW JARS is more consistent with other
commands than LIST JARS. I don't have a strong opinion about REMOVE vs
DELETE though.

While Flink doesn't need to follow Hive syntax, as far as I know, most
users who are requesting these features were previously Hive users. So I
wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
as synonyms, just like lots of systems accept both EXIT and QUIT as
the command to terminate the program. So if that's not hard to achieve, and
will make users happier, I don't see a reason why we must choose one over
the other.
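If synonyms were adopted, the implementation could be as small as normalizing the leading keyword before dispatch, so both spellings reach the same code path (a sketch, not the actual client code; the synonym table is illustrative):

```java
import java.util.Map;

// Sketch of synonym handling as suggested above: normalize the leading
// keyword before dispatch so both spellings reach the same implementation.
public class Main {
    static final Map<String, String> SYNONYMS = Map.of(
            "LIST", "SHOW", "DELETE", "REMOVE", "EXIT", "QUIT");

    static String normalize(String command) {
        String[] parts = command.trim().split("\\s+", 2);
        String verb = parts[0].toUpperCase();
        verb = SYNONYMS.getOrDefault(verb, verb);
        return parts.length > 1 ? verb + " " + parts[1] : verb;
    }

    public static void main(String[] args) {
        System.out.println(normalize("LIST JARS"));        // SHOW JARS
        System.out.println(normalize("DELETE JAR a.jar")); // REMOVE JAR a.jar
    }
}
```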

On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <[hidden email]> wrote:

> Hi everyone,
>
> some feedback regarding the open questions. Maybe we can discuss the
> `TableEnvironment.executeMultiSql` story offline to determine how we
> proceed with this in the near future.
>
> 1) "whether the table environment has the ability to update itself"
>
> Maybe there was some misunderstanding. I don't think that we should
> support `tEnv.getConfig.getConfiguration.setString("table.planner",
> "old")`. Instead I'm proposing to support
> `TableEnvironment.create(Configuration)` where planner and execution
> mode are read immediately and subsequent changes to these options will
> have no effect. We do it similarly in `new
> StreamExecutionEnvironment(Configuration)`. These two ConfigOptions
> must not be SQL Client specific but can be part of the core table code
> base. Many users would like to get a 100% preconfigured environment from
> just Configuration. And this is not possible right now. We can solve
> both use cases in one change.
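The proposed semantics can be illustrated with a small sketch (class names are placeholders for the real TableEnvironment/Configuration types): planner and execution mode are read once at creation time, so later changes to the configuration object have no effect on the created environment.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the proposed create(Configuration) behavior: planner and
// execution mode are snapshotted at creation time. Names are placeholders.
public class Main {
    static class TableEnv {
        final String planner;
        final String mode;
        TableEnv(Map<String, String> conf) {
            this.planner = conf.getOrDefault("table.planner", "blink");
            this.mode = conf.getOrDefault("table.execution-mode", "streaming");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("table.planner", "old");
        TableEnv env = new TableEnv(conf);  // options read immediately
        conf.put("table.planner", "blink"); // subsequent change: no effect
        System.out.println(env.planner);    // still "old"
    }
}
```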
>
> 2) "the sql client, we will maintain two parsers"
>
> I remember we had some discussion about this and decided that we would
> like to maintain only one parser. In the end it is "One Flink SQL" where
> commands influence each other also with respect to keywords. It should
> be fine to include the SQL Client commands in the Flink parser. Of
> course the table environment would not be able to handle the `Operation`
> instance that would be the result but we can introduce hooks to handle
> those `Operation`s. Or we introduce parser extensions.
>
> Can we skip `table.job.async` in the first version? We should further
> discuss whether we introduce a special SQL clause for wrapping async
> behavior or if we use a config option. Especially for streaming queries we
> need to be careful and should force users to either "one INSERT INTO" or
> "one STATEMENT SET".
>
> 3) 4) "HIVE also uses these commands"
>
> In general, Hive is not a good reference. Aligning the commands more
> with the remaining commands should be our goal. We just had a MODULE
> discussion where we selected SHOW instead of LIST. But it is true that
> JARs are not part of the catalog which is why I would not use
> CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> Take a look at the Java collection API as another example.
>
> 6) "Most of the commands should belong to the table environment"
>
> Thanks for updating the FLIP this makes things easier to understand. It
> is good to see that most commends will be available in TableEnvironment.
> However, I would also support SET and RESET for consistency. Again, from
> an architectural point of view, if we would allow some kind of
> `Operation` hook in table environment, we could check for SQL Client
> specific options and forward to regular `TableConfig.getConfiguration`
> otherwise. What do you think?
>
> Regards,
> Timo
>
>
> On 03.02.21 08:58, Jark Wu wrote:
> > Hi Timo,
> >
> > I will respond some of the questions:
> >
> > 1) SQL client specific options
> >
> > Whether it starts with "table" or "sql-client" depends on where the
> > configuration takes effect.
> > If it is a table configuration, we should make clear what's the behavior
> > when users change
> > the configuration in the lifecycle of TableEnvironment.
> >
> > I agree with Shengkai `sql-client.planner` and
> `sql-client.execution.mode`
> > are something special
> > that can't be changed after TableEnvironment has been initialized. You
> can
> > see
> > `StreamExecutionEnvironment` provides `configure()`  method to override
> > configuration after
> > StreamExecutionEnvironment has been initialized.
> >
> > Therefore, I think it would be better to still use  `sql-client.planner`
> > and `sql-client.execution.mode`.
> >
> > 2) Execution file
> >
> >>From my point of view, there is a big difference between
> > `sql-client.job.detach` and
> > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> > affect every single DML statement
> > in the terminal, not only the statements in SQL files. I think the single
> > DML statement in the interactive
> > terminal is something like tEnv#executeSql() instead of
> > tEnv#executeMultiSql.
> > So I don't like the "multi" and "sql" keyword in `table.multi-sql-async`.
> > I just find that runtime provides a configuration called
> > "execution.attached" [1] which is false by default
> > which specifies if the pipeline is submitted in attached or detached
> mode.
> > It provides exactly the same
> > functionality of `sql-client.job.detach`. What do you think about using
> > this option?
> >
> > If we also want to support this config in TableEnvironment, I think it
> > should also affect the DML execution
> >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > Therefore, the behavior may look like this:
> >
> > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> default
> > tableResult.await()   ==> manually block until finish
> > tEnv.getConfig().getConfiguration().setString("execution.attached",
> "true")
> > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't
> need
> > to wait on the TableResult
> > tEnv.executeMultiSql(
> > """
> > CREATE TABLE ....  ==> always sync
> > INSERT INTO ...  => sync, because we set configuration above
> > SET execution.attached = false;
> > INSERT INTO ...  => async
> > """)
> >
> > On the other hand, I think `sql-client.job.detach`
> > and `TableEnvironment.executeMultiSql()` should be two separate topics,
> > as Shengkai mentioned above, SQL CLI only depends on
> > `TableEnvironment#executeSql()` to support multi-line statements.
> > I'm fine with making `executeMultiSql()` clear but don't want it to block
> > this FLIP, maybe we can discuss this in another thread.
> >
> >
> > Best,
> > Jark
> >
> > [1]:
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> >
> > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]> wrote:
> >
> >> Hi, Timo.
> >> Thanks for your detailed feedback. I have some thoughts about your
> >> feedback.
> >>
> >> *Regarding #1*: I think the main problem is whether the table
> environment
> >> has the ability to update itself. Let's take a simple program as an
> >> example.
> >>
> >>
> >> ```
> >> TableEnvironment tEnv = TableEnvironment.create(...);
> >>
> >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> >>
> >>
> >> tEnv.executeSql("...");
> >>
> >> ```
> >>
> >> If we regard this option as a table option, users don't have to create
> >> another table environment manually. In that case, tEnv needs to check
> >> whether the current mode and planner are the same as before when
> executeSql
> >> or explainSql. I don't think it's easy work for the table environment,
> >> especially if users have a StreamExecutionEnvironment but set old
> planner
> >> and batch mode. But when we make this option as a sql client option,
> users
> >> only use the SET command to change the setting. We can rebuild a new
> table
> >> environment when set successes.
> >>
> >>
> >> *Regarding #2*: I think we need to discuss the implementation before
> >> continuing this topic. In the sql client, we will maintain two parsers.
> The
> >> first parser(client parser) will only match the sql client commands. If
> the
> >> client parser can't parse the statement, we will leverage the power of
> the
> >> table environment to execute. According to our blueprint,
> >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> >>
> >> But if we need to introduce the `TableEnvironment.executeMultiSql` in
> the
> >> future, I think it's OK to use the option `table.multi-sql-async` rather
> >> than option `sql-client.job.detach`. But we think the name is not
> suitable
> >> because the name is confusing for others. When setting the option
> false, we
> >> just mean it will block the execution of the INSERT INTO statement, not
> DDL
> >> or others(other sql statements are always executed synchronously). So
> how
> >> about `table.job.async`? It only works for the sql-client and the
> >> executeMultiSql. If we set this value false, the table environment will
> >> return the result until the job finishes.
> >>
> >>
> >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST JAR
> >> because HIVE also uses these commands to add the jar into the classpath
> or
> >> delete the jar. If we use  such commands, it can reduce our work for
> hive
> >> compatibility.
> >>
> >> For SHOW JAR, I think the main concern is the jars are not maintained by
> >> the Catalog. If we really needs to keep consistent with SQL grammar,
> maybe
> >> we should use
> >>
> >> `ADD JAR` -> `CREATE JAR`,
> >> `DELETE JAR` -> `DROP JAR`,
> >> `LIST JAR` -> `SHOW JAR`.
> >>
> >> *Regarding #5*: I agree with you that we'd better keep consistent.
> >>
> >> *Regarding #6*: Yes. Most of the commands should belong to the table
> >> environment. In the Summary section, I use the <NOTE> tag to identify
> which
> >> commands should belong to the sql client and which commands should
> belong
> >> to the table environment. I also add a new section about implementation
> >> details in the FLIP.
> >>
> >> Best,
> >> Shengkai
> >>
> >> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
> >>
> >>> Thanks for this great proposal Shengkai. This will give the SQL Client
> a
> >>> very good update and make it production ready.
> >>>
> >>> Here is some feedback from my side:
> >>>
> >>> 1) SQL client specific options
> >>>
> >>> I don't think that `sql-client.planner` and `sql-client.execution.mode`
> >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> >>> `ExecutionConfig#configure` that have been added recently, we should
> >>> offer a possibility for TableEnvironment. How about we offer
> >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner` and
> >>> `table.execution-mode` to
> >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> >>>
> >>> 2) Execution file
> >>>
> >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> mailing
> >>> list thread at that time? Could you further elaborate how the
> >>> multi-statement execution should work for a unified batch/streaming
> >>> story? According to our past discussions, each line in an execution
> file
> >>> should be executed blocking which means a streaming query needs a
> >>> statement set to execute multiple INSERT INTO statement, correct? We
> >>> should also offer this functionality in
> >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach`
> is
> >>> SQL Client specific needs to be determined, it could also be a general
> >>> `table.multi-sql-async` option?
> >>>
> >>> 3) DELETE JAR
> >>>
> >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one
> is
> >>> actively deleting the JAR in the corresponding path.
> >>>
> >>> 4) LIST JAR
> >>>
> >>> This should be `SHOW JARS` according to other SQL commands such as
> `SHOW
> >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> >>>
> >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> >>>
> >>> We should keep the details in sync with
> >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> >>> instead of `COST`. I'm sure the original author had a reason why to
> call
> >>> it that way.
> >>>
> >>> 6) Implementation details
> >>>
> >>> It would be nice to understand how we plan to implement the given
> >>> features. Most of the commands and config options should go into
> >>> TableEnvironment and SqlParser directly, correct? This way users have a
> >>> unified way of using Flink SQL. TableEnvironment would provide a
> similar
> >>> user experience in notebooks or interactive programs than the SQL
> Client.
> >>>
> >>> [1]
> >>>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>> [2]
> >>>
> >>>
> >>
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> >>>
> >>> Regards,
> >>> Timo
> >>>
> >>>
> >>> On 02.02.21 10:13, Shengkai Fang wrote:
> >>>> Sorry for the typo. I mean `RESET` is much better rather than `UNSET`.
> >>>>
> >>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> >>>>
> >>>>> Hi, Jingsong.
> >>>>>
> >>>>> Thanks for your reply. I think `UNSET` is much better.
> >>>>>
> >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> >>>>> supported in the current sql client now. Our proposal just extends
> its
> >>>>> grammar and allow users to reset the specified keys.
> >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> >>> value[1].
> >>>>> I think it is more friendly for batch users.
> >>>>>
> >>>>> Best,
> >>>>> Shengkai
> >>>>>
> >>>>> [1]
> >>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> >>>>>
> >>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> >>>>>
> >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> >>>>>> improving it.
> >>>>>>
> >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> >>>>>>
> >>>>>> Best,
> >>>>>> Jingsong
> >>>>>>
> >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]>
> wrote:
> >>>>>>
> >>>>>>> Thanks Shengkai for the update! The proposed changes look good to
> >> me.
> >>>>>>>
> >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <[hidden email]>
> >>> wrote:
> >>>>>>>
> >>>>>>>> Hi, Rui.
> >>>>>>>> You are right. I have already modified the FLIP.
> >>>>>>>>
> >>>>>>>> The main changes:
> >>>>>>>>
> >>>>>>>> # -f parameter has no restriction about the statement type.
> >>>>>>>> Sometimes, users use the pipe to redirect the result of queries to
> >>>>>>> debug
> >>>>>>>> when submitting job by -f parameter. It's much convenient
> comparing
> >>> to
> >>>>>>>> writing INSERT INTO statements.
> >>>>>>>>
> >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> >>>>>>>> Users prefer to execute jobs one by one in the batch mode. Users
> >> can
> >>>>>>> set
> >>>>>>>> this option false and the client will process the next job until
> >> the
> >>>>>>>> current job finishes. The default value of this option is false,
> >>> which
> >>>>>>>> means the client will execute the next job when the current job is
> >>>>>>>> submitted.
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>> Shengkai
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> >>>>>>>>
> >>>>>>>>> Hi Shengkai,
> >>>>>>>>>
> >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> >> different
> >>>>>>>>> implications, and we should clarify the behavior. For example, if
> >>> the
> >>>>>>>>> client just submits the job and exits, what happens if the file
> >>>>>>> contains
> >>>>>>>>> two INSERT statements? I don't think we should treat them as a
> >>>>>>> statement
> >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT SET in
> >>> that
> >>>>>>>>> case. And the client shouldn't asynchronously submit the two
> jobs,
> >>>>>>> because
> >>>>>>>>> the 2nd may depend on the 1st, right?
> >>>>>>>>>
> >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <[hidden email]
> >
> >>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> Hi Rui,
> >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> >>>>>>>>>>
> >>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET
> >>>>>>>>>> command. In
> >>>>>>>>>> the implementation, it will just put the key-value into the
> >>>>>>>>>> `Configuration`, which will be used to generate the table
> config.
> >>> If
> >>>>>>> hive
> >>>>>>>>>> supports to read the setting from the table config, users are
> >> able
> >>>>>>> to set
> >>>>>>>>>> the hive-related settings.
> >>>>>>>>>>
> >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job and
> >>> exit.
> >>>>>>> If
> >>>>>>>>>> the queries never end, users have to cancel the job by themselves,
> >>>>>>>>>> which is not reliable (people may forget their jobs). In most cases,
> >>>>>>>>>> queries are used to analyze the data. Users should use queries in
> >>>>>>>>>> the interactive mode.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Shengkai
> >>>>>>>>>>
> >>>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> >>>>>>>>>>
> >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> >>> covers a
> >>>>>>>>>>> lot of useful features which will dramatically improve the
> >>>>>>> usability of our
> >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> >>>>>>>>>>>
> >>>>>>>>>>> 1. Do you think we can let users set arbitrary configurations
> >> via
> >>>>>>> the
> >>>>>>>>>>> SET command? A connector may have its own configurations and we
> >>>>>>> don't have
> >>>>>>>>>>> a way to dynamically change such configurations in SQL Client.
> >> For
> >>>>>>> example,
> >>>>>>>>>>> users may want to be able to change hive conf when using hive
> >>>>>>> connector [1].
> >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> >> specified
> >>>>>>> with
> >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
> >>> queries
> >>>>>>> in the
> >>>>>>>>>>> file. And a common use case is to run some query and redirect
> >> the
> >>>>>>> results
> >>>>>>>>>>> to a file. So I think maybe flink users would like to do the
> >> same,
> >>>>>>>>>>> especially in batch scenarios.
> >>>>>>>>>>>
> >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> >>>>>>>>>>>
> >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> >>>>>>> [hidden email]>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hi Shengkai,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> >>>>>>> suggestions:
> >>>>>>>>>>>>
> >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client collect
> >> the
> >>>>>>>>>>>> results
> >>>>>>>>>>>> locally all at once using accumulators at present,
> >>>>>>>>>>>>         which may have memory issues in JM or Local for the
> big
> >>> query
> >>>>>>>>>>>> result.
> >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> >>>>>>>>>>>>         We may change to use SelectTableSink, which is based
> >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> >> FLIP-91.
> >>>>>>> Seems
> >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> >>>>>>>>>>>>         Provide a long running service out of the box to
> >>> facilitate
> >>>>>>> the
> >>>>>>>>>>>> sql
> >>>>>>>>>>>> submission is necessary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> What do you think of these?
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Shengkai Fang <[hidden email]> 于2021年1月28日周四 下午8:54写道:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Hi devs,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Jark and I want to start a discussion about FLIP-163:SQL
> >> Client
> >>>>>>>>>>>>> Improvements.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Many users have complained about the problems of the sql
> >> client.
> >>>>>>> For
> >>>>>>>>>>>>> example, users can not register the table proposed by
> FLIP-95.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The main changes in this FLIP:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> - use -i parameter to specify the sql file to initialize the
> >>>>>>> table
> >>>>>>>>>>>>> environment and deprecated YAML file;
> >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u' parameter;
> >>>>>>>>>>>>> - add more interactive commands, e.g ADD JAR;
> >>>>>>>>>>>>> - support statement set syntax;
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Look forward to your feedback.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>> Shengkai
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> [1]
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>
> >>>
> >>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>>
> >>>>>>>>>>>> *With kind regards
> >>>>>>>>>>>> ------------------------------------------------------------
> >>>>>>>>>>>> Sebastian Liu 刘洋
> >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of Science
> >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> >>>>>>>>>>>> E-mail: [hidden email] <[hidden email]>
> >>>>>>>>>>>> QQ: 3239559*
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Best regards!
> >>>>>>>>>>> Rui Li
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Best regards!
> >>>>>>>>> Rui Li
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Best regards!
> >>>>>>> Rui Li
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Best, Jingsong Lee
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

--
Best regards!
Rui Li

Re: [DISCUSS]FLIP-163: SQL Client Improvements

godfreyhe
Hi everyone,

Regarding "table.planner" and "table.execution-mode"
If we define that those two options are just used to initialize the
TableEnvironment, +1 for introducing table options instead of sql-client
options.

Regarding "the sql client, we will maintain two parsers", I want to give
more inputs:
We want to introduce sql-gateway into the Flink project (see FLIP-24 &
FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
the gateway service will communicate through a REST API. The "ADD JAR
/local/path/jar" statement will be executed on the CLI client machine. So when
we submit a sql file which contains multiple statements, the CLI client needs
to pick out the "ADD JAR" lines, and the statements need to be submitted or
executed one by one to make sure the result is correct. The sql file may
look like:

SET xxx=yyy;
create table my_table ...;
create table my_sink ...;
ADD JAR /local/path/jar1;
create function my_udf as com....MyUdf;
insert into my_sink select ..., my_udf(xx) from ...;
REMOVE JAR /local/path/jar1;
drop function my_udf;
ADD JAR /local/path/jar2;
create function my_udf as com....MyUdf2;
insert into my_sink select ..., my_udf(xx) from ...;

The lines need to be split into multiple statements first in the CLI
client; there are two approaches:
1. The CLI client depends on the sql-parser: the sql-parser splits the
lines and tells which lines are "ADD JAR".
pro: there is only one parser.
cons: it's a bit heavy for the CLI client to depend on the sql-parser,
because the CLI client is just a simple tool which receives the user
commands and displays the result. The non-"ADD JAR" commands would be
parsed twice.

2. The CLI client splits the lines into multiple statements and finds the
ADD JAR command through regex matching.
pro: the CLI client is very light-weight.
cons: there are two parsers.

(Personally, I prefer the second option.)
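To make the second option concrete, here is a minimal sketch of such a
light-weight client-side splitter. The class and method names are purely
illustrative (not Flink APIs), and the split on semicolons is deliberately
naive — a real implementation would have to skip semicolons inside string
literals and comments:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of approach 2: the CLI client splits the script on
// semicolons and recognizes client-side commands such as ADD JAR with a
// light-weight regex; everything else is delegated to the table
// environment's (Calcite-based) parser.
public class ClientStatementSplitter {

    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+(\\S+)$");

    /** Splits a script into trimmed statements (naive: ignores semicolons in literals). */
    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        for (String part : script.split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                statements.add(trimmed);
            }
        }
        return statements;
    }

    /** Returns the jar path if the statement is an ADD JAR command, otherwise null. */
    public static String matchAddJar(String statement) {
        Matcher m = ADD_JAR.matcher(statement.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String script =
                "SET xxx=yyy;\n"
                + "create table my_table (a INT);\n"
                + "ADD JAR /local/path/jar1;\n"
                + "insert into my_sink select a from my_table;";
        for (String stmt : split(script)) {
            String jar = matchAddJar(stmt);
            System.out.println(
                    jar != null ? "handled by client: ADD JAR " + jar : "delegated: " + stmt);
        }
    }
}
```

With this shape, only the ADD/REMOVE JAR regexes live in the client; all
other statements are forwarded one by one, which also preserves the required
sequential execution order.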

Regarding "SHOW or LIST JARS", I think we can support both:
for the default dialect we support SHOW JARS, and if we switch to the hive
dialect, LIST JARS is also supported.


[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway

Best,
Godfrey

Rui Li <[hidden email]> 于2021年2月4日周四 上午10:40写道:

> Hi guys,
>
> Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> DELETE though.
>
> While flink doesn't need to follow hive syntax, as far as I know, most
> users who are requesting these features were previously hive users. So I
> wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
> as synonyms? It's just like lots of systems accept both EXIT and QUIT as
> the command to terminate the program. So if that's not hard to achieve, and
> will make users happier, I don't see a reason why we must choose one over
> the other.
>
> On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <[hidden email]> wrote:
>
> > Hi everyone,
> >
> > some feedback regarding the open questions. Maybe we can discuss the
> > `TableEnvironment.executeMultiSql` story offline to determine how we
> > proceed with this in the near future.
> >
> > 1) "whether the table environment has the ability to update itself"
> >
> > Maybe there was some misunderstanding. I don't think that we should
> > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > "old")`. Instead I'm proposing to support
> > `TableEnvironment.create(Configuration)` where planner and execution
> > mode are read immediately and a subsequent changes to these options will
> > have no effect. We are doing it similar in `new
> > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > must not be SQL Client specific but can be part of the core table code
> > base. Many users would like to get a 100% preconfigured environment from
> > just Configuration. And this is not possible right now. We can solve
> > both use cases in one change.
> >
> > 2) "the sql client, we will maintain two parsers"
> >
> > I remember we had some discussion about this and decided that we would
> > like to maintain only one parser. In the end it is "One Flink SQL" where
> > commands influence each other also with respect to keywords. It should
> > be fine to include the SQL Client commands in the Flink parser. Of
> > course the table environment would not be able to handle the `Operation`
> > instance that would be the result but we can introduce hooks to handle
> > those `Operation`s. Or we introduce parser extensions.
> >
> > Can we skip `table.job.async` in the first version? We should further
> > discuss whether we introduce a special SQL clause for wrapping async
> > behavior or if we use a config option? Esp. for streaming queries we
> > need to be careful and should force users to either "one INSERT INTO" or
> > "one STATEMENT SET".
> >
> > 3) 4) "HIVE also uses these commands"
> >
> > In general, Hive is not a good reference. Aligning the commands more
> > with the remaining commands should be our goal. We just had a MODULE
> > discussion where we selected SHOW instead of LIST. But it is true that
> > JARs are not part of the catalog which is why I would not use
> > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > Take a look at the Java collection API as another example.
> >
> > 6) "Most of the commands should belong to the table environment"
> >
> > Thanks for updating the FLIP this makes things easier to understand. It
> > is good to see that most commands will be available in TableEnvironment.
> > However, I would also support SET and RESET for consistency. Again, from
> > an architectural point of view, if we would allow some kind of
> > `Operation` hook in table environment, we could check for SQL Client
> > specific options and forward to regular `TableConfig.getConfiguration`
> > otherwise. What do you think?
> >
> > Regards,
> > Timo
> >
> >
> > On 03.02.21 08:58, Jark Wu wrote:
> > > Hi Timo,
> > >
> > > I will respond some of the questions:
> > >
> > > 1) SQL client specific options
> > >
> > > Whether it starts with "table" or "sql-client" depends on where the
> > > configuration takes effect.
> > > If it is a table configuration, we should make clear what's the
> behavior
> > > when users change
> > > the configuration in the lifecycle of TableEnvironment.
> > >
> > > I agree with Shengkai `sql-client.planner` and
> > `sql-client.execution.mode`
> > > are something special
> > > that can't be changed after TableEnvironment has been initialized. You
> > can
> > > see
> > > `StreamExecutionEnvironment` provides `configure()`  method to override
> > > configuration after
> > > StreamExecutionEnvironment has been initialized.
> > >
> > > Therefore, I think it would be better to still use
> `sql-client.planner`
> > > and `sql-client.execution.mode`.
> > >
> > > 2) Execution file
> > >
> > > From my point of view, there is a big difference between
> > > `sql-client.job.detach` and
> > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach` will
> > > affect every single DML statement
> > > in the terminal, not only the statements in SQL files. I think the
> single
> > > DML statement in the interactive
> > > terminal is something like tEnv#executeSql() instead of
> > > tEnv#executeMultiSql.
> > > So I don't like the "multi" and "sql" keyword in
> `table.multi-sql-async`.
> > > I just find that runtime provides a configuration called
> > > "execution.attached" [1] which is false by default
> > > which specifies if the pipeline is submitted in attached or detached
> > mode.
> > > It provides exactly the same
> > > functionality of `sql-client.job.detach`. What do you think about using
> > > this option?
> > >
> > > If we also want to support this config in TableEnvironment, I think it
> > > should also affect the DML execution
> > >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > > Therefore, the behavior may look like this:
> > >
> > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > default
> > > tableResult.await()   ==> manually block until finish
> > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > "true")
> > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync, don't
> > need
> > > to wait on the TableResult
> > > tEnv.executeMultiSql(
> > > """
> > > CREATE TABLE ....  ==> always sync
> > > INSERT INTO ...  => sync, because we set configuration above
> > > SET execution.attached = false;
> > > INSERT INTO ...  => async
> > > """)
> > >
> > > On the other hand, I think `sql-client.job.detach`
> > > and `TableEnvironment.executeMultiSql()` should be two separate topics,
> > > as Shengkai mentioned above, SQL CLI only depends on
> > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > I'm fine with making `executeMultiSql()` clear but don't want it to
> block
> > > this FLIP, maybe we can discuss this in another thread.
> > >
> > >
> > > Best,
> > > Jark
> > >
> > > [1]:
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > >
> > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]> wrote:
> > >
> > >> Hi, Timo.
> > >> Thanks for your detailed feedback. I have some thoughts about your
> > >> feedback.
> > >>
> > >> *Regarding #1*: I think the main problem is whether the table
> > environment
> > >> has the ability to update itself. Let's take a simple program as an
> > >> example.
> > >>
> > >>
> > >> ```
> > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > >>
> > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > >>
> > >>
> > >> tEnv.executeSql("...");
> > >>
> > >> ```
> > >>
> > >> If we regard this option as a table option, users don't have to create
> > >> another table environment manually. In that case, tEnv needs to check
> > >> whether the current mode and planner are the same as before when
> > executeSql
> > >> or explainSql. I don't think it's easy work for the table environment,
> > >> especially if users have a StreamExecutionEnvironment but set old
> > planner
> > >> and batch mode. But when we make this option a sql client option, users
> > >> only use the SET command to change the setting. We can rebuild a new
> > >> table environment when the SET succeeds.
> > >>
> > >>
> > >> *Regarding #2*: I think we need to discuss the implementation before
> > >> continuing this topic. In the sql client, we will maintain two
> parsers.
> > The
> > >> first parser(client parser) will only match the sql client commands.
> If
> > the
> > >> client parser can't parse the statement, we will leverage the power of
> > the
> > >> table environment to execute. According to our blueprint,
> > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > >>
> > >> But if we need to introduce the `TableEnvironment.executeMultiSql` in
> > the
> > >> future, I think it's OK to use the option `table.multi-sql-async`
> rather
> > >> than option `sql-client.job.detach`. But we think the name is not
> > suitable
> > >> because the name is confusing for others. When setting the option
> > false, we
> > >> just mean it will block the execution of the INSERT INTO statement,
> not
> > DDL
> > >> or others(other sql statements are always executed synchronously). So
> > how
> > >> about `table.job.async`? It only works for the sql-client and the
> > >> executeMultiSql. If we set this value false, the table environment will
> > >> not return the result until the job finishes.
> > >>
> > >>
> > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST
> JAR
> > >> because HIVE also uses these commands to add the jar into the
> classpath
> > or
> > >> delete the jar. If we use  such commands, it can reduce our work for
> > hive
> > >> compatibility.
> > >>
> > >> For SHOW JAR, I think the main concern is the jars are not maintained
> by
> > >> the Catalog. If we really needs to keep consistent with SQL grammar,
> > maybe
> > >> we should use
> > >>
> > >> `ADD JAR` -> `CREATE JAR`,
> > >> `DELETE JAR` -> `DROP JAR`,
> > >> `LIST JAR` -> `SHOW JAR`.
> > >>
> > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > >>
> > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > >> environment. In the Summary section, I use the <NOTE> tag to identify
> > which
> > >> commands should belong to the sql client and which commands should
> > belong
> > >> to the table environment. I also add a new section about
> implementation
> > >> details in the FLIP.
> > >>
> > >> Best,
> > >> Shengkai
> > >>
> > >> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
> > >>
> > >>> Thanks for this great proposal Shengkai. This will give the SQL
> Client
> > a
> > >>> very good update and make it production ready.
> > >>>
> > >>> Here is some feedback from my side:
> > >>>
> > >>> 1) SQL client specific options
> > >>>
> > >>> I don't think that `sql-client.planner` and
> `sql-client.execution.mode`
> > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment` and
> > >>> `ExecutionConfig#configure` that have been added recently, we should
> > >>> offer a possibility for TableEnvironment. How about we offer
> > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner`
> and
> > >>> `table.execution-mode` to
> > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > >>>
> > >>> 2) Execution file
> > >>>
> > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > mailing
> > >>> list thread at that time? Could you further elaborate how the
> > >>> multi-statement execution should work for a unified batch/streaming
> > >>> story? According to our past discussions, each line in an execution
> > file
> > >>> should be executed blocking which means a streaming query needs a
> > >>> statement set to execute multiple INSERT INTO statement, correct? We
> > >>> should also offer this functionality in
> > >>> `TableEnvironment.executeMultiSql()`. Whether `sql-client.job.detach`
> > is
> > >>> SQL Client specific needs to be determined, it could also be a
> general
> > >>> `table.multi-sql-async` option?
> > >>>
> > >>> 3) DELETE JAR
> > >>>
> > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like one
> > is
> > >>> actively deleting the JAR in the corresponding path.
> > >>>
> > >>> 4) LIST JAR
> > >>>
> > >>> This should be `SHOW JARS` according to other SQL commands such as
> > `SHOW
> > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > >>>
> > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > >>>
> > >>> We should keep the details in sync with
> > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion about
> > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > >>> instead of `COST`. I'm sure the original author had a reason why to
> > call
> > >>> it that way.
> > >>>
> > >>> 6) Implementation details
> > >>>
> > >>> It would be nice to understand how we plan to implement the given
> > >>> features. Most of the commands and config options should go into
> > >>> TableEnvironment and SqlParser directly, correct? This way users
> have a
> > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > similar
> > >>> user experience in notebooks or interactive programs than the SQL
> > Client.
> > >>>
> > >>> [1]
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > >>> [2]
> > >>>
> > >>>
> > >>
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > >>>
> > >>> Regards,
> > >>> Timo
> > >>>
> > >>>
> > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> `UNSET`.
> > >>>>
> > >>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> > >>>>
> > >>>>> Hi, Jingsong.
> > >>>>>
> > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > >>>>>
> > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > >>>>> supported in the current sql client now. Our proposal just extends
> > its
> > >>>>> grammar and allow users to reset the specified keys.
> > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > >>> value[1].
> > >>>>> I think it is more friendly for batch users.
> > >>>>>
> > >>>>> Best,
> > >>>>> Shengkai
> > >>>>>
> > >>>>> [1]
> > >>> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > >>>>>
> > >>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> > >>>>>
> > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > >>>>>> improving it.
> > >>>>>>
> > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Jingsong
> > >>>>>>
> --
> Best regards!
> Rui Li
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jark Wu-2
Hi all,

Regarding "One Parser", I think it's not possible for now because Calcite
parser can't parse
special characters (e.g. "-") unless quoting them as string literals.
That's why the WITH option
key are string literals not identifiers.

SET table.exec.mini-batch.enabled = true and ADD JAR /local/my-home/test.jar
have the same
problem. That's why we propose two parsers: one splits lines into multiple
statements and matches special
commands through regex, which is light-weight, and delegates the other
statements to the second parser, which is the Calcite parser.

Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
true syntax,
both for backward compatibility and ease of use; all the other systems
don't have quotes on the key either.
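A minimal sketch of such a lightweight regex that accepts the unquoted, hyphenated keys a Calcite identifier cannot (the pattern and class name are purely illustrative, not from the FLIP):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetCommandSketch {
    // Hypothetical pattern: an unquoted key may contain dots and hyphens,
    // which a Calcite identifier grammar rejects, so a regex handles it.
    private static final Pattern SET_PATTERN = Pattern.compile(
            "(?i)SET\\s+([a-zA-Z][a-zA-Z0-9._-]*)\\s*=\\s*([^\\s;]+)\\s*;?");

    /** Returns {key, value} for a SET command, or null to delegate to the SQL parser. */
    public static String[] parseSet(String statement) {
        Matcher m = SET_PATTERN.matcher(statement.trim());
        if (!m.matches()) {
            return null; // not a SET command; hand it to the Calcite parser
        }
        return new String[] {m.group(1), m.group(2)};
    }
}
```

With this sketch, `SET table.exec.mini-batch.enabled = true;` yields the key `table.exec.mini-batch.enabled` and the value `true`, while anything else falls through to the full parser.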


Regarding "table.planner" vs "sql-client.planner",
if we want to use "table.planner", I think we should explain clearly in the
documentation in what scope it can be used.
Otherwise, there will be users complaining why the planner doesn't change
when setting the configuration on TableEnv.
It would be better to throw an exception to indicate to users that it's not
allowed to change the planner after TableEnv is initialized.
However, it seems not easy to implement.

Best,
Jark

On Thu, 4 Feb 2021 at 15:49, godfrey he <[hidden email]> wrote:

> Hi everyone,
>
> Regarding "table.planner" and "table.execution-mode"
> If we define that those two options are just used to initialize the
> TableEnvironment, +1 for introducing table options instead of sql-client
> options.
>
> Regarding "the sql client, we will maintain two parsers", I want to give
> more inputs:
> We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> the gateway service will communicate through the REST API. The " ADD JAR
> /local/path/jar " will be executed on the CLI client machine. So when we
> submit a sql file which contains multiple statements, the CLI client needs
> to pick out the "ADD JAR" line, and also statements need to be submitted or
> executed one by one to make sure the result is correct. The sql file may
> look like:
>
> SET xxx=yyy;
> create table my_table ...;
> create table my_sink ...;
> ADD JAR /local/path/jar1;
> create function my_udf as com....MyUdf;
> insert into my_sink select ..., my_udf(xx) from ...;
> REMOVE JAR /local/path/jar1;
> drop function my_udf;
> ADD JAR /local/path/jar2;
> create function my_udf as com....MyUdf2;
> insert into my_sink select ..., my_udf(xx) from ...;
>
> The lines need to be split into multiple statements first in the CLI
> client, there are two approaches:
> 1. The CLI client depends on the sql-parser: the sql-parser splits the
> lines and tells which lines are "ADD JAR".
> pro: there is only one parser
> cons: It's a little heavy that the CLI client depends on the sql-parser,
> because the CLI client is just a simple tool which receives the user
> commands and displays the result. Non-"ADD JAR" commands will be parsed
> twice.
>
> 2. The CLI client splits the lines into multiple statements and finds the
> ADD JAR command through regex matching.
> pro: The CLI client is very light-weight.
> cons: there are two parsers.
>
> (personally, I prefer the second option)
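The second approach can be sketched in a few lines. This is only an illustration of the regex-based split, not the proposed implementation; a real splitter would also have to respect string literals and comments when splitting on semicolons, and all names here are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClientSplitterSketch {
    // Matches the jar commands the CLI client handles itself.
    private static final Pattern JAR_COMMAND =
            Pattern.compile("(?i)\\s*(ADD|REMOVE)\\s+JAR\\s+(\\S+)\\s*");

    /** Splits a SQL file and tags each statement as client-side or delegated. */
    public static List<String> classify(String sqlFile) {
        List<String> result = new ArrayList<>();
        for (String stmt : sqlFile.split(";")) { // naive split; real code must skip quotes/comments
            if (stmt.trim().isEmpty()) {
                continue;
            }
            if (JAR_COMMAND.matcher(stmt).matches()) {
                result.add("CLIENT: " + stmt.trim());   // executed on the CLI machine
            } else {
                result.add("GATEWAY: " + stmt.trim());  // delegated to the SQL parser/gateway
            }
        }
        return result;
    }
}
```

The point of the sketch is that the client only needs one small regex; every statement it cannot classify is forwarded untouched, so the heavyweight parser stays on the gateway side.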
>
> Regarding "SHOW or LIST JARS", I think we can support them both.
> For the default dialect, we support SHOW JARS, but if we switch to Hive
> dialect, LIST JARS is also supported.
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>
> Best,
> Godfrey
>
> Rui Li <[hidden email]> 于2021年2月4日周四 上午10:40写道:
>
> > Hi guys,
> >
> > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> > DELETE though.
> >
> > While flink doesn't need to follow hive syntax, as far as I know, most
> > users who are requesting these features were previously Hive users. So I
> > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE JARS
> > as synonyms? It's just like lots of systems accept both EXIT and QUIT as
> > the command to terminate the program. So if that's not hard to achieve,
> and
> > will make users happier, I don't see a reason why we must choose one over
> > the other.
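Accepting synonyms is cheap to implement: several spellings can resolve to one canonical operation, the way many shells accept both EXIT and QUIT. A sketch (command strings and canonical names here are illustrative only):

```java
import java.util.Locale;
import java.util.Map;

public class SynonymSketch {
    // Each accepted spelling maps to one canonical operation.
    private static final Map<String, String> CANONICAL = Map.of(
            "SHOW JARS", "SHOW_JARS",
            "LIST JARS", "SHOW_JARS",
            "REMOVE JAR", "REMOVE_JAR",
            "DELETE JAR", "REMOVE_JAR",
            "EXIT", "QUIT",
            "QUIT", "QUIT");

    public static String resolve(String command) {
        return CANONICAL.getOrDefault(
                command.trim().toUpperCase(Locale.ROOT), "UNKNOWN");
    }
}
```

Because the lookup table is the only place a synonym is mentioned, adding or dropping an alias later does not touch the execution code at all.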
> >
> > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <[hidden email]> wrote:
> >
> > > Hi everyone,
> > >
> > > some feedback regarding the open questions. Maybe we can discuss the
> > > `TableEnvironment.executeMultiSql` story offline to determine how we
> > > proceed with this in the near future.
> > >
> > > 1) "whether the table environment has the ability to update itself"
> > >
> > > Maybe there was some misunderstanding. I don't think that we should
> > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > "old")`. Instead I'm proposing to support
> > > `TableEnvironment.create(Configuration)` where planner and execution
> > > mode are read immediately and subsequent changes to these options
> will
> > > have no effect. We are doing it similarly in `new
> > > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > > must not be SQL Client specific but can be part of the core table code
> > > base. Many users would like to get a 100% preconfigured environment
> from
> > > just Configuration. And this is not possible right now. We can solve
> > > both use cases in one change.
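The "read once at creation" semantics described here can be sketched as follows. This is a toy model, not the actual Flink API; the option names follow the proposal, everything else is invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class EnvSketch {
    private final String planner;
    private final String executionMode;

    private EnvSketch(Map<String, String> config) {
        // Planner and execution mode are snapshotted at creation time;
        // later changes to the caller's configuration have no effect.
        this.planner = config.getOrDefault("table.planner", "blink");
        this.executionMode = config.getOrDefault("table.execution-mode", "streaming");
    }

    public static EnvSketch create(Map<String, String> config) {
        return new EnvSketch(new HashMap<>(config)); // defensive copy
    }

    public String planner() {
        return planner;
    }

    public String executionMode() {
        return executionMode;
    }
}
```

A later `config.put("table.planner", ...)` on the caller's map changes nothing, because `create` copied and read the options up front — which is exactly the contract that makes the two options safe as table options rather than client options.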
> > >
> > > 2) "the sql client, we will maintain two parsers"
> > >
> > > I remember we had some discussion about this and decided that we would
> > > like to maintain only one parser. In the end it is "One Flink SQL"
> where
> > > commands influence each other also with respect to keywords. It should
> > > be fine to include the SQL Client commands in the Flink parser. Of
> > > course the table environment would not be able to handle the
> `Operation`
> > > instance that would be the result but we can introduce hooks to handle
> > > those `Operation`s. Or we introduce parser extensions.
> > >
> > > Can we skip `table.job.async` in the first version? We should further
> > > discuss whether we introduce a special SQL clause for wrapping async
> > > behavior or if we use a config option? Esp. for streaming queries we
> > > need to be careful and should force users to either "one INSERT INTO"
> or
> > > "one STATEMENT SET".
> > >
> > > 3) 4) "HIVE also uses these commands"
> > >
> > > In general, Hive is not a good reference. Aligning the commands more
> > > with the remaining commands should be our goal. We just had a MODULE
> > > discussion where we selected SHOW instead of LIST. But it is true that
> > > JARs are not part of the catalog which is why I would not use
> > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English language.
> > > Take a look at the Java collection API as another example.
> > >
> > > 6) "Most of the commands should belong to the table environment"
> > >
> > > Thanks for updating the FLIP this makes things easier to understand. It
> > > is good to see that most commands will be available in
> TableEnvironment.
> > > However, I would also support SET and RESET for consistency. Again,
> from
> > > an architectural point of view, if we would allow some kind of
> > > `Operation` hook in table environment, we could check for SQL Client
> > > specific options and forward to regular `TableConfig.getConfiguration`
> > > otherwise. What do you think?
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > On 03.02.21 08:58, Jark Wu wrote:
> > > > Hi Timo,
> > > >
> > > > I will respond some of the questions:
> > > >
> > > > 1) SQL client specific options
> > > >
> > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > configuration takes effect.
> > > > If it is a table configuration, we should make clear what's the
> > behavior
> > > > when users change
> > > > the configuration in the lifecycle of TableEnvironment.
> > > >
> > > > I agree with Shengkai `sql-client.planner` and
> > > `sql-client.execution.mode`
> > > > are something special
> > > > that can't be changed after TableEnvironment has been initialized.
> You
> > > can
> > > > see
> > > > `StreamExecutionEnvironment` provides `configure()`  method to
> override
> > > > configuration after
> > > > StreamExecutionEnvironment has been initialized.
> > > >
> > > > Therefore, I think it would be better to still use
> > `sql-client.planner`
> > > > and `sql-client.execution.mode`.
> > > >
> > > > 2) Execution file
> > > >
> > > > From my point of view, there is a big difference between
> > > > `sql-client.job.detach` and
> > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> will
> > > > affect every single DML statement
> > > > in the terminal, not only the statements in SQL files. I think the
> > single
> > > > DML statement in the interactive
> > > > terminal is something like tEnv#executeSql() instead of
> > > > tEnv#executeMultiSql.
> > > > So I don't like the "multi" and "sql" keyword in
> > `table.multi-sql-async`.
> > > > I just find that runtime provides a configuration called
> > > > "execution.attached" [1] which is false by default
> > > > which specifies if the pipeline is submitted in attached or detached
> > > mode.
> > > > It provides exactly the same
> > > > functionality of `sql-client.job.detach`. What do you think about
> using
> > > > this option?
> > > >
> > > > If we also want to support this config in TableEnvironment, I think
> it
> > > > should also affect the DML execution
> > > >   of `tEnv#executeSql()`, not only DMLs in `tEnv#executeMultiSql()`.
> > > > Therefore, the behavior may look like this:
> > > >
> > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > > default
> > > > tableResult.await()   ==> manually block until finish
> > > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > "true")
> > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> don't
> > > need
> > > > to wait on the TableResult
> > > > tEnv.executeMultiSql(
> > > > """
> > > > CREATE TABLE ....  ==> always sync
> > > > INSERT INTO ...  => sync, because we set configuration above
> > > > SET execution.attached = false;
> > > > INSERT INTO ...  => async
> > > > """)
> > > >
> > > > On the other hand, I think `sql-client.job.detach`
> > > > and `TableEnvironment.executeMultiSql()` should be two separate
> topics,
> > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > I'm fine with making `executeMultiSql()` clear but don't want it to
> > block
> > > > this FLIP, maybe we can discuss this in another thread.
> > > >
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > >
> > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]>
> wrote:
> > > >
> > > >> Hi, Timo.
> > > >> Thanks for your detailed feedback. I have some thoughts about your
> > > >> feedback.
> > > >>
> > > >> *Regarding #1*: I think the main problem is whether the table
> > > environment
> > > >> has the ability to update itself. Let's take a simple program as an
> > > >> example.
> > > >>
> > > >>
> > > >> ```
> > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > >>
> > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > > >>
> > > >>
> > > >> tEnv.executeSql("...");
> > > >>
> > > >> ```
> > > >>
> > > >> If we regard this option as a table option, users don't have to
> create
> > > >> another table environment manually. In that case, tEnv needs to
> check
> > > >> whether the current mode and planner are the same as before when
> > > executeSql
> > > >> or explainSql. I don't think it's easy work for the table
> environment,
> > > >> especially if users have a StreamExecutionEnvironment but set old
> > > planner
> > > >> and batch mode. But when we make this option a sql client option,
> > > users
> > > >> only use the SET command to change the setting. We can rebuild a new
> > > table
> > > >> environment when the set succeeds.
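One way to picture the "rebuild on SET" idea is the following sketch. It is purely illustrative: the option keys follow the proposal, and the rebuild counter is a stand-in for creating a fresh TableEnvironment:

```java
import java.util.HashMap;
import java.util.Map;

public class ClientSessionSketch {
    private final Map<String, String> options = new HashMap<>();
    private int rebuildCount = 0;

    /** Applies a SET command; rebuilds the environment for non-updatable options. */
    public void set(String key, String value) {
        options.put(key, value);
        if (key.equals("sql-client.planner")
                || key.equals("sql-client.execution.mode")) {
            rebuildEnvironment();
        }
    }

    private void rebuildEnvironment() {
        rebuildCount++; // stand-in for tearing down and re-creating the TableEnvironment
    }

    public int rebuildCount() {
        return rebuildCount;
    }
}
```

Only the two session-defining keys trigger a rebuild; ordinary options are passed through to the existing environment, so the client stays simple while the environment never has to mutate its planner in place.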
> > > >>
> > > >>
> > > >> *Regarding #2*: I think we need to discuss the implementation before
> > > >> continuing this topic. In the sql client, we will maintain two
> > parsers.
> > > The
> > > >> first parser (client parser) will only match the sql client commands.
> > If
> > > the
> > > >> client parser can't parse the statement, we will leverage the power
> of
> > > the
> > > >> table environment to execute. According to our blueprint,
> > > >> TableEnvironment#executeSql is enough for the sql client. Therefore,
> > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > >>
> > > >> But if we need to introduce the `TableEnvironment.executeMultiSql`
> in
> > > the
> > > >> future, I think it's OK to use the option `table.multi-sql-async`
> > rather
> > > >> than option `sql-client.job.detach`. But we think the name is not
> > > suitable
> > > >> because the name is confusing for others. When setting the option
> > > false, we
> > > >> just mean it will block the execution of the INSERT INTO statement,
> > not
> > > DDL
> > > >> or others (other SQL statements are always executed synchronously).
> So
> > > how
> > > >> about `table.job.async`? It only works for the sql-client and the
> > > >> executeMultiSql. If we set this value false, the table environment
> > will
> > > >> not return the result until the job finishes.
> > > >>
> > > >>
> > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and LIST
> > JAR
> > > >> because HIVE also uses these commands to add the jar into the
> > classpath
> > > or
> > > >> delete the jar. If we use such commands, it can reduce our work for
> > > hive
> > > >> compatibility.
> > > >>
> > > >> For SHOW JAR, I think the main concern is the jars are not
> maintained
> > by
> > > >> the Catalog. If we really need to keep consistent with SQL grammar,
> > > maybe
> > > >> we should use
> > > >>
> > > >> `ADD JAR` -> `CREATE JAR`,
> > > >> `DELETE JAR` -> `DROP JAR`,
> > > >> `LIST JAR` -> `SHOW JAR`.
> > > >>
> > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > > >>
> > > >> *Regarding #6*: Yes. Most of the commands should belong to the table
> > > >> environment. In the Summary section, I use the <NOTE> tag to
> identify
> > > which
> > > >> commands should belong to the sql client and which commands should
> > > belong
> > > >> to the table environment. I also add a new section about
> > implementation
> > > >> details in the FLIP.
> > > >>
> > > >> Best,
> > > >> Shengkai
> > > >>
> > > >> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
> > > >>
> > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > Client
> > > a
> > > >>> very good update and make it production ready.
> > > >>>
> > > >>> Here is some feedback from my side:
> > > >>>
> > > >>> 1) SQL client specific options
> > > >>>
> > > >>> I don't think that `sql-client.planner` and
> > `sql-client.execution.mode`
> > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment`
> and
> > > >>> `ExecutionConfig#configure` that have been added recently, we
> should
> > > >>> offer a possibility for TableEnvironment. How about we offer
> > > >>> `TableEnvironment.create(ReadableConfig)` and add a `table.planner`
> > and
> > > >>> `table.execution-mode` to
> > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > >>>
> > > >>> 2) Execution file
> > > >>>
> > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > > mailing
> > > >>> list thread at that time? Could you further elaborate how the
> > > >>> multi-statement execution should work for a unified batch/streaming
> > > >>> story? According to our past discussions, each line in an execution
> > > file
> > > >>> should be executed blocking which means a streaming query needs a
> > > >>> statement set to execute multiple INSERT INTO statement, correct?
> We
> > > >>> should also offer this functionality in
> > > >>> `TableEnvironment.executeMultiSql()`. Whether
> `sql-client.job.detach`
> > > is
> > > >>> SQL Client specific needs to be determined, it could also be a
> > general
> > > >>> `table.multi-sql-async` option?
> > > >>>
> > > >>> 3) DELETE JAR
> > > >>>
> > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like
> one
> > > is
> > > >>> actively deleting the JAR in the corresponding path.
> > > >>>
> > > >>> 4) LIST JAR
> > > >>>
> > > >>> This should be `SHOW JARS` according to other SQL commands such as
> > > `SHOW
> > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > >>>
> > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > >>>
> > > >>> We should keep the details in sync with
> > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> about
> > > >>> differently named ExplainDetails. I would vote for `ESTIMATED_COST`
> > > >>> instead of `COST`. I'm sure the original author had a reason why to
> > > call
> > > >>> it that way.
> > > >>>
> > > >>> 6) Implementation details
> > > >>>
> > > >>> It would be nice to understand how we plan to implement the given
> > > >>> features. Most of the commands and config options should go into
> > > >>> TableEnvironment and SqlParser directly, correct? This way users
> > have a
> > > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > > similar
> > > >>> user experience in notebooks or interactive programs than the SQL
> > > Client.
> > > >>>
> > > >>> [1]
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > >>> [2]
> > > >>>
> > > >>>
> > > >>
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > >>>
> > > >>> Regards,
> > > >>> Timo
> > > >>>
> > > >>>
> > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> > `UNSET`.
> > > >>>>
> > > >>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> > > >>>>
> > > >>>>> Hi, Jingsong.
> > > >>>>>
> > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > >>>>>
> > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET` is
> > > >>>>> supported in the current sql client now. Our proposal just
> extends
> > > its
> > > >>>>> grammar and allow users to reset the specified keys.
> > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > > >>> value[1].
> > > >>>>> I think it is more friendly for batch users.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>> Shengkai
> > > >>>>>
> > > >>>>> [1]
> > > >>>
> https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > >>>>>
> > > >>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> > > >>>>>
> > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1 for
> > > >>>>>> improving it.
> > > >>>>>>
> > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Jingsong
> > > >>>>>>
> > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]>
> > > wrote:
> > > >>>>>>
> > > >>>>>>> Thanks Shengkai for the update! The proposed changes look good
> to
> > > >> me.
> > > >>>>>>>
> > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> [hidden email]
> > >
> > > >>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi, Rui.
> > > >>>>>>>> You are right. I have already modified the FLIP.
> > > >>>>>>>>
> > > >>>>>>>> The main changes:
> > > >>>>>>>>
> > > >>>>>>>> # -f parameter has no restriction about the statement type.
> > > >>>>>>>> Sometimes, users use the pipe to redirect the result of
> queries
> > to
> > > >>>>>>> debug
> > > >>>>>>>> when submitting job by -f parameter. It's much convenient
> > > comparing
> > > >>> to
> > > >>>>>>>> writing INSERT INTO statements.
> > > >>>>>>>>
> > > >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> Users
> > > >> can
> > > >>>>>>> set
> > > >>>>>>>> this option false and the client will process the next job
> until
> > > >> the
> > > >>>>>>>> current job finishes. The default value of this option is
> false,
> > > >>> which
> > > >>>>>>>> means the client will execute the next job when the current
> job
> > is
> > > >>>>>>>> submitted.
> > > >>>>>>>>
> > > >>>>>>>> Best,
> > > >>>>>>>> Shengkai
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> > > >>>>>>>>
> > > >>>>>>>>> Hi Shengkai,
> > > >>>>>>>>>
> > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > >> different
> > > >>>>>>>>> implications, and we should clarify the behavior. For
> example,
> > if
> > > >>> the
> > > >>>>>>>>> client just submits the job and exits, what happens if the
> file
> > > >>>>>>> contains
> > > >>>>>>>>> two INSERT statements? I don't think we should treat them as
> a
> > > >>>>>>> statement
> > > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT
> SET
> > in
> > > >>> that
> > > >>>>>>>>> case. And the client shouldn't asynchronously submit the two
> > > jobs,
> > > >>>>>>> because
> > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > >>>>>>>>>
> > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > [hidden email]
> > > >
> > > >>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi Rui,
> > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the SET
> > > >>>>>>> command. In
> > > >>>>>>>>>> the implementation, it will just put the key-value into the
> > > >>>>>>>>>> `Configuration`, which will be used to generate the table
> > > config.
> > > >>> If
> > > >>>>>>> hive
> > > >>>>>>>>>> supports to read the setting from the table config, users
> are
> > > >> able
> > > >>>>>>> to set
> > > >>>>>>>>>> the hive-related settings.
> > > >>>>>>>>>>
> > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job
> and
> > > >>> exit.
> > > >>>>>>> If
> > > >>>>>>>>>> the queries never end, users have to cancel the job by
> > > >> themselves,
> > > >>>>>>> which is
> > > >>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> > > queries
> > > >>>>>>> are used
> > > >>>>>>>>>> to analyze the data. Users should use queries in the
> > interactive
> > > >>>>>>> mode.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Shengkai
> > > >>>>>>>>>>
> > > >>>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think it
> > > >>> covers a
> > > >>>>>>>>>>> lot of useful features which will dramatically improve the
> > > >>>>>>> usability of our
> > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> configurations
> > > >> via
> > > >>>>>>> the
> > > >>>>>>>>>>> SET command? A connector may have its own configurations
> and
> > we
> > > >>>>>>> don't have
> > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > Client.
> > > >> For
> > > >>>>>>> example,
> > > >>>>>>>>>>> users may want to be able to change hive conf when using
> hive
> > > >>>>>>> connector [1].
> > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> > > >> specified
> > > >>>>>>> with
> > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but allows
> > > >>> queries
> > > >>>>>>> in the
> > > >>>>>>>>>>> file. And a common use case is to run some query and
> redirect
> > > >> the
> > > >>>>>>> results
> > > >>>>>>>>>>> to a file. So I think maybe flink users would like to do
> the
> > > >> same,
> > > >>>>>>>>>>> especially in batch scenarios.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
Reply | Threaded
Open this post in threaded view
|

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Ingo Bürk
Hi,

regarding the (un-)quoted question, compatibility is of course an important
argument, but in terms of consistency I'd find it a bit surprising that
WITH handles it differently than SET, and I wonder if that could cause
friction for developers when writing their SQL.


Regards
Ingo

On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <[hidden email]> wrote:

> Hi all,
>
> Regarding "One Parser", I think it's not possible for now because Calcite
> parser can't parse
> special characters (e.g. "-") unless quoting them as string literals.
> That's why the WITH option
> key are string literals not identifiers.
>
> SET table.exec.mini-batch.enabled = true and ADD JAR
> /local/my-home/test.jar
> have the same
> problems. That's why we propose two parser, one splits lines into multiple
> statements and match special
> command through regex which is light-weight, and delegate other statements
> to the other parser which is Calcite parser.
>
> Note: we should stick on the unquoted SET table.exec.mini-batch.enabled =
> true syntax,
> both for backward-compatibility and easy-to-use, and all the other systems
> don't have quotes on the key.
>
>
> Regarding "table.planner" vs "sql-client.planner",
> if we want to use "table.planner", I think we should explain clearly what's
> the scope it can be used in documentation.
> Otherwise, there will be users complaining why the planner doesn't change
> when setting the configuration on TableEnv.
> Would be better throwing an exception to indicate users it's now allowed to
> change planner after TableEnv is initialized.
> However, it seems not easy to implement.
>
> Best,
> Jark
>
> On Thu, 4 Feb 2021 at 15:49, godfrey he <[hidden email]> wrote:
>
> > Hi everyone,
> >
> > Regarding "table.planner" and "table.execution-mode"
> > If we define that those two options are just used to initialize the
> > TableEnvironment, +1 for introducing table options instead of sql-client
> > options.
> >
> > Regarding "the sql client, we will maintain two parsers", I want to give
> > more inputs:
> > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client and
> > the gateway service will communicate through Rest API. The " ADD JAR
> > /local/path/jar " will be executed in the CLI client machine. So when we
> > submit a sql file which contains multiple statements, the CLI client
> needs
> > to pick out the "ADD JAR" line, and also statements need to be submitted
> or
> > executed one by one to make sure the result is correct. The sql file may
> be
> > look like:
> >
> > SET xxx=yyy;
> > create table my_table ...;
> > create table my_sink ...;
> > ADD JAR /local/path/jar1;
> > create function my_udf as com....MyUdf;
> > insert into my_sink select ..., my_udf(xx) from ...;
> > REMOVE JAR /local/path/jar1;
> > drop function my_udf;
> > ADD JAR /local/path/jar2;
> > create function my_udf as com....MyUdf2;
> > insert into my_sink select ..., my_udf(xx) from ...;
> >
> > The lines need to be splitted into multiple statements first in the CLI
> > client, there are two approaches:
> > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > lines and tells which lines are "ADD JAR".
> > pro: there is only one parser
> > cons: It's a little heavy that the CLI client depends on the sql-parser,
> > because the CLI client is just a simple tool which receives the user
> > commands and displays the result. The non "ADD JAR" command will be
> parsed
> > twice.
> >
> > 2. The CLI client splits the lines into multiple statements and finds the
> > ADD JAR command through regex matching.
> > pro: The CLI client is very light-weight.
> > cons: there are two parsers.
> >
> > (personally, I prefer the second option)
> >
> > Regarding "SHOW or LIST JARS", I think we can support them both.
> > For default dialect, we support SHOW JARS, but if we switch to hive
> > dialect, LIST JARS is also supported.
> >
> >
> > [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > [2]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >
> > Best,
> > Godfrey
> >
> > Rui Li <[hidden email]> 于2021年2月4日周四 上午10:40写道:
> >
> > > Hi guys,
> > >
> > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > > commands than LIST JARS. I don't have a strong opinion about REMOVE vs
> > > DELETE though.
> > >
> > > While flink doesn't need to follow hive syntax, as far as I know, most
> > > users who are requesting these features were previously hive users. So I
> > > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
> JARS
> > > as synonyms? It's just like lots of systems accept both EXIT and QUIT
> as
> > > the command to terminate the program. So if that's not hard to achieve,
> > and
> > > will make users happier, I don't see a reason why we must choose one
> over
> > > the other.
> > >
> > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <[hidden email]>
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > some feedback regarding the open questions. Maybe we can discuss the
> > > > `TableEnvironment.executeMultiSql` story offline to determine how we
> > > > proceed with this in the near future.
> > > >
> > > > 1) "whether the table environment has the ability to update itself"
> > > >
> > > > Maybe there was some misunderstanding. I don't think that we should
> > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > > "old")`. Instead I'm proposing to support
> > > > `TableEnvironment.create(Configuration)` where planner and execution
> > > > mode are read immediately and any subsequent changes to these options
> > will
> > > > have no effect. We do it similarly in `new
> > > > StreamExecutionEnvironment(Configuration)`. These two ConfigOption's
> > > > must not be SQL Client specific but can be part of the core table
> code
> > > > base. Many users would like to get a 100% preconfigured environment
> > from
> > > > just Configuration. And this is not possible right now. We can solve
> > > > both use cases in one change.
> > > >
> > > > 2) "the sql client, we will maintain two parsers"
> > > >
> > > > I remember we had some discussion about this and decided that we
> would
> > > > like to maintain only one parser. In the end it is "One Flink SQL"
> > where
> > > > commands influence each other also with respect to keywords. It
> should
> > > > be fine to include the SQL Client commands in the Flink parser. Of
> > > > course the table environment would not be able to handle the
> > `Operation`
> > > > instance that would be the result but we can introduce hooks to
> handle
> > > > those `Operation`s. Or we introduce parser extensions.
> > > >
> > > > Can we skip `table.job.async` in the first version? We should further
> > > > discuss whether we introduce a special SQL clause for wrapping async
> > > > behavior or if we use a config option? Esp. for streaming queries we
> > > > need to be careful and should force users to either "one INSERT INTO"
> > or
> > > > "one STATEMENT SET".
> > > >
> > > > 3) 4) "HIVE also uses these commands"
> > > >
> > > > In general, Hive is not a good reference. Aligning the commands more
> > > > with the remaining commands should be our goal. We just had a MODULE
> > > > discussion where we selected SHOW instead of LIST. But it is true
> that
> > > > JARs are not part of the catalog which is why I would not use
> > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> language.
> > > > Take a look at the Java collection API as another example.
> > > >
> > > > 6) "Most of the commands should belong to the table environment"
> > > >
> > > > Thanks for updating the FLIP this makes things easier to understand.
> It
> > > > is good to see that most commands will be available in
> > TableEnvironment.
> > > > However, I would also support SET and RESET for consistency. Again,
> > from
> > > > an architectural point of view, if we would allow some kind of
> > > > `Operation` hook in table environment, we could check for SQL Client
> > > > specific options and forward to regular
> `TableConfig.getConfiguration`
> > > > otherwise. What do you think?
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > Hi Timo,
> > > > >
> > > > > I will respond some of the questions:
> > > > >
> > > > > 1) SQL client specific options
> > > > >
> > > > > Whether it starts with "table" or "sql-client" depends on where the
> > > > > configuration takes effect.
> > > > > If it is a table configuration, we should make clear what's the
> > > behavior
> > > > > when users change
> > > > > the configuration in the lifecycle of TableEnvironment.
> > > > >
> > > > > I agree with Shengkai `sql-client.planner` and
> > > > `sql-client.execution.mode`
> > > > > are something special
> > > > > that can't be changed after TableEnvironment has been initialized.
> > You
> > > > can
> > > > > see
> > > > > `StreamExecutionEnvironment` provides `configure()`  method to
> > override
> > > > > configuration after
> > > > > StreamExecutionEnvironment has been initialized.
> > > > >
> > > > > Therefore, I think it would be better to still use
> > > `sql-client.planner`
> > > > > and `sql-client.execution.mode`.
> > > > >
> > > > > 2) Execution file
> > > > >
> > > > > From my point of view, there is a big difference between
> > > > > `sql-client.job.detach` and
> > > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> > will
> > > > > affect every single DML statement
> > > > > in the terminal, not only the statements in SQL files. I think the
> > > single
> > > > > DML statement in the interactive
> > > > > terminal is something like tEnv#executeSql() instead of
> > > > > tEnv#executeMultiSql.
> > > > > So I don't like the "multi" and "sql" keyword in
> > > `table.multi-sql-async`.
> > > > > I just find that runtime provides a configuration called
> > > > > "execution.attached" [1] which is false by default
> > > > > which specifies if the pipeline is submitted in attached or
> detached
> > > > mode.
> > > > > It provides exactly the same
> > > > > functionality of `sql-client.job.detach`. What do you think about
> > using
> > > > > this option?
> > > > >
> > > > > If we also want to support this config in TableEnvironment, I think
> > it
> > > > > should also affect the DML execution
> > > > >   of `tEnv#executeSql()`, not only DMLs in
> `tEnv#executeMultiSql()`.
> > > > > Therefore, the behavior may look like this:
> > > > >
> > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async by
> > > > default
> > > > > tableResult.await()   ==> manually block until finish
> > > > > tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > > "true")
> > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> > don't
> > > > need
> > > > > to wait on the TableResult
> > > > > tEnv.executeMultiSql(
> > > > > """
> > > > > CREATE TABLE ....  ==> always sync
> > > > > INSERT INTO ...  => sync, because we set configuration above
> > > > > SET execution.attached = false;
> > > > > INSERT INTO ...  => async
> > > > > """)
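The attached/detached semantics sketched above can be modeled with plain Java futures. This is a toy illustration of the behavior only, not Flink's actual `TableResult`/`executeSql` implementation; the class name and the 50 ms "job" are assumptions made for the example:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class AttachedModeSketch {

    /** Toy model: "submit" a job and, in attached mode, block until it finishes. */
    static CompletableFuture<String> execute(String statement, boolean attached) {
        CompletableFuture<String> result = CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(50); // pretend the job runs for a while
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "FINISHED: " + statement;
        });
        if (attached) {
            result.join(); // attached: do not return until the job is done
        }
        return result; // detached: the caller may await the future later
    }

    public static void main(String[] args) {
        // detached: returns immediately, caller awaits manually (like TableResult#await)
        CompletableFuture<String> detached = execute("INSERT INTO sink1 ...", false);
        detached.join();

        // attached: execute() itself blocks, so the future is already complete
        CompletableFuture<String> attached = execute("INSERT INTO sink2 ...", true);
        System.out.println(attached.isDone()); // prints: true
    }
}
```

The point of the sketch is that "attached" is a property of the call site, which is why a single configuration flag such as `execution.attached` can toggle it for every DML statement.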
> > > > >
> > > > > On the other hand, I think `sql-client.job.detach`
> > > > > and `TableEnvironment.executeMultiSql()` should be two separate
> > topics,
> > > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > > I'm fine with making `executeMultiSql()` clear but don't want it to
> > > block
> > > > > this FLIP, maybe we can discuss this in another thread.
> > > > >
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > > [1]:
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > >
> > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]>
> > wrote:
> > > > >
> > > > >> Hi, Timo.
> > > > >> Thanks for your detailed feedback. I have some thoughts about your
> > > > >> feedback.
> > > > >>
> > > > >> *Regarding #1*: I think the main problem is whether the table
> > > > environment
> > > > >> has the ability to update itself. Let's take a simple program as
> an
> > > > >> example.
> > > > >>
> > > > >>
> > > > >> ```
> > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > >>
> > > > >> tEnv.getConfig.getConfiguration.setString("table.planner", "old");
> > > > >>
> > > > >>
> > > > >> tEnv.executeSql("...");
> > > > >>
> > > > >> ```
> > > > >>
> > > > >> If we regard this option as a table option, users don't have to
> > create
> > > > >> another table environment manually. In that case, tEnv needs to
> > check
> > > > >> whether the current mode and planner are the same as before when
> > > > executeSql
> > > > >> or explainSql. I don't think it's easy work for the table
> > environment,
> > > > >> especially if users have a StreamExecutionEnvironment but set old
> > > > planner
> > > > >> and batch mode. But when we make this option as a sql client
> option,
> > > > users
> > > > >> only use the SET command to change the setting. We can rebuild a
> new
> > > > table
> > > > >> environment when the SET succeeds.
> > > > >>
> > > > >>
> > > > >> *Regarding #2*: I think we need to discuss the implementation
> before
> > > > >> continuing this topic. In the sql client, we will maintain two
> > > parsers.
> > > > The
> > > > >> first parser(client parser) will only match the sql client
> commands.
> > > If
> > > > the
> > > > >> client parser can't parse the statement, we will leverage the
> power
> > of
> > > > the
> > > > >> table environment to execute. According to our blueprint,
> > > > >> TableEnvironment#executeSql is enough for the sql client.
> Therefore,
> > > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > > >>
> > > > >> But if we need to introduce the `TableEnvironment.executeMultiSql`
> > in
> > > > the
> > > > >> future, I think it's OK to use the option `table.multi-sql-async`
> > > rather
> > > > >> than option `sql-client.job.detach`. But we think the name is not
> > > > suitable
> > > > >> because the name is confusing for others. When setting the option
> > > > false, we
> > > > >> just mean it will block the execution of the INSERT INTO
> statement,
> > > not
> > > > DDL
> > > > >> or others(other sql statements are always executed synchronously).
> > So
> > > > how
> > > > >> about `table.job.async`? It only works for the sql-client and the
> > > > >> executeMultiSql. If we set this value false, the table environment
> > > will
> > > > >> not return the result until the job finishes.
> > > > >>
> > > > >>
> > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and
> LIST
> > > JAR
> > > > >> because HIVE also uses these commands to add the jar into the
> > > classpath
> > > > or
> > > > >> delete the jar. If we use  such commands, it can reduce our work
> for
> > > > hive
> > > > >> compatibility.
> > > > >>
> > > > >> For SHOW JAR, I think the main concern is the jars are not
> > maintained
> > > by
> > > > >> the Catalog. If we really need to keep consistent with SQL
> grammar,
> > > > maybe
> > > > >> we should use
> > > > >>
> > > > >> `ADD JAR` -> `CREATE JAR`,
> > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > >> `LIST JAR` -> `SHOW JAR`.
> > > > >>
> > > > >> *Regarding #5*: I agree with you that we'd better keep consistent.
> > > > >>
> > > > >> *Regarding #6*: Yes. Most of the commands should belong to the
> table
> > > > >> environment. In the Summary section, I use the <NOTE> tag to
> > identify
> > > > which
> > > > >> commands should belong to the sql client and which commands should
> > > > belong
> > > > >> to the table environment. I also add a new section about
> > > implementation
> > > > >> details in the FLIP.
> > > > >>
> > > > >> Best,
> > > > >> Shengkai
> > > > >>
> > > > >> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
> > > > >>
> > > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > > Client
> > > > a
> > > > >>> very good update and make it production ready.
> > > > >>>
> > > > >>> Here is some feedback from my side:
> > > > >>>
> > > > >>> 1) SQL client specific options
> > > > >>>
> > > > >>> I don't think that `sql-client.planner` and
> > > `sql-client.execution.mode`
> > > > >>> are SQL Client specific. Similar to `StreamExecutionEnvironment`
> > and
> > > > >>> `ExecutionConfig#configure` that have been added recently, we
> > should
> > > > >>> offer a possibility for TableEnvironment. How about we offer
> > > > >>> `TableEnvironment.create(ReadableConfig)` and add a
> `table.planner`
> > > and
> > > > >>> `table.execution-mode` to
> > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > >>>
> > > > >>> 2) Execution file
> > > > >>>
> > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including the
> > > > mailing
> > > > >>> list thread at that time? Could you further elaborate how the
> > > > >>> multi-statement execution should work for a unified
> batch/streaming
> > > > >>> story? According to our past discussions, each line in an
> execution
> > > > file
> > > > >>> should be executed blocking which means a streaming query needs a
> > > > >>> statement set to execute multiple INSERT INTO statement, correct?
> > We
> > > > >>> should also offer this functionality in
> > > > >>> `TableEnvironment.executeMultiSql()`. Whether
> > `sql-client.job.detach`
> > > > is
> > > > >>> SQL Client specific needs to be determined, it could also be a
> > > general
> > > > >>> `table.multi-sql-async` option?
> > > > >>>
> > > > >>> 3) DELETE JAR
> > > > >>>
> > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds like
> > one
> > > > is
> > > > >>> actively deleting the JAR in the corresponding path.
> > > > >>>
> > > > >>> 4) LIST JAR
> > > > >>>
> > > > >>> This should be `SHOW JARS` according to other SQL commands such
> as
> > > > `SHOW
> > > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > >>>
> > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > >>>
> > > > >>> We should keep the details in sync with
> > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> > about
> > > > >>> differently named ExplainDetails. I would vote for
> `ESTIMATED_COST`
> > > > >>> instead of `COST`. I'm sure the original author had a reason why
> to
> > > > call
> > > > >>> it that way.
> > > > >>>
> > > > >>> 6) Implementation details
> > > > >>>
> > > > >>> It would be nice to understand how we plan to implement the given
> > > > >>> features. Most of the commands and config options should go into
> > > > >>> TableEnvironment and SqlParser directly, correct? This way users
> > > have a
> > > > >>> unified way of using Flink SQL. TableEnvironment would provide a
> > > > similar
> > > > >>> user experience in notebooks or interactive programs than the SQL
> > > > Client.
> > > > >>>
> > > > >>> [1]
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > >>> [2]
> > > > >>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > >>>
> > > > >>> Regards,
> > > > >>> Timo
> > > > >>>
> > > > >>>
> > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> > > `UNSET`.
> > > > >>>>
> > > > >>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> > > > >>>>
> > > > >>>>> Hi, Jingsong.
> > > > >>>>>
> > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > >>>>>
> > > > >>>>> 1. We don't need to introduce another command `UNSET`. `RESET`
> is
> > > > >>>>> supported in the current sql client now. Our proposal just
> > extends
> > > > its
> > > > >>>>> grammar and allows users to reset the specified keys.
> > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the default
> > > > >>> value[1].
> > > > >>>>> I think it is more friendly for batch users.
> > > > >>>>>
> > > > >>>>> Best,
> > > > >>>>> Shengkai
> > > > >>>>>
> > > > >>>>> [1]
> > > > >>>
> > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > >>>>>
> > > > >>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> > > > >>>>>
> > > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1
> for
> > > > >>>>>> improving it.
> > > > >>>>>>
> > > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Jingsong
> > > > >>>>>>
> > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <[hidden email]>
> > > > wrote:
> > > > >>>>>>
> > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look
> good
> > to
> > > > >> me.
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> > [hidden email]
> > > >
> > > > >>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Hi, Rui.
> > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > >>>>>>>>
> > > > >>>>>>>> The main changes:
> > > > >>>>>>>>
> > > > >>>>>>>> # -f parameter has no restriction about the statement type.
> > > > >>>>>>>> Sometimes, users use the pipe to redirect the result of
> > queries
> > > to
> > > > >>>>>>> debug
> > > > >>>>>>>> when submitting jobs by the -f parameter. It's much more
> > > > convenient
> > > > >>> compared to
> > > > >>>>>>>> writing INSERT INTO statements.
> > > > >>>>>>>>
> > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach` .
> > > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> > > > >>>>>>>> Users can set this option to false so that the client waits
> > > > >>>>>>>> until the current job finishes before processing the next one.
> > > > >>>>>>>> The default value of this option is true, which means the
> > > > >>>>>>>> client moves on to the next job as soon as the current one is
> > > > >>>>>>>> submitted.
> > > > >>>>>>>>
> > > > >>>>>>>> Best,
> > > > >>>>>>>> Shengkai
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>
> > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > > >> different
> > > > >>>>>>>>> implications, and we should clarify the behavior. For
> > example,
> > > if
> > > > >>> the
> > > > >>>>>>>>> client just submits the job and exits, what happens if the
> > file
> > > > >>>>>>> contains
> > > > >>>>>>>>> two INSERT statements? I don't think we should treat them
> as
> > a
> > > > >>>>>>> statement
> > > > >>>>>>>>> set, because users should explicitly write BEGIN STATEMENT
> > SET
> > > in
> > > > >>> that
> > > > >>>>>>>>> case. And the client shouldn't asynchronously submit the
> two
> > > > jobs,
> > > > >>>>>>> because
> > > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > > [hidden email]
> > > > >
> > > > >>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> Hi Rui,
> > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For the suggestion 1: Yes. we are plan to strengthen the
> set
> > > > >>>>>>> command. In
> > > > >>>>>>>>>> the implementation, it will just put the key-value into
> the
> > > > >>>>>>>>>> `Configuration`, which will be used to generate the table
> > > > config.
> > > > >>> If
> > > > >>>>>>> hive
> > > > >>>>>>>>>> supports to read the setting from the table config, users
> > are
> > > > >> able
> > > > >>>>>>> to set
> > > > >>>>>>>>>> the hive-related settings.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the job
> > and
> > > > >>> exit.
> > > > >>>>>>> If
> > > > >>>>>>>>>> the queries never end, users have to cancel the job by
> > > > >> themselves,
> > > > >>>>>>> which is
> > > > >>>>>>>>>> not reliable (people may forget their jobs). In most cases,
> > > > queries
> > > > >>>>>>> are used
> > > > >>>>>>>>>> to analyze the data. Users should use queries in the
> > > interactive
> > > > >>>>>>> mode.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Shengkai
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I think
> it
> > > > >>> covers a
> > > > >>>>>>>>>>> lot of useful features which will dramatically improve
> the
> > > > >>>>>>> usability of our
> > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> > configurations
> > > > >> via
> > > > >>>>>>> the
> > > > >>>>>>>>>>> SET command? A connector may have its own configurations
> > and
> > > we
> > > > >>>>>>> don't have
> > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > > Client.
> > > > >> For
> > > > >>>>>>> example,
> > > > >>>>>>>>>>> users may want to be able to change hive conf when using
> > hive
> > > > >>>>>>> connector [1].
> > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL files
> > > > >> specified
> > > > >>>>>>> with
> > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but
> allows
> > > > >>> queries
> > > > >>>>>>> in the
> > > > >>>>>>>>>>> file. And a common use case is to run some query and
> > redirect
> > > > >> the
> > > > >>>>>>> results
> > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to do
> > the
> > > > >> same,
> > > > >>>>>>>>>>> especially in batch scenarios.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > > > >>>>>>> [hidden email]>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> Hi Shengkai,
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Glad to see this improvement. And I have some additional
> > > > >>>>>>> suggestions:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch sql.
> > > > >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client
> > collect
> > > > >> the
> > > > >>>>>>>>>>>> results
> > > > >>>>>>>>>>>> locally all at once using accumulators at present,
> > > > >>>>>>>>>>>>         which may have memory issues in JM or Local for
> > the
> > > > big
> > > > >>> query
> > > > >>>>>>>>>>>> result.
> > > > >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> > > > >>>>>>>>>>>>         We may change to use SelectTableSink, which is
> > based
> > > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > > > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is in
> > > > >> FLIP-91.
> > > > >>>>>>> Seems
> > > > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> > > > >>>>>>>>>>>>         Provide a long running service out of the box to
> > > > >>> facilitate
> > > > >>>>>>> the
> > > > >>>>>>>>>>>> sql
> > > > >>>>>>>>>>>> submission is necessary.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> What do you think of these?
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Shengkai Fang <[hidden email]> 于2021年1月28日周四
> 下午8:54写道:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Hi devs,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Jark and I want to start a discussion about
> FLIP-163:SQL
> > > > >> Client
> > > > >>>>>>>>>>>>> Improvements.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Many users have complained about the problems of the
> sql
> > > > >> client.
> > > > >>>>>>> For
> > > > >>>>>>>>>>>>> example, users can not register the table proposed by
> > > > FLIP-95.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> - use -i parameter to specify the sql file to
> initialize
> > > the
> > > > >>>>>>> table
> > > > >>>>>>>>>>>>> environment and deprecated YAML file;
> > > > >>>>>>>>>>>>> - add -f to submit sql file and deprecated '-u'
> > parameter;
> > > > >>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
> > > > >>>>>>>>>>>>> - support statement set syntax;
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> For more detailed changes, please refer to FLIP-163[1].
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Best,
> > > > >>>>>>>>>>>>> Shengkai
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> [1]
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> --
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> *With kind regards
> > > > >>>>>>>>>>>>
> > ------------------------------------------------------------
> > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > > Science
> > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > >>>>>>>>>>>> E-mail: [hidden email] <[hidden email]>
> > > > >>>>>>>>>>>> QQ: 3239559*
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> --
> > > > >>>>>>>>>>> Best regards!
> > > > >>>>>>>>>>> Rui Li
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> --
> > > > >>>>>>>>> Best regards!
> > > > >>>>>>>>> Rui Li
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> --
> > > > >>>>>>> Best regards!
> > > > >>>>>>> Rui Li
> > > > >>>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> --
> > > > >>>>>> Best, Jingsong Lee
> > > > >>>>>>
> > > > >>>>>
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > > --
> > > Best regards!
> > > Rui Li
> > >
> >
>

Re: [DISCUSS]FLIP-163: SQL Client Improvements

Jark Wu-2
Hi Ingo,

Since we have supported the WITH syntax and SET command since v1.9 [1][2],
and
we have never received such complaints, I think it's fine for such
differences.

Besides, the TBLPROPERTIES clause of CREATE TABLE in Hive also requires
string literal keys[3],
and the SET <key>=<value> doesn't allow quoted keys [4].

Best,
Jark

[1]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/connect.html
[2]:
https://ci.apache.org/projects/flink/flink-docs-release-1.9/dev/table/sqlClient.html#running-sql-queries
[3]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
[4]: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
(search "set mapred.reduce.tasks=32")
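To make the unquoted-key point concrete, here is a minimal Java sketch of how a `SET <key>=<value>` line with dots and dashes in the key can be recognized with a regex. The class name and pattern are illustrative assumptions for this thread, not the SQL Client's actual grammar:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SetCommandSketch {
    // Accepts unquoted keys containing dots and dashes, e.g.
    // SET table.exec.mini-batch.enabled = true;
    // Calcite could not treat such a key as a single identifier, a regex can.
    private static final Pattern SET_CMD = Pattern.compile(
            "(?i)^\\s*SET\\s+([A-Za-z][A-Za-z0-9._-]*)\\s*=\\s*(\\S+?)\\s*;?\\s*$");

    /** Returns {key, value} if the line is a SET command, otherwise null. */
    static String[] parseSet(String line) {
        Matcher m = SET_CMD.matcher(line);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String[] kv = parseSet("SET table.exec.mini-batch.enabled = true;");
        System.out.println(kv[0] + " -> " + kv[1]);
        // prints: table.exec.mini-batch.enabled -> true
    }
}
```

A non-SET statement simply fails to match and would be handed over to the full SQL parser, which is the division of labor discussed above.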

On Thu, 4 Feb 2021 at 17:09, Ingo Bürk <[hidden email]> wrote:

> Hi,
>
> regarding the (un-)quoted question, compatibility is of course an important
> argument, but in terms of consistency I'd find it a bit surprising that
> WITH handles it differently than SET, and I wonder if that could cause
> friction for developers when writing their SQL.
>
>
> Regards
> Ingo
>
> On Thu, Feb 4, 2021 at 9:38 AM Jark Wu <[hidden email]> wrote:
>
> > Hi all,
> >
> > Regarding "One Parser", I think it's not possible for now because Calcite
> > parser can't parse
> > special characters (e.g. "-") unless quoting them as string literals.
> > That's why the WITH option
> > keys are string literals, not identifiers.
> >
> > SET table.exec.mini-batch.enabled = true and ADD JAR
> > /local/my-home/test.jar
> > have the same
> > problems. That's why we propose two parsers: one splits lines into
> multiple
> > statements and matches special
> > commands through regex, which is light-weight, and delegates the other
> statements
> > to the Calcite parser.
> >
> > Note: we should stick to the unquoted SET table.exec.mini-batch.enabled =
> > true syntax,
> > both for backward-compatibility and easy-to-use, and all the other
> systems
> > don't have quotes on the key.
> >
> >
> > Regarding "table.planner" vs "sql-client.planner",
> > if we want to use "table.planner", I think we should explain clearly in
> > the documentation what scope it can be used in.
> > Otherwise, there will be users complaining why the planner doesn't change
> > when setting the configuration on TableEnv.
> > Would be better throwing an exception to indicate users it's not allowed
> to
> > change planner after TableEnv is initialized.
> > However, it seems not easy to implement.
> >
> > Best,
> > Jark
> >
> > On Thu, 4 Feb 2021 at 15:49, godfrey he <[hidden email]> wrote:
> >
> > > Hi everyone,
> > >
> > > Regarding "table.planner" and "table.execution-mode"
> > > If we define that those two options are just used to initialize the
> > > TableEnvironment, +1 for introducing table options instead of
> sql-client
> > > options.
> > >
> > > Regarding "the sql client, we will maintain two parsers", I want to
> give
> > > more inputs:
> > > We want to introduce sql-gateway into the Flink project (see FLIP-24 &
> > > FLIP-91 for more info [1] [2]). In the "gateway" mode, the CLI client
> and
> > > the gateway service will communicate through Rest API. The " ADD JAR
> > > /local/path/jar " will be executed in the CLI client machine. So when
> we
> > > submit a sql file which contains multiple statements, the CLI client
> > needs
> > > to pick out the "ADD JAR" line, and also statements need to be
> submitted
> > or
> > > executed one by one to make sure the result is correct. The sql file
> may
> > > look like:
> > >
> > > SET xxx=yyy;
> > > create table my_table ...;
> > > create table my_sink ...;
> > > ADD JAR /local/path/jar1;
> > > create function my_udf as com....MyUdf;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > > REMOVE JAR /local/path/jar1;
> > > drop function my_udf;
> > > ADD JAR /local/path/jar2;
> > > create function my_udf as com....MyUdf2;
> > > insert into my_sink select ..., my_udf(xx) from ...;
> > >
> > > The lines need to be split into multiple statements first in the CLI
> > > client, there are two approaches:
> > > 1. The CLI client depends on the sql-parser: the sql-parser splits the
> > > lines and tells which lines are "ADD JAR".
> > > pro: there is only one parser
> > > cons: It's a little heavy that the CLI client depends on the
> sql-parser,
> > > because the CLI client is just a simple tool which receives the user
> > > commands and displays the result. The non "ADD JAR" command will be
> > parsed
> > > twice.
> > >
> > > 2. The CLI client splits the lines into multiple statements and finds
> the
> > > ADD JAR command through regex matching.
> > > pro: The CLI client is very light-weight.
> > > cons: there are two parsers.
> > >
> > > (personally, I prefer the second option)
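The second option can be sketched in a few lines of Java. This is a deliberately simplified illustration of the light-weight client-side approach (it ignores string literals, comments, and quoted identifiers, which a real splitter must handle); the class and pattern names are assumptions, not the actual CLI implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClientSideSplitter {
    // Matches statements like: ADD JAR /local/path/jar1
    private static final Pattern ADD_JAR =
            Pattern.compile("(?i)^ADD\\s+JAR\\s+(\\S+)$");

    /** Naive split on ';' -- a real splitter must respect literals and comments. */
    static List<String> splitStatements(String script) {
        List<String> stmts = new ArrayList<>();
        for (String s : script.split(";")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                stmts.add(trimmed);
            }
        }
        return stmts;
    }

    /** True if the statement must be handled locally by the CLI client. */
    static boolean isAddJar(String stmt) {
        return ADD_JAR.matcher(stmt.trim()).matches();
    }

    public static void main(String[] args) {
        String script = "ADD JAR /local/path/jar1;\ncreate table t (a INT);\n";
        for (String stmt : splitStatements(script)) {
            System.out.println((isAddJar(stmt) ? "client" : "gateway") + ": " + stmt);
        }
        // prints:
        // client: ADD JAR /local/path/jar1
        // gateway: create table t (a INT)
    }
}
```

Everything that does not match the regex is forwarded to the gateway (and thus the single SQL parser) unchanged, which keeps the client free of any parser dependency.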
> > >
> > > Regarding "SHOW or LIST JARS", I think we can support them both.
> > > For default dialect, we support SHOW JARS, but if we switch to hive
> > > dialect, LIST JARS is also supported.
> > >
> > >
> > > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client
> > > [2]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > >
> > > Best,
> > > Godfrey
> > >
> > > Rui Li <[hidden email]> 于2021年2月4日周四 上午10:40写道:
> > >
> > > > Hi guys,
> > > >
> > > > Regarding #3 and #4, I agree SHOW JARS is more consistent with other
> > > > commands than LIST JARS. I don't have a strong opinion about REMOVE
> vs
> > > > DELETE though.
> > > >
> > > > While flink doesn't need to follow hive syntax, as far as I know,
> most
> > > > users who are requesting these features are previously hive users.
> So I
> > > > wonder whether we can support both LIST/SHOW JARS and REMOVE/DELETE
> > JARS
> > > > as synonyms? It's just like lots of systems accept both EXIT and QUIT
> > as
> > > > the command to terminate the program. So if that's not hard to
> achieve,
> > > and
> > > > will make users happier, I don't see a reason why we must choose one
> > over
> > > > the other.
> > > >
> > > > On Wed, Feb 3, 2021 at 10:33 PM Timo Walther <[hidden email]>
> > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > some feedback regarding the open questions. Maybe we can discuss
> the
> > > > > `TableEnvironment.executeMultiSql` story offline to determine how
> we
> > > > > proceed with this in the near future.
> > > > >
> > > > > 1) "whether the table environment has the ability to update itself"
> > > > >
> > > > > Maybe there was some misunderstanding. I don't think that we should
> > > > > support `tEnv.getConfig.getConfiguration.setString("table.planner",
> > > > > "old")`. Instead I'm proposing to support
> > > > > `TableEnvironment.create(Configuration)` where planner and
> execution
> > > > > mode are read immediately and subsequent changes to these options
> > > will
> > > > > have no effect. We do it similarly in `new
> > > > > StreamExecutionEnvironment(Configuration)`. These two
> ConfigOption's
> > > > > must not be SQL Client specific but can be part of the core table
> > code
> > > > > base. Many users would like to get a 100% preconfigured environment
> > > from
> > > > > just Configuration. And this is not possible right now. We can
> solve
> > > > > both use cases in one change.
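The "read immediately at creation, later changes have no effect" behavior described above can be sketched in a few lines of plain Java. All names here (MiniTableEnvironment, the "table.planner" key, the default value) are hypothetical stand-ins, not Flink's actual API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the planner option is snapshotted when the
// environment is created; later setString() calls cannot change it.
public class MiniTableEnvironment {

    private final String planner; // read once at creation
    private final Map<String, String> config = new HashMap<>();

    private MiniTableEnvironment(Map<String, String> initial) {
        this.planner = initial.getOrDefault("table.planner", "blink");
        this.config.putAll(initial);
    }

    public static MiniTableEnvironment create(Map<String, String> initial) {
        return new MiniTableEnvironment(initial);
    }

    // Later changes land in the mutable config but do not touch the planner.
    public void setString(String key, String value) {
        config.put(key, value);
    }

    public String getPlanner() {
        return planner;
    }
}
```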
> > > > >
> > > > > 2) "the sql client, we will maintain two parsers"
> > > > >
> > > > > I remember we had some discussion about this and decided that we
> > would
> > > > > like to maintain only one parser. In the end it is "One Flink SQL"
> > > where
> > > > > commands influence each other also with respect to keywords. It
> > should
> > > > > be fine to include the SQL Client commands in the Flink parser. Of
> > > > > course the table environment would not be able to handle the
> > > `Operation`
> > > > > instance that would be the result but we can introduce hooks to
> > handle
> > > > > those `Operation`s. Or we introduce parser extensions.
> > > > >
> > > > > Can we skip `table.job.async` in the first version? We should
> further
> > > > > discuss whether we introduce a special SQL clause for wrapping
> async
> > > > > behavior or use a config option. Esp. for streaming queries
> we
> > > > > need to be careful and should force users to either "one INSERT
> INTO"
> > > or
> > > > > "one STATEMENT SET".
> > > > >
> > > > > 3) 4) "HIVE also uses these commands"
> > > > >
> > > > > In general, Hive is not a good reference. Aligning the commands
> more
> > > > > with the remaining commands should be our goal. We just had a
> MODULE
> > > > > discussion where we selected SHOW instead of LIST. But it is true
> > that
> > > > > JARs are not part of the catalog which is why I would not use
> > > > > CREATE/DROP. ADD/REMOVE are commonly siblings in the English
> > language.
> > > > > Take a look at the Java collection API as another example.
> > > > >
> > > > > 6) "Most of the commands should belong to the table environment"
> > > > >
> > > > > Thanks for updating the FLIP, this makes things easier to
> understand.
> > It
> > > > > is good to see that most commands will be available in
> > > TableEnvironment.
> > > > > However, I would also support SET and RESET for consistency. Again,
> > > from
> > > > > an architectural point of view, if we would allow some kind of
> > > > > `Operation` hook in table environment, we could check for SQL
> Client
> > > > > specific options and forward to regular
> > `TableConfig.getConfiguration`
> > > > > otherwise. What do you think?
> > > > >
> > > > > Regards,
> > > > > Timo
> > > > >
> > > > >
> > > > > On 03.02.21 08:58, Jark Wu wrote:
> > > > > > Hi Timo,
> > > > > >
> > > > > > I will respond some of the questions:
> > > > > >
> > > > > > 1) SQL client specific options
> > > > > >
> > > > > > Whether it starts with "table" or "sql-client" depends on where
> the
> > > > > > configuration takes effect.
> > > > > > If it is a table configuration, we should make clear what's the
> > > > behavior
> > > > > > when users change
> > > > > > the configuration in the lifecycle of TableEnvironment.
> > > > > >
> > > > > > I agree with Shengkai `sql-client.planner` and
> > > > > `sql-client.execution.mode`
> > > > > > are something special
> > > > > > that can't be changed after TableEnvironment has been
> initialized.
> > > You
> > > > > can
> > > > > > see
> > > > > > `StreamExecutionEnvironment` provides `configure()`  method to
> > > override
> > > > > > configuration after
> > > > > > StreamExecutionEnvironment has been initialized.
> > > > > >
> > > > > > Therefore, I think it would be better to still use
> > > > `sql-client.planner`
> > > > > > and `sql-client.execution.mode`.
> > > > > >
> > > > > > 2) Execution file
> > > > > >
> > > > > > From my point of view, there is a big difference between
> > > > > > `sql-client.job.detach` and
> > > > > > `TableEnvironment.executeMultiSql()` that `sql-client.job.detach`
> > > will
> > > > > > affect every single DML statement
> > > > > > in the terminal, not only the statements in SQL files. I think
> the
> > > > single
> > > > > > DML statement in the interactive
> > > > > > terminal is something like tEnv#executeSql() instead of
> > > > > > tEnv#executeMultiSql.
> > > > > > So I don't like the "multi" and "sql" keyword in
> > > > `table.multi-sql-async`.
> > > > > > I just find that runtime provides a configuration called
> > > > > > "execution.attached" [1] which is false by default
> > > > > > which specifies if the pipeline is submitted in attached or
> > detached
> > > > > mode.
> > > > > > It provides exactly the same
> > > > > > functionality of `sql-client.job.detach`. What do you think about
> > > using
> > > > > > this option?
> > > > > >
> > > > > > If we also want to support this config in TableEnvironment, I
> think
> > > it
> > > > > > should also affect the DML execution
> > > > > >   of `tEnv#executeSql()`, not only DMLs in
> > `tEnv#executeMultiSql()`.
> > > > > > Therefore, the behavior may look like this:
> > > > > >
> > > > > > val tableResult = tEnv.executeSql("INSERT INTO ...")  ==> async
> by
> > > > > default
> > > > > > tableResult.await()   ==> manually block until finish
> > > > > >
> tEnv.getConfig().getConfiguration().setString("execution.attached",
> > > > > "true")
> > > > > > val tableResult2 = tEnv.executeSql("INSERT INTO ...")  ==> sync,
> > > don't
> > > > > need
> > > > > > to wait on the TableResult
> > > > > > tEnv.executeMultiSql(
> > > > > > """
> > > > > > CREATE TABLE ....  ==> always sync
> > > > > > INSERT INTO ...  => sync, because we set configuration above
> > > > > > SET execution.attached = false;
> > > > > > INSERT INTO ...  => async
> > > > > > """)
> > > > > >
> > > > > > On the other hand, I think `sql-client.job.detach`
> > > > > > and `TableEnvironment.executeMultiSql()` should be two separate
> > > topics,
> > > > > > as Shengkai mentioned above, SQL CLI only depends on
> > > > > > `TableEnvironment#executeSql()` to support multi-line statements.
> > > > > > I'm fine with making `executeMultiSql()` clear but don't want it
> to
> > > > block
> > > > > > this FLIP, maybe we can discuss this in another thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > > [1]:
> > > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/deployment/config.html#execution-attached
> > > > > >
> > > > > > On Wed, 3 Feb 2021 at 15:33, Shengkai Fang <[hidden email]>
> > > wrote:
> > > > > >
> > > > > >> Hi, Timo.
> > > > > >> Thanks for your detailed feedback. I have some thoughts about
> your
> > > > > >> feedback.
> > > > > >>
> > > > > >> *Regarding #1*: I think the main problem is whether the table
> > > > > environment
> > > > > >> has the ability to update itself. Let's take a simple program as
> > an
> > > > > >> example.
> > > > > >>
> > > > > >>
> > > > > >> ```
> > > > > >> TableEnvironment tEnv = TableEnvironment.create(...);
> > > > > >>
> > > > > >> tEnv.getConfig.getConfiguration.setString("table.planner",
> "old");
> > > > > >>
> > > > > >>
> > > > > >> tEnv.executeSql("...");
> > > > > >>
> > > > > >> ```
> > > > > >>
> > > > > >> If we regard this option as a table option, users don't have to
> > > create
> > > > > >> another table environment manually. In that case, tEnv needs to
> > > check
> > > > > >> whether the current mode and planner are the same as before when
> > > > > executeSql
> > > > > >> or explainSql. I don't think it's easy work for the table
> > > environment,
> > > > > >> especially if users have a StreamExecutionEnvironment but set
> old
> > > > > planner
> > > > > >> and batch mode. But when we make this option a sql client
> > option,
> > > > > users
> > > > > >> only use the SET command to change the setting. We can rebuild a
> > new
> > > > > table
> > > > > >> environment when the SET succeeds.
> > > > > >>
> > > > > >>
> > > > > >> *Regarding #2*: I think we need to discuss the implementation
> > before
> > > > > >> continuing this topic. In the sql client, we will maintain two
> > > > parsers.
> > > > > The
> > > > > >> first parser(client parser) will only match the sql client
> > commands.
> > > > If
> > > > > the
> > > > > >> client parser can't parse the statement, we will leverage the
> > power
> > > of
> > > > > the
> > > > > >> table environment to execute it. According to our blueprint,
> > > > > >> TableEnvironment#executeSql is enough for the sql client.
> > Therefore,
> > > > > >> TableEnvironment#executeMultiSql is out-of-scope for this FLIP.
> > > > > >>
> > > > > >> But if we need to introduce the
> `TableEnvironment.executeMultiSql`
> > > in
> > > > > the
> > > > > >> future, I think it's OK to use the option
> `table.multi-sql-async`
> > > > rather
> > > > > >> than option `sql-client.job.detach`. But we think the name is
> not
> > > > > suitable
> > > > > >> because the name is confusing for others. When setting the
> option
> > > > > false, we
> > > > > >> just mean it will block the execution of the INSERT INTO
> > statement,
> > > > not
> > > > > DDL
> > > > > >> or others(other sql statements are always executed
> synchronously).
> > > So
> > > > > how
> > > > > >> about `table.job.async`? It only works for the sql-client and
> the
> > > > > >> executeMultiSql. If we set this value false, the table
> environment
> > > > will
> > > > > >> return the result until the job finishes.
> > > > > >>
> > > > > >>
> > > > > >> *Regarding #3, #4*: I still think we should use DELETE JAR and
> > LIST
> > > > JAR
> > > > > >> because HIVE also uses these commands to add the jar into the
> > > > classpath
> > > > > or
> > > > > >> delete the jar. If we use such commands, it can reduce our work
> > for
> > > > > hive
> > > > > >> compatibility.
> > > > > >>
> > > > > >> For SHOW JAR, I think the main concern is the jars are not
> > > maintained
> > > > by
> > > > > >> the Catalog. If we really need to keep consistent with SQL
> > grammar,
> > > > > maybe
> > > > > >> we should use
> > > > > >>
> > > > > >> `ADD JAR` -> `CREATE JAR`,
> > > > > >> `DELETE JAR` -> `DROP JAR`,
> > > > > >> `LIST JAR` -> `SHOW JAR`.
> > > > > >>
> > > > > >> *Regarding #5*: I agree with you that we'd better keep
> consistent.
> > > > > >>
> > > > > >> *Regarding #6*: Yes. Most of the commands should belong to the
> > table
> > > > > >> environment. In the Summary section, I use the <NOTE> tag to
> > > identify
> > > > > which
> > > > > >> commands should belong to the sql client and which commands
> should
> > > > > belong
> > > > > >> to the table environment. I also add a new section about
> > > > implementation
> > > > > >> details in the FLIP.
> > > > > >>
> > > > > >> Best,
> > > > > >> Shengkai
> > > > > >>
> > > > > >> Timo Walther <[hidden email]> 于2021年2月2日周二 下午6:43写道:
> > > > > >>
> > > > > >>> Thanks for this great proposal Shengkai. This will give the SQL
> > > > Client
> > > > > a
> > > > > >>> very good update and make it production ready.
> > > > > >>>
> > > > > >>> Here is some feedback from my side:
> > > > > >>>
> > > > > >>> 1) SQL client specific options
> > > > > >>>
> > > > > >>> I don't think that `sql-client.planner` and
> > > > `sql-client.execution.mode`
> > > > > >>> are SQL Client specific. Similar to
> `StreamExecutionEnvironment`
> > > and
> > > > > >>> `ExecutionConfig#configure` that have been added recently, we
> > > should
> > > > > >>> offer a possibility for TableEnvironment. How about we offer
> > > > > >>> `TableEnvironment.create(ReadableConfig)` and add a
> > `table.planner`
> > > > and
> > > > > >>> `table.execution-mode` to
> > > > > >>> `org.apache.flink.table.api.config.TableConfigOptions`?
> > > > > >>>
> > > > > >>> 2) Execution file
> > > > > >>>
> > > > > >>> Did you have a look at the Appendix of FLIP-84 [1] including
> the
> > > > > mailing
> > > > > >>> list thread at that time? Could you further elaborate how the
> > > > > >>> multi-statement execution should work for a unified
> > batch/streaming
> > > > > >>> story? According to our past discussions, each line in an
> > execution
> > > > > file
> > > > > >>> should be executed blocking which means a streaming query
> needs a
> > > > > >>> statement set to execute multiple INSERT INTO statements,
> correct?
> > > We
> > > > > >>> should also offer this functionality in
> > > > > >>> `TableEnvironment.executeMultiSql()`. Whether
> > > `sql-client.job.detach`
> > > > > is
> > > > > >>> SQL Client specific needs to be determined, it could also be a
> > > > general
> > > > > >>> `table.multi-sql-async` option?
> > > > > >>>
> > > > > >>> 3) DELETE JAR
> > > > > >>>
> > > > > >>> Shouldn't the opposite of "ADD" be "REMOVE"? "DELETE" sounds
> like
> > > one
> > > > > is
> > > > > >>> actively deleting the JAR in the corresponding path.
> > > > > >>>
> > > > > >>> 4) LIST JAR
> > > > > >>>
> > > > > >>> This should be `SHOW JARS` according to other SQL commands such
> > as
> > > > > `SHOW
> > > > > >>> CATALOGS`, `SHOW TABLES`, etc. [2].
> > > > > >>>
> > > > > >>> 5) EXPLAIN [ExplainDetail[, ExplainDetail]*]
> > > > > >>>
> > > > > >>> We should keep the details in sync with
> > > > > >>> `org.apache.flink.table.api.ExplainDetail` and avoid confusion
> > > about
> > > > > >>> differently named ExplainDetails. I would vote for
> > `ESTIMATED_COST`
> > > > > >>> instead of `COST`. I'm sure the original author had a reason
> why
> > to
> > > > > call
> > > > > >>> it that way.
> > > > > >>>
> > > > > >>> 6) Implementation details
> > > > > >>>
> > > > > >>> It would be nice to understand how we plan to implement the
> given
> > > > > >>> features. Most of the commands and config options should go
> into
> > > > > >>> TableEnvironment and SqlParser directly, correct? This way
> users
> > > > have a
> > > > > >>> unified way of using Flink SQL. TableEnvironment would provide
> a
> > > > > similar
> > > > > >>> user experience in notebooks or interactive programs than the
> SQL
> > > > > Client.
> > > > > >>>
> > > > > >>> [1]
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> > > > > >>> [2]
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/show.html
> > > > > >>>
> > > > > >>> Regards,
> > > > > >>> Timo
> > > > > >>>
> > > > > >>>
> > > > > >>> On 02.02.21 10:13, Shengkai Fang wrote:
> > > > > >>>> Sorry for the typo. I mean `RESET` is much better rather than
> > > > `UNSET`.
> > > > > >>>>
> > > > > >>>> Shengkai Fang <[hidden email]> 于2021年2月2日周二 下午4:44写道:
> > > > > >>>>
> > > > > >>>>> Hi, Jingsong.
> > > > > >>>>>
> > > > > >>>>> Thanks for your reply. I think `UNSET` is much better.
> > > > > >>>>>
> > > > > >>>>> 1. We don't need to introduce another command `UNSET`.
> `RESET`
> > is
> > > > > >>>>> supported in the current sql client now. Our proposal just
> > > extends
> > > > > its
> > > > > >>>>> grammar and allows users to reset the specified keys.
> > > > > >>>>> 2. Hive beeline also uses `RESET` to set the key to the
> default
> > > > > >>> value[1].
> > > > > >>>>> I think it is more friendly for batch users.
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>> Shengkai
> > > > > >>>>>
> > > > > >>>>> [1]
> > > > > >>>
> > > https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
> > > > > >>>>>
> > > > > >>>>> Jingsong Li <[hidden email]> 于2021年2月2日周二 下午1:56写道:
> > > > > >>>>>
> > > > > >>>>>> Thanks for the proposal, yes, sql-client is too outdated. +1
> > for
> > > > > >>>>>> improving it.
> > > > > >>>>>>
> > > > > >>>>>> About "SET"  and "RESET", Why not be "SET" and "UNSET"?
> > > > > >>>>>>
> > > > > >>>>>> Best,
> > > > > >>>>>> Jingsong
> > > > > >>>>>>
> > > > > >>>>>> On Mon, Feb 1, 2021 at 2:46 PM Rui Li <
> [hidden email]>
> > > > > wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Thanks Shengkai for the update! The proposed changes look
> > good
> > > to
> > > > > >> me.
> > > > > >>>>>>>
> > > > > >>>>>>> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang <
> > > [hidden email]
> > > > >
> > > > > >>> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> Hi, Rui.
> > > > > >>>>>>>> You are right. I have already modified the FLIP.
> > > > > >>>>>>>>
> > > > > >>>>>>>> The main changes:
> > > > > >>>>>>>>
> > > > > >>>>>>>> # -f parameter has no restriction about the statement
> type.
> > > > > >>>>>>>> Sometimes, users use the pipe to redirect the result of
> > > queries
> > > > to
> > > > > >>>>>>> debug
> > > > > >>>>>>>> when submitting a job by the -f parameter. It's much more
> > > > > convenient compared
> > > > > >>> to
> > > > > >>>>>>>> writing INSERT INTO statements.
> > > > > >>>>>>>>
> > > > > >>>>>>>> # Add a new sql client option `sql-client.job.detach`.
> > > > > >>>>>>>> Users prefer to execute jobs one by one in the batch mode.
> > > Users
> > > > > >> can
> > > > > >>>>>>> set
> > > > > >>>>>>>> this option to false and the client will not process the next
> > > > > >>>>>>>> job until the current job finishes. The default value of this
> > > > > >>>>>>>> option is true, which means the client will execute the next
> > > > > >>>>>>>> job as soon as the current job is submitted.
> > > > > >>>>>>>>
> > > > > >>>>>>>> Best,
> > > > > >>>>>>>> Shengkai
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午4:52写道:
> > > > > >>>>>>>>
> > > > > >>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Regarding #2, maybe the -f options in flink and hive have
> > > > > >> different
> > > > > >>>>>>>>> implications, and we should clarify the behavior. For
> > > example,
> > > > if
> > > > > >>> the
> > > > > >>>>>>>>> client just submits the job and exits, what happens if
> the
> > > file
> > > > > >>>>>>> contains
> > > > > >>>>>>>>> two INSERT statements? I don't think we should treat them
> > as
> > > a
> > > > > >>>>>>> statement
> > > > > >>>>>>>>> set, because users should explicitly write BEGIN
> STATEMENT
> > > SET
> > > > in
> > > > > >>> that
> > > > > >>>>>>>>> case. And the client shouldn't asynchronously submit the
> > two
> > > > > jobs,
> > > > > >>>>>>> because
> > > > > >>>>>>>>> the 2nd may depend on the 1st, right?
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang <
> > > > [hidden email]
> > > > > >
> > > > > >>>>>>> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> Hi Rui,
> > > > > >>>>>>>>>> Thanks for your feedback. I agree with your suggestions.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For the suggestion 1: Yes, we plan to strengthen the
> > set
> > > > > >>>>>>> command. In
> > > > > >>>>>>>>>> the implementation, it will just put the key-value into
> > the
> > > > > >>>>>>>>>> `Configuration`, which will be used to generate the
> table
> > > > > config.
> > > > > >>> If
> > > > > >>>>>>> hive
> > > > > >>>>>>>>>> supports reading the setting from the table config,
> users
> > > are
> > > > > >> able
> > > > > >>>>>>> to set
> > > > > >>>>>>>>>> the hive-related settings.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> For the suggestion 2: The -f parameter will submit the
> job
> > > and
> > > > > >>> exit.
> > > > > >>>>>>> If
> > > > > >>>>>>>>>> the queries never end, users have to cancel the job by
> > > > > >> themselves,
> > > > > >>>>>>> which is
> > > > > >>>>>>>>>> not reliable (people may forget their jobs). In most
> cases,
> > > > > queries
> > > > > >>>>>>> are used
> > > > > >>>>>>>>>> to analyze the data. Users should use queries in the
> > > > interactive
> > > > > >>>>>>> mode.
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Rui Li <[hidden email]> 于2021年1月29日周五 下午3:18写道:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> Thanks Shengkai for bringing up this discussion. I
> think
> > it
> > > > > >>> covers a
> > > > > >>>>>>>>>>> lot of useful features which will dramatically improve
> > the
> > > > > >>>>>>> usability of our
> > > > > >>>>>>>>>>> SQL Client. I have two questions regarding the FLIP.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> 1. Do you think we can let users set arbitrary
> > > configurations
> > > > > >> via
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>> SET command? A connector may have its own
> configurations
> > > and
> > > > we
> > > > > >>>>>>> don't have
> > > > > >>>>>>>>>>> a way to dynamically change such configurations in SQL
> > > > Client.
> > > > > >> For
> > > > > >>>>>>> example,
> > > > > >>>>>>>>>>> users may want to be able to change hive conf when
> using
> > > hive
> > > > > >>>>>>> connector [1].
> > > > > >>>>>>>>>>> 2. Any reason why we have to forbid queries in SQL
> files
> > > > > >> specified
> > > > > >>>>>>> with
> > > > > >>>>>>>>>>> the -f option? Hive supports a similar -f option but
> > allows
> > > > > >>> queries
> > > > > >>>>>>> in the
> > > > > >>>>>>>>>>> file. And a common use case is to run some query and
> > > redirect
> > > > > >> the
> > > > > >>>>>>> results
> > > > > >>>>>>>>>>> to a file. So I think maybe flink users would like to
> do
> > > the
> > > > > >> same,
> > > > > >>>>>>>>>>> especially in batch scenarios.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> [1] https://issues.apache.org/jira/browse/FLINK-20590
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu <
> > > > > >>>>>>> [hidden email]>
> > > > > >>>>>>>>>>> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> Hi Shengkai,
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Glad to see this improvement. And I have some
> additional
> > > > > >>>>>>> suggestions:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> #1. Unify the TableEnvironment in ExecutionContext to
> > > > > >>>>>>>>>>>> StreamTableEnvironment for both streaming and batch
> sql.
> > > > > >>>>>>>>>>>> #2. Improve the way of results retrieval: sql client
> > > collect
> > > > > >> the
> > > > > >>>>>>>>>>>> results
> > > > > >>>>>>>>>>>> locally all at once using accumulators at present,
> > > > > >>>>>>>>>>>>         which may have memory issues in JM or Local
> for
> > > the
> > > > > big
> > > > > >>> query
> > > > > >>>>>>>>>>>> result.
> > > > > >>>>>>>>>>>> Accumulator is only suitable for testing purpose.
> > > > > >>>>>>>>>>>>         We may change to use SelectTableSink, which is
> > > based
> > > > > >>>>>>>>>>>> on CollectSinkOperatorCoordinator.
> > > > > >>>>>>>>>>>> #3. Do we need to consider Flink SQL gateway which is
> in
> > > > > >> FLIP-91.
> > > > > >>>>>>> Seems
> > > > > >>>>>>>>>>>> that this FLIP has not moved forward for a long time.
> > > > > >>>>>>>>>>>>         Provide a long running service out of the box
> to
> > > > > >>> facilitate
> > > > > >>>>>>> the
> > > > > >>>>>>>>>>>> sql
> > > > > >>>>>>>>>>>> submission is necessary.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> What do you think of these?
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Shengkai Fang <[hidden email]> 于2021年1月28日周四
> > 下午8:54写道:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Hi devs,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Jark and I want to start a discussion about
> > FLIP-163:SQL
> > > > > >> Client
> > > > > >>>>>>>>>>>>> Improvements.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Many users have complained about the problems of the
> > sql
> > > > > >> client.
> > > > > >>>>>>> For
> > > > > >>>>>>>>>>>>> example, users can not register the table proposed by
> > > > > FLIP-95.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> The main changes in this FLIP:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> - use -i parameter to specify the sql file to
> > initialize
> > > > the
> > > > > >>>>>>> table
> > > > > >>>>>>>>>>>>> environment and deprecate the YAML file;
> > > > > >>>>>>>>>>>>> - add -f to submit a sql file and deprecate the '-u'
> > > parameter;
> > > > > >>>>>>>>>>>>> - add more interactive commands, e.g. ADD JAR;
> > > > > >>>>>>>>>>>>> - support statement set syntax;
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> For more detailed changes, please refer to
> FLIP-163[1].
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Look forward to your feedback.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> Best,
> > > > > >>>>>>>>>>>>> Shengkai
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> [1]
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-163%3A+SQL+Client+Improvements
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> --
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> *With kind regards
> > > > > >>>>>>>>>>>>
> > > ------------------------------------------------------------
> > > > > >>>>>>>>>>>> Sebastian Liu 刘洋
> > > > > >>>>>>>>>>>> Institute of Computing Technology, Chinese Academy of
> > > > Science
> > > > > >>>>>>>>>>>> Mobile\WeChat: +86—15201613655
> > > > > >>>>>>>>>>>> E-mail: [hidden email] <[hidden email]>
> > > > > >>>>>>>>>>>> QQ: 3239559*
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> --
> > > > > >>>>>>>>>>> Best regards!
> > > > > >>>>>>>>>>> Rui Li
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> --
> > > > > >>>>>>>>> Best regards!
> > > > > >>>>>>>>> Rui Li
> > > > > >>>>>>>>>
> > > > > >>>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> --
> > > > > >>>>>>> Best regards!
> > > > > >>>>>>> Rui Li
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> --
> > > > > >>>>>> Best, Jingsong Lee
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Best regards!
> > > > Rui Li
> > > >
> > >
> >
>