[DISCUSS] SQL Syntax for Table API StatementSet

Fabian Hueske-2
Hi everyone,

FLIP-84 [1] added the concept of a "statement set" to group multiple INSERT
INTO statements (SQL or Table API) together. The statements in a statement
set are jointly optimized and executed as a single Flink job.

I would like to start a discussion about a SQL syntax to group multiple
INSERT INTO statements in a statement set. The use case would be to expose
the statement set feature to a purely text-based client for Flink SQL such
as Flink's SQL CLI [2].

During the discussion of FLIP-84, we briefly talked about such a syntax
[3]:

START STATEMENT SET;
INSERT INTO ... SELECT ...;
INSERT INTO ... SELECT ...;
...
END STATEMENT SET;

We didn't follow up on this proposal, to keep the focus on the FLIP-84
Table API changes and to not dive into a discussion about multiline SQL
query support [4].

While this feature is clearly based on multiple SQL queries, I think it is
a bit different from what we usually understand as multiline SQL support.
That's because a statement set ends up as a single Flink job. Hence,
there is no need on the Flink side to coordinate the execution of multiple
jobs (incl. the discussion about blocking or async execution of queries).
Flink would treat the queries in a STATEMENT SET as a single query.

I would like to start a discussion about supporting the [START|END]
STATEMENT SET syntax (or a different syntax with equivalent semantics) in
Flink.
I don't have a strong preference on whether this should be implemented in
Flink's SQL core or be a purely client-side implementation in the CLI
client. It would be good, though, to have parser support in Flink for this.
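
To make the client-side option concrete, here is a minimal sketch (in
Python, purely illustrative and not Flink code) of how a text-based client
might group the statements between START STATEMENT SET and END STATEMENT
SET into one unit. The function name `group_statements` is hypothetical,
and the naive split on `;` assumes that no semicolons occur inside string
literals or comments.

```python
from typing import List, Union

Statement = str
StatementSet = List[str]


def group_statements(script: str) -> List[Union[Statement, StatementSet]]:
    """Split a script on ';' and collect the INSERT INTO statements between
    START STATEMENT SET and END STATEMENT SET into one list."""
    units: List[Union[Statement, StatementSet]] = []
    current_set = None  # None while we are outside a statement set
    for raw in script.split(";"):
        stmt = " ".join(raw.split())  # collapse whitespace and newlines
        if not stmt:
            continue
        upper = stmt.upper()
        if upper == "START STATEMENT SET":
            if current_set is not None:
                raise ValueError("nested statement sets are not supported")
            current_set = []
        elif upper == "END STATEMENT SET":
            if current_set is None:
                raise ValueError("END STATEMENT SET without START")
            units.append(current_set)
            current_set = None
        elif current_set is not None:
            # Assumption: a statement set may only contain INSERT INTO.
            if not upper.startswith("INSERT INTO"):
                raise ValueError("only INSERT INTO is allowed in a statement set")
            current_set.append(stmt)
        else:
            units.append(stmt)
    if current_set is not None:
        raise ValueError("unterminated statement set")
    return units
```

In this sketch, each list in the result would be submitted as a single
job, and each plain string would be handled as an ordinary statement.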

What do others think?

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
[2]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html
[3]
https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#heading=h.al86t1h4ecuv
[4]
https://lists.apache.org/thread.html/rf494e227c47010c91583f90eeaf807d3a4c3eb59d105349afd5fdc31%40%3Cdev.flink.apache.org%3E

Re: [DISCUSS] SQL Syntax for Table API StatementSet

Jark Wu-2
Hi Fabian,

Thanks for starting this discussion. I think this is a very important
syntax for supporting file mode and multi-statement execution in the SQL
Client. I'm +1 on introducing a syntax to group SQL statements that
execute together.

As a reference, traditional database systems have similar syntax, such
as "START/BEGIN TRANSACTION ... COMMIT" to group statements as a
transaction [1],
and "BEGIN ... END" [2] [3] to group a set of SQL statements that
execute together.

Maybe we could also use the "BEGIN ... END" syntax, which is much simpler?

Regarding where to implement it, I also prefer to have it in Flink SQL core.
Here are some reasons from my side:
1) I think many downstream projects (e.g., Zeppelin) will have the same
requirement. It would be better to have it in core instead of having users
reinvent the wheel.
2) Having it in the SQL CLI means it is a standard syntax for supporting
statement sets in Flink. So I think it makes sense to have it in core too;
otherwise, it looks like a broken feature.
    In 1.10, CREATE VIEW is only supported in the SQL CLI, not in
TableEnvironment, which confuses many users.
3) Currently, we are moving statement parsing to use sql-parser
(FLINK-17728). Calcite has good support for parsing multi-statements.
    It will be tricky to parse multi-statements only in the SQL Client.

Best,
Jark

[1]:
https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15
[2]:
https://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-begin-end/
[3]: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html

On Mon, 15 Jun 2020 at 20:50, Fabian Hueske <[hidden email]> wrote:

> [...]

Re: [DISCUSS] SQL Syntax for Table API StatementSet

Fabian Hueske-2
Thanks for joining this discussion, Jark!

This feature is a bit different from BEGIN TRANSACTION / COMMIT and BEGIN /
END.

The only commonality is that all three group multiple statements.
* BEGIN TRANSACTION / COMMIT creates a transactional context that
guarantees atomicity, consistency, and isolation. Statements and queries
are sequentially executed.
* BEGIN / END defines a block of statements just like curly braces ({ and
}) do in Java. The statements (which can also include variable definitions
and printing) are sequentially executed.
* A statement set defines a group of statements that are optimized together
and jointly executed at the same time, i.e., there is no sequence or order.

A statement set (consisting of multiple INSERT INTO statements) behaves
just like a single INSERT INTO statement.
Wherever an INSERT INTO statement can be executed, it should be
possible to execute a statement set consisting of multiple INSERT INTO
statements.
That's also why I think that statement sets are orthogonal to
multi-statement execution.

As I said before, I'm happy to discuss syntax proposals for statement sets.
However, I think a BEGIN / END syntax for statement sets would confuse
users who know this syntax from MySQL, SQL Server, or another DBMS.

Thanks,
Fabian


On Tue, 16 Jun 2020 at 05:07, Jark Wu <[hidden email]> wrote:

> [...]

Re: [DISCUSS] SQL Syntax for Table API StatementSet

Timo Walther-2
Hi Fabian,

thanks for the proposal. I agree that we should have consensus on the
SQL syntax as well and thus finalize the concepts introduced in FLIP-84.

I would favor Jark's proposal. I would like to propose the following syntax:

BEGIN STATEMENT SET;
   INSERT INTO ...;
   INSERT INTO ...;
END;

1) BEGIN and END are commonly used for blocks in SQL.

2) We should not start mixing START/BEGIN for different kinds of blocks,
because that can also be confusing for users. There is no additional
helpful semantic in using START over BEGIN.

3) Instead, we should rather parameterize the block statement with
`STATEMENT SET` and keep the END of the block simple (also similar to
CASE ... WHEN ... END).

4) If we look at Jark's example in SQL Server, the BEGIN is also
parameterized by `BEGIN { TRAN | TRANSACTION }`.

5) Also in Java, curly braces are used for classes, methods, and
loops alike, for different purposes parameterized by the preceding code.
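
Points 3) and 4) can be illustrated with a small sketch, assuming a
hypothetical helper `block_kind` written in Python. It shows a single BEGIN
keyword whose meaning is selected by the words that follow it. Only the
STATEMENT SET variant is proposed in this thread; the transaction and
plain-block cases mirror the SQL Server and MySQL syntax referenced earlier
and are included for contrast only.

```python
def block_kind(begin_statement: str) -> str:
    """Classify a BEGIN statement by the qualifier that follows the keyword."""
    words = begin_statement.strip().rstrip(";").upper().split()
    if not words or words[0] != "BEGIN":
        raise ValueError("not a BEGIN statement")
    qualifier = " ".join(words[1:])
    if qualifier == "STATEMENT SET":
        return "statement set"  # the variant proposed in this thread
    if qualifier in ("TRAN", "TRANSACTION"):
        return "transaction"    # SQL Server style, for contrast only
    if qualifier == "":
        return "block"          # plain BEGIN ... END block
    raise ValueError(f"unknown block qualifier: {qualifier}")
```

Under this reading, END stays the same for every kind of block, and only
the opening statement carries the distinction.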

Regards,
Timo


On 17.06.20 11:36, Fabian Hueske wrote:

> [...]

Re: [DISCUSS] SQL Syntax for Table API StatementSet

godfreyhe
Hi Fabian, Jark, Timo,

Thanks for the suggestions.

Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
the syntax Timo suggested.

Regarding whether this should be implemented in Flink's SQL core, I think
there are three things to consider:

First, do we need to unify the default behavior of the API and of SQL files?
The `TableEnvironment#executeSql` and `StatementSet#execute` methods are
asynchronous for both batch and streaming, which means they just submit the
job and then return a `TableResult`.
For batch processing in other systems (e.g. Hive, traditional databases),
however, the default behavior is sync mode.
So that behavior differs from the APIs. I think it would be better to
unify the default behavior.

Second, how do we determine the execution behavior of each statement in a
file that contains both batch SQL and streaming SQL? Currently, we have a
flag that tells the planner whether the TableEnvironment is a batch env or
a stream env, which determines the default behavior. We want to remove
the flag and unify the TableEnvironment in the future. Then a
TableEnvironment can execute both batch SQL and streaming SQL. Timo and I
had a discussion about this on Slack: for DML & DQL, if a statement has
keywords like `EMIT STREAM`, it is streaming SQL and will be executed in
async mode; otherwise it is batch SQL and will be executed in sync mode.

Third, how do we flexibly support execution mode switching for batch SQL?
For streaming SQL, all DMLs & DQLs should run in async mode because the job
may never finish.
For batch SQL, I think both modes are needed. I know some platforms
execute batch SQL in async mode and then continuously monitor the job
status. Do we need to introduce a `set execute-mode=xx` command
or new SQL syntax like `START SYNC EXECUTION`?
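
The `EMIT STREAM` rule from the second point could be sketched as follows.
This is a rough illustration in Python, not planner code: `execution_mode`
is a hypothetical name, `EMIT STREAM` is only the example keyword mentioned
above, and a plain regular expression would also match the keyword inside a
string literal, which a real parser would avoid.

```python
import re

# Matches the example keyword pair, case-insensitively, with any whitespace
# between the two words.
_EMIT_STREAM = re.compile(r"\bEMIT\s+STREAM\b", re.IGNORECASE)


def execution_mode(statement: str) -> str:
    """Return 'async' for statements marked as streaming, else 'sync'."""
    return "async" if _EMIT_STREAM.search(statement) else "sync"
```

An explicit switch for batch SQL, as asked about in the third point, would
have to override this default; the sketch leaves that open.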

For sql-client or other projects, we can easily decide what behavior an app
can support.
Just as Jark said, many downstream projects have the same requirement for
multi-statement support, but they may have different execution behaviors.
It would be great if Flink could support flexible execution modes.
Alternatively, Flink core could just define the syntax, provide the parser,
and support a default execution mode.
The downstream projects could then use the APIs and the parsed results to
decide how to execute a SQL statement.

Best,
Godfrey

On Wed, 17 Jun 2020 at 18:32, Timo Walther <[hidden email]> wrote:

> [...]

Re: [DISCUSS] SQL Syntax for Table API StatementSet

Timo Walther-2
Hi Godfrey,

1) Of course we should have unified behavior for the API and SQL files.
However, this doesn't mean that `executeSql` needs to become blocking or
support multi-statements. In a programmatic API, async is more useful as
a user can control long-running jobs (regardless of batch or streaming).
Sync behavior can be expressed on an async API (e.g.
TableResult.await()). If we support multi-statements in the API, it will
not be supported through `executeSql`, because this part of the API was
finalized in the last release. We need to come up with a new API method.

3) I think forcing async execution also for multiline batch queries in
SQL can be future work. Either we enable those using a flag or special
syntax in a SQL file. Or do we want this flexibility already in the
first multi-statement support version?
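
As a language-neutral illustration of point 1), the sketch below (Python,
with hypothetical names `submit_job` and `run_blocking`) shows blocking
behavior layered on an asynchronous submit call. The `Future` merely stands
in for a result handle such as `TableResult`; no Flink API is used.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# A small worker pool stands in for the cluster that runs submitted jobs.
_pool = ThreadPoolExecutor(max_workers=2)


def submit_job(sql: str) -> Future:
    """Asynchronous by default: return a handle without waiting for the job."""
    return _pool.submit(lambda: f"finished: {sql}")


def run_blocking(sql: str) -> str:
    """Synchronous behavior built on the async call: submit, then wait."""
    return submit_job(sql).result()
```

The reverse direction is harder: an API that only blocks gives the caller
no handle with which to monitor or cancel a long-running job.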

Regards,
Timo

On 17.06.20 15:27, godfrey he wrote:

> Hi Fabian, Jack, Timo
>
> Thanks for the suggestions.
>
> Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
> the syntax Timo suggested.
>
> Regarding whether this should be implemented in Flink's SQL core. I think
> there are three things to consider:
>
> First one, do we need to unify the default behavior of API and sql file?
> The execution of `TableEnvironment#executeSql` method and
> `StatementSet#execute` method is asynchronous
> for both batch and streaming, which means these methods just submit the job
> and then return a `TableResult`.
>   While for batch processing (e.g. hive, traditional databases), the default
> behavior is sync mode.
> So this behavior is different from the APIs. I think it's better we can
> unify the default behavior.
>
> Second one, how to determine the execution behavior of each statement in a
> file which contains both
> batch sql and streaming sql. Currently, we have a flag to tell the planner
> that the TableEnvironment is
> batch env or stream env which can determine the default behavior. We want
> to remove
> the flag and unify the TableEnvironment in the future. Then
> TableEnvironment can execute both
> batch sql and streaming sql. Timo and I have a discussion about this on
> slack: for DML & DQL,
> if a statement has keywords like `EMIT STREAM`, it's streaming sql and will
> be executed in async mode.
> otherwise it's a batch sql and will be executed in sync mode.
>
> Three one, how to flexibly support execution mode switching for batch sql.
> For streaming sql, all DMLs & DQLs should be in async mode because the job
> may be never finished.
> While for batch sql, I think both modes are needed. I know some platforms
> execute batch sql
> in async mode, and then continuously monitor the job status. Do we need
> introduce `set execute-mode=xx` command
>   or new sql syntax like `START SYNC EXECUTION` ?
>
> For sql-client or other projects, we can easily decide what behavior an app
> can support.
> Just as Jark said, many downstream projects have the same requirement for
> multiple statement support,
> but they may have different execution behaviors. It's great if flink can
> support flexible execution modes.
> Or Flink core just defines the syntax, provides parser and supports a
> default execution mode.
> The downstream projects can use the APIs and parsed results to decide how
> to execute a sql.
>
> Best,
> Godfrey
>


Re: [DISCUSS] SQL Syntax for Table API StatementSet

Fabian Hueske-2
Thanks for the discussion Godfrey and Timo,

I like the syntax proposed by Jark and Timo:

BEGIN STATEMENT SET;
   INSERT INTO ...;
   INSERT INTO ...;
END;

(I didn't pay attention and didn't mean to propose START over BEGIN. I just
wanted to make the point that the syntax should make it clear that a
statement set is started).

I think the important questions about streaming/batch queries and
sync/async execution need to be discussed and solved.
However, I think these points are orthogonal to the question about
supporting statement sets.
These issues exist today (without a SQL syntax for statement sets) and IMO
such a syntax doesn't make the situation any worse or better (assuming that
we agree on the limitation that all statements in a set are either
streaming or batch queries).
As I said before, from Flink's point of view a statement set can be
replaced by a single INSERT INTO query (either streaming or batch,
depending on the type of queries in the set).
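
To illustrate the idea that a statement set forms one unit of execution, here is a minimal Java sketch of how a text-based client might group the statements of a script before submitting them. It is only an illustration under stated assumptions: the class `StatementSetGrouper` and its method `group` are hypothetical and not part of Flink, the splitting on `;` is naive (it ignores semicolons inside string literals and comments), and the sketch assumes the `BEGIN STATEMENT SET; ... END;` syntax discussed in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not a Flink class): groups a SQL script into submission units. */
final class StatementSetGrouper {

    /**
     * Returns one inner list per unit of execution. A standalone statement is its own
     * unit; all INSERT INTO statements between BEGIN STATEMENT SET and END share one unit.
     */
    static List<List<String>> group(String script) {
        List<List<String>> units = new ArrayList<>();
        List<String> openSet = null; // non-null while inside a statement set

        // Naive split: does not handle ';' inside string literals or comments.
        for (String raw : script.split(";")) {
            String stmt = raw.trim().replaceAll("\\s+", " ");
            if (stmt.isEmpty()) {
                continue;
            }
            String upper = stmt.toUpperCase(Locale.ROOT);

            if (upper.equals("BEGIN STATEMENT SET")) {
                if (openSet != null) {
                    throw new IllegalArgumentException("Nested statement sets are not allowed");
                }
                openSet = new ArrayList<>();
            } else if (upper.equals("END")) {
                if (openSet == null) {
                    throw new IllegalArgumentException("END without BEGIN STATEMENT SET");
                }
                units.add(openSet); // the whole set becomes a single unit
                openSet = null;
            } else if (openSet != null) {
                if (!upper.startsWith("INSERT INTO ")) {
                    throw new IllegalArgumentException("Only INSERT INTO is allowed in a statement set");
                }
                openSet.add(stmt);
            } else {
                units.add(Collections.singletonList(stmt));
            }
        }
        if (openSet != null) {
            throw new IllegalArgumentException("Unterminated statement set");
        }
        return units;
    }
}
```

With such a grouping, a client could hand each unit to the engine as a whole, so a statement set is treated the same way as a single INSERT INTO statement, independent of how sync/async execution is decided.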

Best, Fabian



Re: [DISCUSS] SQL Syntax for Table API StatementSet

Jark Wu-2
+1 to "BEGIN STATEMENT SET; ... END;" syntax.

I also think sync/async execution is orthogonal to statement set syntax.
The same problem also exists for individual statements.
We can discuss this in a separate thread.

Best,
Jark
