What is the suggested way to validate SQL?


What is the suggested way to validate SQL?

Kaibo Zhou
Hi,

As a platform user, I want to integrate Flink SQL with the platform. The
usage scenario is: users register tables/UDFs to a catalog service, and
then write SQL scripts like "insert into xxx select from xxx" through a
Web SQL editor; the platform needs to validate the SQL script each time
the user changes it.

One problem I encountered is that SQL validation depends on the connector
jar, which leads to many problems. More details can be found in the
issue [1] I just submitted.

Another problem I found: when I use `tEnv.sqlUpdate("INSERT INTO
sinkTable SELECT f1,f2 FROM sourceTable");` to do SQL validation, it does
NOT validate the sinkTable, i.e. neither its table name nor its schema.

I am confused: what is the suggested way to validate Flink SQL? Maybe
Flink could provide a recommended way to let SQL be easily integrated by
external platforms.

[1]: https://issues.apache.org/jira/browse/FLINK-15419

Best,
Kaibo

Re: What is the suggested way to validate SQL?

Jingsong Li
Hi Kaibo,

> Validate SQL syntax not need to depend on connector jar
At present, SQL functions also strongly depend on jar support, but the
overall approach is still under discussion, and there is no clear plan at
present.
But you are right, it is really important for platform users.
Another way is to start a separate process for each SQL statement, which
contains the user's jars.

> what is the suggested way to validate a FLINK SQL?
- If you use "StreamTableEnvironment.create", every "sqlUpdate" will
generate an execution plan, so it should contain validation.
- If you use "TableEnvironment.create(EnvironmentSettings)", "sqlUpdate"
will buffer the ModifyOperations in the table environment. In that case
you can use "TableEnvironment.explain(boolean)" to validate the SQL: it
will generate the execution plan and validate the sink too (see the
sketch below).
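
A minimal sketch of that second approach, assuming the Flink 1.9 style
API and that "sourceTable" and "sinkTable" (placeholder names) are
already registered in the catalog:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ValidateSqlSketch {
    public static void main(String[] args) {
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        TableEnvironment tEnv = TableEnvironment.create(settings);

        try {
            // Buffers the statement as a ModifyOperation; the query part
            // ("SELECT ...") is parsed and validated here.
            tEnv.sqlUpdate("INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable");
            // Generating the execution plan forces full validation,
            // including the sink table.
            tEnv.explain(false);
        } catch (Exception e) {
            // Surface the error, e.g. back to the Web SQL editor.
            System.err.println("Invalid SQL: " + e.getMessage());
        }
    }
}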

Best,
Jingsong Lee

Re: What is the suggested way to validate SQL?

Danny Chan
Hi, Kaibo Zhou ~

There are several phases that a SQL text goes through before it becomes an
execution graph that can be run with the Flink runtime:


1. SQL parse: parse the SQL text into an AST (SqlNode tree)
2. SqlNode (row type) validation, which includes table/schema inference
3. SQL-to-rel conversion: convert the SqlNode into a RelNode (relational
algebra)
4. Optimize the relational expression with the planner (Volcano or Hep),
then convert it into execution convention nodes
5. Generate the code and the execution graph

For the first 3 steps, Apache Flink uses Apache Calcite as the
implementation, which means a SQL text passed to the table environment
always goes through SQL parse/validation/sql-to-rel conversion.
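
As an illustration, here is a minimal sketch of step 1 using Calcite
standalone. The statement is a placeholder; step 2 would additionally
need a SqlValidator wired to a catalog reader, which Flink's planner
sets up internally:

import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class ParseSketch {
    public static void main(String[] args) {
        String sql = "INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable";
        try {
            // Step 1: parse the SQL text into an AST (SqlNode tree).
            SqlNode ast = SqlParser.create(sql).parseStmt();
            System.out.println("Parsed: " + ast.getKind());
        } catch (SqlParseException e) {
            // The parse exception carries the exact error position, which
            // is what an editor needs to highlight the problem.
            System.err.println("Error at " + e.getPos() + ": " + e.getMessage());
        }
    }
}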

For example, with a code snippet like tableEnv.sqlQuery("INSERT INTO
sinkTable SELECT f1,f2 FROM sourceTable"), the query part "SELECT f1,f2
FROM sourceTable" is validated.

But you are right: for Flink SQL, an INSERT statement's target table is
not validated during the validation phase. Actually, we validate the
"select" clause first, extract the target table identifier, and only check
that the schemas of the "select" clause and the target table match when we
invoke the write to the sink (after step 4).


For most cases this is okay. Can you share your cases? What kind of
validation do you want for the insert target table?

We are planning to include the insert target table validation in step 2,
for 2 reasons:

• The computed column validation (stored or virtual)
• The insert implicit type coercion

But this would come in Flink version 1.11 ~


Best,
Danny Chan

Re: What is the suggested way to validate SQL?

Kaibo Zhou
Hi, Jingsong,

Thank you very much for your suggestion.

I verified that using `tEnv.sqlUpdate("xxx")` and `tEnv.explain(false)` to
do validation works.
But this method needs the connector jar, which is very inconvenient to use.


Hi, Danny,

Many thanks for providing very useful explanations.

The use case is: users first register some source/sink tables and UDFs to
the catalog service, and then they write and modify SQL like "insert
into sinkTable select * from sourceTable where a>1" in a Web SQL editor.
The platform wants to tell the user whether the SQL is valid, including
the detailed error position if an error occurs.

For the `insert target table`, the platform wants to validate that the
table exists and to check the field names and field types.

Best,
Kaibo

Re: What is the suggested way to validate SQL?

Arvid Heise
A common approach is to add the connector jar as a test dependency and
have a smoke test that just starts the job with a temporary external
system spawned with Docker. I usually use Testcontainers [1]. Then you
simply need to execute the integration tests in your IDE and can usually
even debug non-obvious errors. A rough sketch of such a test follows
below.

[1] https://www.testcontainers.org/
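
A rough sketch of that idea, assuming JUnit 4 and Testcontainers on the
test classpath. The container image, port, and table names are
hypothetical, and the source/sink DDL depends on the actual connector:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.junit.ClassRule;
import org.junit.Test;
import org.testcontainers.containers.GenericContainer;

public class SqlSmokeTest {

    // Spins up the external system in Docker for the duration of the tests.
    @ClassRule
    public static final GenericContainer<?> EXTERNAL_SYSTEM =
            new GenericContainer<>("some/external-system:latest").withExposedPorts(9092);

    @Test
    public void insertStatementValidatesAgainstRealConnector() {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        String endpoint = EXTERNAL_SYSTEM.getContainerIpAddress()
                + ":" + EXTERNAL_SYSTEM.getMappedPort(9092);

        // Register the source/sink tables against the container here,
        // e.g. via DDL in tEnv.sqlUpdate("CREATE TABLE ..."), using
        // `endpoint` in the connector options.

        tEnv.sqlUpdate("INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable");
        // With the real connector jar on the test classpath, generating the
        // plan exercises the connector factories as well as SQL validation.
        tEnv.explain(false);
    }
}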

Re: What is the suggested way to validate SQL?

godfreyhe
hi kaibo,
As we discussed offline, I think the clean way is for flink-table to
provide an interface (or a tool) to do the SQL validation for platform
users. `tEnv.sqlUpdate` or `tEnv.explain(false)` is a temporary solution
that involves too much unrelated logic, considering that the needed
functionality is just checking whether a SQL statement is valid.

Best,
godfrey

Re: What is the suggested way to validate SQL?

Fabian Hueske
Hi all,

Yes, I agree. It would be good to have dedicated methods to check the
validity of SQL queries.
I would propose to have two validation methods:

1. syntactic and semantic validation of a SQL query, i.e., SQL keywords,
catalog information, types in expressions and functions, etc. This is a
lightweight check and Calcite can give us good error messages.
2. validation that the query can be executed by Flink. This check involves
calling the optimizer, which can be quite expensive. It could also
determine whether a query produces append-only or retraction results.

The second method checks everything that the first one checks but is much
more expensive as it calls the optimizer.
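
To make that concrete, here is a sketch of what such an API could look
like. This is not an existing Flink interface; all names are hypothetical:

// Hypothetical interface, only meant to illustrate the proposal's shape.
public interface SqlValidationService {

    /** Result of the full check, e.g. whether the query is append-only. */
    enum ResultKind { APPEND_ONLY, RETRACTION }

    /**
     * Method 1: syntactic and semantic validation via Calcite (SQL
     * keywords, catalog objects, types in expressions and functions).
     * Lightweight, with good error messages from Calcite.
     *
     * @throws org.apache.flink.table.api.ValidationException if invalid
     */
    void validate(String statement);

    /**
     * Method 2: validation that Flink can actually execute the query.
     * Calls the optimizer, so it is much more expensive.
     */
    ResultKind validateForExecution(String statement);
}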

Best, Fabian
