Hi,
As a platform user, I want to integrate Flink SQL with the platform. The usage scenario is: users register tables/UDFs to a catalog service, and then write SQL scripts like "insert into xxx select from xxx" through a Web SQL editor. The platform needs to validate the SQL script each time the user changes it.

One problem I encountered is that SQL validation depends on the connector jar, which leads to many problems. More details can be found in the issue [1] I just submitted.

Another problem I found is that when I use `tEnv.sqlUpdate("INSERT INTO sinkTable SELECT f1,f2 FROM sourceTable");` to do SQL validation, it does NOT validate the sinkTable, including its schema and table name.

I am confused about what the suggested way to validate a Flink SQL is. Maybe Flink could provide a recommended way to let SQL be easily integrated by external platforms.

[1]: https://issues.apache.org/jira/browse/FLINK-15419

Best,
Kaibo
Hi Kaibo,
> Validate SQL syntax should not need to depend on the connector jar

At present, SQL functions also strongly depend on jar support. The overall approach is still under discussion, and there is no clear plan yet. But you are right, this is really important for platform users. Another way is to start a process for each SQL, which contains the user's jars.

> What is the suggested way to validate a Flink SQL?

- If you use "StreamTableEnvironment.create", every "sqlUpdate" will generate an execution plan, so it should contain validation.
- If you use "TableEnvironment.create(EnvironmentSettings)", "sqlUpdate" will buffer modifyOperations in the table environment. I think you can then use "TableEnvironment.explain(boolean)" to validate the SQL: it will generate the execution plan and validate the sink too.

Best,
Jingsong Lee
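To make the second option concrete, here is a minimal sketch of explain-based validation against the Flink 1.9/1.10 Table API (blink planner). It assumes the catalogs, tables, and UDFs have already been registered; `sinkTable` and `sourceTable` are placeholders from this thread:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class ExplainBasedValidator {

    /** Returns null when the statement is valid, otherwise the error message. */
    public static String validate(TableEnvironment tEnv, String sql) {
        try {
            tEnv.sqlUpdate(sql);   // buffers the INSERT as a modify operation
            tEnv.explain(false);   // triggers planning, which also validates the sink
            return null;
        } catch (Exception e) {    // e.g. ValidationException, SqlParserException
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());
        // Register catalogs, tables, and UDFs here before validating.
        String error = validate(tEnv, "INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable");
        System.out.println(error == null ? "valid" : error);
    }
}
```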
Hi, Kaibo Zhou ~
There are several phases that a SQL text goes through to become an execution graph that can be run with the Flink runtime:

1. SQL parse: parse the SQL text into an AST (SQL node tree)
2. SQL node (row type) validation; this includes the tables/schema inference
3. SQL-to-rel conversion: convert the SQL node into a RelNode (relational algebra)
4. Promote the relational expression with a planner (Volcano or Hep), then convert to execution convention nodes
5. Generate the code and the execution graph

For the first 3 steps, Apache Flink uses Apache Calcite as the implementation, which means a SQL text passed to the table environment always goes through SQL parse/validation/SQL-to-rel conversion.

For example, in a code snippet like tableEnv.sqlQuery("INSERT INTO sinkTable SELECT f1,f2 FROM sourceTable"), the query part "SELECT f1,f2 FROM sourceTable" is validated.

But you are right: for Flink SQL, an insert statement's target table is not validated during the validation phase. Actually, we validate the "select" clause first, extract the target table identifier, and we validate that the schemas of the "select" clause and the target table are the same when we invoke the write to the sink (after step 4).

For most cases this is okay. Can you share your cases? What kind of validation do you want for the insert target table?

We are planning to include the insert target table validation in step 2, for 2 reasons:

• The computed column validation (stored or virtual)
• The insert implicit type coercion

But this would come in Flink version 1.11 ~

Best,
Danny Chan

On 2019-12-27 5:44 PM +0800, [hidden email] wrote:
>
> "INSERT INTO
> sinkTable SELECT f1,f2 FROM sourceTable"
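To illustrate step 1 in isolation, here is a small sketch using vanilla Calcite. It is illustrative only: Flink wires in its own parser factory, so the accepted SQL dialect differs from Calcite's defaults shown here. The parse exception carries line/column positions, which is useful for an editor UI:

```java
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class ParsePhaseDemo {
    public static void main(String[] args) {
        String sql = "INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable";
        try {
            // Step 1 only: text -> AST. No catalog or schema is consulted yet.
            SqlNode ast = SqlParser.create(sql).parseStmt();
            System.out.println("Parsed statement kind: " + ast.getKind());
        } catch (SqlParseException e) {
            // The parser reports line/column positions for the editor to highlight.
            System.err.println("Error at " + e.getPos() + ": " + e.getMessage());
        }
    }
}
```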
Hi, Jingsong,
Thank you very much for your suggestion.

I verified that using `tEnv.sqlUpdate("xxx")` and `tEnv.explain(false)` to do validation works. But this method needs the connector jar, which is very inconvenient to use.

Hi, Danny,

Many thanks for providing very useful explanations.

The use case is: users will register some source/sink tables and UDFs to the catalog service first, and then they will write and modify SQL like "insert into sinkTable select * from sourceTable where a>1" in the Web SQL editor. The platform wants to tell the user whether the SQL is valid, including the detailed position if an error occurs.

For the insert target table, the platform wants to validate that the table exists and check the field names and field types.

Best,
Kaibo
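One possible way to implement the existence and schema checks directly against the catalog API is sketched below; the catalog, database, and table names are placeholders:

```java
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.catalog.Catalog;
import org.apache.flink.table.catalog.ObjectPath;
import org.apache.flink.table.catalog.exceptions.TableNotExistException;

public class SinkTableCheck {

    /** Checks that the insert target exists and returns its declared schema. */
    public static TableSchema requireSinkTable(
            TableEnvironment tEnv, String catalogName, String database, String table) {
        Catalog catalog = tEnv.getCatalog(catalogName)
                .orElseThrow(() -> new IllegalArgumentException("Unknown catalog: " + catalogName));
        try {
            // getSchema() exposes the field names and types for comparison with the query.
            return catalog.getTable(new ObjectPath(database, table)).getSchema();
        } catch (TableNotExistException e) {
            throw new IllegalArgumentException("Sink table does not exist: " + table, e);
        }
    }
}
```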
A common approach is to add the connector jar as a test dependency and have a smoke test that just starts the job against a temporary external system spawned with Docker. I usually use Testcontainers [1]. Then you simply need to execute the integration tests in your IDE and can usually even debug non-obvious errors.

[1] https://www.testcontainers.org/
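For illustration, a sketch of such a smoke test, assuming a Kafka-backed connector and JUnit 4; the table DDL is elided because the exact properties depend on the connector version:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.junit.ClassRule;
import org.junit.Test;
import org.testcontainers.containers.KafkaContainer;

public class ConnectorSmokeTest {

    // Testcontainers spins up a throwaway Kafka broker in Docker for this test class.
    @ClassRule
    public static final KafkaContainer KAFKA = new KafkaContainer();

    @Test
    public void jobStartsAgainstRealBroker() throws Exception {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Register source/sink tables here, pointing the connector at
        // KAFKA.getBootstrapServers(); the exact DDL depends on the connector version.

        tEnv.sqlUpdate("INSERT INTO sinkTable SELECT f1, f2 FROM sourceTable");
        tEnv.execute("connector-smoke-test"); // fails fast on connector/classpath issues
    }
}
```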
hi kaibo,
As we discussed offline, I think a clean way would be for flink-table to provide an interface (or a tool) to do the SQL validation for platform users. `tEnv.sqlUpdate` or `tEnv.explain(false)` is a temporary solution that contains too much unrelated logic (given that the functionality needed is just checking whether a SQL is valid).

Best,
godfrey
Hi all,
Yes, I agree. It would be good to have dedicated methods to check the validity of SQL queries. I would propose two validation methods:

1. Syntactic and semantic validation of a SQL query, i.e., SQL keywords, catalog information, types in expressions and functions, etc. This is a lightweight check, and Calcite can give us good error messages.
2. Validation that the query can be executed by Flink. This check involves calling the optimizer, which can be quite expensive. It could also include a check of whether a query produces append-only or retraction results.

The second method checks everything that the first one checks but is much more expensive, as it calls the optimizer.

Best,
Fabian
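For illustration only, a hypothetical shape such dedicated methods could take; none of these types exist in Flink, and all names are invented:

```java
// Hypothetical sketch of the proposed two-level validation API; these
// types do not exist in Flink -- the names are made up for illustration.
public interface SqlValidationService {

    /** Result of a validation run: either OK, or an error message with a position. */
    final class ValidationResult {
        public final boolean valid;
        public final String errorMessage;   // null when valid
        public final int line;              // -1 when valid or unknown
        public final int column;            // -1 when valid or unknown

        public ValidationResult(boolean valid, String errorMessage, int line, int column) {
            this.valid = valid;
            this.errorMessage = errorMessage;
            this.line = line;
            this.column = column;
        }
    }

    /** Level 1: syntax + semantics (keywords, catalog info, types). Lightweight. */
    ValidationResult validateSemantics(String sql);

    /** Level 2: level 1 plus running the optimizer to check Flink can execute it. Expensive. */
    ValidationResult validateExecution(String sql);
}
```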