Hi Shengkai,
thanks for updating the FLIP. I have one last comment for the option `table.execution.mode`. Should we already use the global Flink option `execution.runtime-mode` instead? We are using Flink's options where possible (e.g. `pipeline.name` and `parallism.default`) why not also for batch/streaming mode? The description of the option matches to the Blink planner behavior: ``` Among other things, this controls task scheduling, network shuffle behavior, and time semantics. ``` Regards, Timo On 10.02.21 06:30, Shengkai Fang wrote: > Hi, guys. > > I have updated the FLIP. It seems we have reached agreement. Maybe we can > start the vote soon. If anyone has other questions, please leave your > comments. > > Best, > Shengkai > > Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > >> Hi guys, >> >> The conclusion sounds good to me. >> >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> wrote: >> >>> Hi, Timo, Jark. >>> >>> I am fine with the new option name. >>> >>> Best, >>> Shengkai >>> >>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: >>> >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. >>>> >>>> @Rui, Shengkai: Are you also fine with this conclusion? >>>> >>>> Thanks, >>>> Timo >>>> >>>> On 09.02.21 10:14, Jark Wu wrote: >>>>> I'm fine with `table.multi-dml-sync`. >>>>> >>>>> My previous concern about "multi" is that DML in CLI looks like >> single >>>>> statement. >>>>> But we can treat CLI as a multi-line accepting statements from >> opening >>> to >>>>> closing. >>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>> >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and >> we >>>> will >>>>> support this config >>>>> in SQL CLI first, will support it in >> TableEnvironment#executeMultiSql() >>>> in >>>>> the future, right? >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> >> wrote: >>>>> >>>>>> Hi everyone, >>>>>> >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to >>>>>> regular `executeSql`. Actually, this option makes only sense when >>>>>> executing multi statements. Once we have a >>>>>> `TableEnvironment.executeMultiSql()` this config could be >> considered. >>>>>> >>>>>> Maybe we can find a better generic name? Other platforms will also >>> need >>>>>> to have this config option, which is why I would like to avoid a SQL >>>>>> Client specific option. Otherwise every platform has to come up with >>>>>> this important config option separately. >>>>>> >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other >>> opinions? >>>>>> >>>>>> Regards, >>>>>> Timo >>>>>> >>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>> Hi, all. >>>>>>> >>>>>>> I think it may cause user confused. The main problem is we have no >>>> means >>>>>>> to detect the conflict configuration, e.g. users set the option >> true >>>> and >>>>>>> use `TableResult#await` together. >>>>>>> >>>>>>> Best, >>>>>>> Shengkai. >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> >> -- >> Best regards! >> Rui Li >> > |
Hi all,
I have a couple questions about the SHOW CREATE TABLE statement. 1) Contrary to the example in the FLIP I think the returned DDL should always have the table identifier fully-qualified. Otherwise the DDL depends on the current context (catalog/database), which could be surprising, especially since "the same" table can behave differently if created in different catalogs. 2) How should this handle tables which cannot be fully characterized by properties only? I don't know if there's an example for this yet, but hypothetically this is not currently a requirement, right? This isn't as much of a problem if this syntax is SQL-client-specific, but if it's general Flink SQL syntax we should consider this (one way or another). Regards Ingo On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> wrote: > Hi Shengkai, > > thanks for updating the FLIP. > > I have one last comment for the option `table.execution.mode`. Should we > already use the global Flink option `execution.runtime-mode` instead? > > We are using Flink's options where possible (e.g. `pipeline.name` and > `parallism.default`) why not also for batch/streaming mode? > > The description of the option matches to the Blink planner behavior: > > ``` > Among other things, this controls task scheduling, network shuffle > behavior, and time semantics. > ``` > > Regards, > Timo > > On 10.02.21 06:30, Shengkai Fang wrote: > > Hi, guys. > > > > I have updated the FLIP. It seems we have reached agreement. Maybe we > can > > start the vote soon. If anyone has other questions, please leave your > > comments. > > > > Best, > > Shengkai > > > > Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > > >> Hi guys, > >> > >> The conclusion sounds good to me. > >> > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> wrote: > >> > >>> Hi, Timo, Jark. > >>> > >>> I am fine with the new option name. > >>> > >>> Best, > >>> Shengkai > >>> > >>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > >>> > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > >>>> > >>>> @Rui, Shengkai: Are you also fine with this conclusion? > >>>> > >>>> Thanks, > >>>> Timo > >>>> > >>>> On 09.02.21 10:14, Jark Wu wrote: > >>>>> I'm fine with `table.multi-dml-sync`. > >>>>> > >>>>> My previous concern about "multi" is that DML in CLI looks like > >> single > >>>>> statement. > >>>>> But we can treat CLI as a multi-line accepting statements from > >> opening > >>> to > >>>>> closing. > >>>>> Thus, I'm fine with `table.multi-dml-sync`. > >>>>> > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and > >> we > >>>> will > >>>>> support this config > >>>>> in SQL CLI first, will support it in > >> TableEnvironment#executeMultiSql() > >>>> in > >>>>> the future, right? > >>>>> > >>>>> Best, > >>>>> Jark > >>>>> > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> > >> wrote: > >>>>> > >>>>>> Hi everyone, > >>>>>> > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to > >>>>>> regular `executeSql`. Actually, this option makes only sense when > >>>>>> executing multi statements. Once we have a > >>>>>> `TableEnvironment.executeMultiSql()` this config could be > >> considered. > >>>>>> > >>>>>> Maybe we can find a better generic name? Other platforms will also > >>> need > >>>>>> to have this config option, which is why I would like to avoid a SQL > >>>>>> Client specific option. Otherwise every platform has to come up with > >>>>>> this important config option separately. > >>>>>> > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other > >>> opinions? > >>>>>> > >>>>>> Regards, > >>>>>> Timo > >>>>>> > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > >>>>>>> Hi, all. > >>>>>>> > >>>>>>> I think it may cause user confused. The main problem is we have no > >>>> means > >>>>>>> to detect the conflict configuration, e.g. users set the option > >> true > >>>> and > >>>>>>> use `TableResult#await` together. > >>>>>>> > >>>>>>> Best, > >>>>>>> Shengkai. > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> > >> > >> > >> -- > >> Best regards! > >> Rui Li > >> > > > > |
Hi Ingo,
1) I think you are right, the table path should be fully-qualified. 2) I think this is also a good point. The SHOW CREATE TABLE only aims to print DDL for the tables registered using SQL CREATE TABLE DDL. If a table is registered using Table API, e.g. `StreamTableEnvironment#createTemporaryView(String, DataStream)`, currently it's not possible to print DDL for such tables. I think we should point it out in the FLIP. Best, Jark On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > Hi all, > > I have a couple questions about the SHOW CREATE TABLE statement. > > 1) Contrary to the example in the FLIP I think the returned DDL should > always have the table identifier fully-qualified. Otherwise the DDL depends > on the current context (catalog/database), which could be surprising, > especially since "the same" table can behave differently if created in > different catalogs. > 2) How should this handle tables which cannot be fully characterized by > properties only? I don't know if there's an example for this yet, but > hypothetically this is not currently a requirement, right? This isn't as > much of a problem if this syntax is SQL-client-specific, but if it's > general Flink SQL syntax we should consider this (one way or another). > > > Regards > Ingo > > On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> wrote: > > > Hi Shengkai, > > > > thanks for updating the FLIP. > > > > I have one last comment for the option `table.execution.mode`. Should we > > already use the global Flink option `execution.runtime-mode` instead? > > > > We are using Flink's options where possible (e.g. `pipeline.name` and > > `parallism.default`) why not also for batch/streaming mode? > > > > The description of the option matches to the Blink planner behavior: > > > > ``` > > Among other things, this controls task scheduling, network shuffle > > behavior, and time semantics. > > ``` > > > > Regards, > > Timo > > > > On 10.02.21 06:30, Shengkai Fang wrote: > > > Hi, guys. > > > > > > I have updated the FLIP. It seems we have reached agreement. Maybe we > > can > > > start the vote soon. If anyone has other questions, please leave your > > > comments. > > > > > > Best, > > > Shengkai > > > > > > Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > > > > >> Hi guys, > > >> > > >> The conclusion sounds good to me. > > >> > > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> > wrote: > > >> > > >>> Hi, Timo, Jark. > > >>> > > >>> I am fine with the new option name. > > >>> > > >>> Best, > > >>> Shengkai > > >>> > > >>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > > >>> > > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > > >>>> > > >>>> @Rui, Shengkai: Are you also fine with this conclusion? > > >>>> > > >>>> Thanks, > > >>>> Timo > > >>>> > > >>>> On 09.02.21 10:14, Jark Wu wrote: > > >>>>> I'm fine with `table.multi-dml-sync`. > > >>>>> > > >>>>> My previous concern about "multi" is that DML in CLI looks like > > >> single > > >>>>> statement. > > >>>>> But we can treat CLI as a multi-line accepting statements from > > >> opening > > >>> to > > >>>>> closing. > > >>>>> Thus, I'm fine with `table.multi-dml-sync`. > > >>>>> > > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), and > > >> we > > >>>> will > > >>>>> support this config > > >>>>> in SQL CLI first, will support it in > > >> TableEnvironment#executeMultiSql() > > >>>> in > > >>>>> the future, right? > > >>>>> > > >>>>> Best, > > >>>>> Jark > > >>>>> > > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> > > >> wrote: > > >>>>> > > >>>>>> Hi everyone, > > >>>>>> > > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply to > > >>>>>> regular `executeSql`. Actually, this option makes only sense when > > >>>>>> executing multi statements. Once we have a > > >>>>>> `TableEnvironment.executeMultiSql()` this config could be > > >> considered. > > >>>>>> > > >>>>>> Maybe we can find a better generic name? Other platforms will also > > >>> need > > >>>>>> to have this config option, which is why I would like to avoid a > SQL > > >>>>>> Client specific option. Otherwise every platform has to come up > with > > >>>>>> this important config option separately. > > >>>>>> > > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other > > >>> opinions? > > >>>>>> > > >>>>>> Regards, > > >>>>>> Timo > > >>>>>> > > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > >>>>>>> Hi, all. > > >>>>>>> > > >>>>>>> I think it may cause user confused. The main problem is we have > no > > >>>> means > > >>>>>>> to detect the conflict configuration, e.g. users set the option > > >> true > > >>>> and > > >>>>>>> use `TableResult#await` together. > > >>>>>>> > > >>>>>>> Best, > > >>>>>>> Shengkai. > > >>>>>>> > > >>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >>> > > >> > > >> > > >> -- > > >> Best regards! > > >> Rui Li > > >> > > > > > > > > |
Hi everyone.
Sorry for the late response. For `execution.runtime-mode`, I think it's much better than `table.execution.mode`. Thanks for Timo's suggestions! For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the usage of the SHOW CREATE TABLE statements. It should be allowed to specify the table that is fully qualified and only works for the table that is created by the sql statements. I have updated the FLIP with suggestions. It seems we have reached a consensus, I'd like to start a formal vote for the FLIP. Please vote +1 to approve the FLIP, or -1 with a comment. Best, Shengkai Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > Hi Ingo, > > 1) I think you are right, the table path should be fully-qualified. > > 2) I think this is also a good point. The SHOW CREATE TABLE > only aims to print DDL for the tables registered using SQL CREATE TABLE > DDL. > If a table is registered using Table API, e.g. > `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > currently it's not possible to print DDL for such tables. > I think we should point it out in the FLIP. > > Best, > Jark > > > > On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > > > Hi all, > > > > I have a couple questions about the SHOW CREATE TABLE statement. > > > > 1) Contrary to the example in the FLIP I think the returned DDL should > > always have the table identifier fully-qualified. Otherwise the DDL > depends > > on the current context (catalog/database), which could be surprising, > > especially since "the same" table can behave differently if created in > > different catalogs. > > 2) How should this handle tables which cannot be fully characterized by > > properties only? I don't know if there's an example for this yet, but > > hypothetically this is not currently a requirement, right? This isn't as > > much of a problem if this syntax is SQL-client-specific, but if it's > > general Flink SQL syntax we should consider this (one way or another). > > > > > > Regards > > Ingo > > > > On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> wrote: > > > > > Hi Shengkai, > > > > > > thanks for updating the FLIP. > > > > > > I have one last comment for the option `table.execution.mode`. Should > we > > > already use the global Flink option `execution.runtime-mode` instead? > > > > > > We are using Flink's options where possible (e.g. `pipeline.name` and > > > `parallism.default`) why not also for batch/streaming mode? > > > > > > The description of the option matches to the Blink planner behavior: > > > > > > ``` > > > Among other things, this controls task scheduling, network shuffle > > > behavior, and time semantics. > > > ``` > > > > > > Regards, > > > Timo > > > > > > On 10.02.21 06:30, Shengkai Fang wrote: > > > > Hi, guys. > > > > > > > > I have updated the FLIP. It seems we have reached agreement. Maybe > we > > > can > > > > start the vote soon. If anyone has other questions, please leave your > > > > comments. > > > > > > > > Best, > > > > Shengkai > > > > > > > > Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > > > > > > >> Hi guys, > > > >> > > > >> The conclusion sounds good to me. > > > >> > > > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> > > wrote: > > > >> > > > >>> Hi, Timo, Jark. > > > >>> > > > >>> I am fine with the new option name. > > > >>> > > > >>> Best, > > > >>> Shengkai > > > >>> > > > >>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > > > >>> > > > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > > > >>>> > > > >>>> @Rui, Shengkai: Are you also fine with this conclusion? > > > >>>> > > > >>>> Thanks, > > > >>>> Timo > > > >>>> > > > >>>> On 09.02.21 10:14, Jark Wu wrote: > > > >>>>> I'm fine with `table.multi-dml-sync`. > > > >>>>> > > > >>>>> My previous concern about "multi" is that DML in CLI looks like > > > >> single > > > >>>>> statement. > > > >>>>> But we can treat CLI as a multi-line accepting statements from > > > >> opening > > > >>> to > > > >>>>> closing. > > > >>>>> Thus, I'm fine with `table.multi-dml-sync`. > > > >>>>> > > > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), > and > > > >> we > > > >>>> will > > > >>>>> support this config > > > >>>>> in SQL CLI first, will support it in > > > >> TableEnvironment#executeMultiSql() > > > >>>> in > > > >>>>> the future, right? > > > >>>>> > > > >>>>> Best, > > > >>>>> Jark > > > >>>>> > > > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> > > > >> wrote: > > > >>>>> > > > >>>>>> Hi everyone, > > > >>>>>> > > > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply > to > > > >>>>>> regular `executeSql`. Actually, this option makes only sense > when > > > >>>>>> executing multi statements. Once we have a > > > >>>>>> `TableEnvironment.executeMultiSql()` this config could be > > > >> considered. > > > >>>>>> > > > >>>>>> Maybe we can find a better generic name? Other platforms will > also > > > >>> need > > > >>>>>> to have this config option, which is why I would like to avoid a > > SQL > > > >>>>>> Client specific option. Otherwise every platform has to come up > > with > > > >>>>>> this important config option separately. > > > >>>>>> > > > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other > > > >>> opinions? > > > >>>>>> > > > >>>>>> Regards, > > > >>>>>> Timo > > > >>>>>> > > > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > > >>>>>>> Hi, all. > > > >>>>>>> > > > >>>>>>> I think it may cause user confused. The main problem is we > have > > no > > > >>>> means > > > >>>>>>> to detect the conflict configuration, e.g. users set the option > > > >> true > > > >>>> and > > > >>>>>>> use `TableResult#await` together. > > > >>>>>>> > > > >>>>>>> Best, > > > >>>>>>> Shengkai. > > > >>>>>>> > > > >>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>>> > > > >>> > > > >> > > > >> > > > >> -- > > > >> Best regards! > > > >> Rui Li > > > >> > > > > > > > > > > > > > |
Sorry for the late reply, but I'm confused by `table.multi-dml-sync`.
IIUC this config will take effect with 2 use cases: 1. SQL client, either interactive mode or executing multiple statements via -f. In most cases, there will be only one INSERT INTO statement but we are controlling the sync/async behavior with "*multi-dml*-sync". I think this will confuse a lot of users. Besides, 2. TableEnvironment#executeMultiSql(), but this is future work, we are also not sure if we will really introduce this in the future. I would prefer to introduce this option for only sql client. For platforms Timo mentioned which need to control such behavior, I think it's easy and flexible to introduce one on their own. Best, Kurt On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> wrote: > Hi everyone. > > Sorry for the late response. > > For `execution.runtime-mode`, I think it's much better than > `table.execution.mode`. Thanks for Timo's suggestions! > > For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the > usage of the SHOW CREATE TABLE statements. It should be allowed to specify > the table that is fully qualified and only works for the table that is > created by the sql statements. > > I have updated the FLIP with suggestions. It seems we have reached a > consensus, I'd like to start a formal vote for the FLIP. > > Please vote +1 to approve the FLIP, or -1 with a comment. > > Best, > Shengkai > > Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > > > Hi Ingo, > > > > 1) I think you are right, the table path should be fully-qualified. > > > > 2) I think this is also a good point. The SHOW CREATE TABLE > > only aims to print DDL for the tables registered using SQL CREATE TABLE > > DDL. > > If a table is registered using Table API, e.g. > > `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > > currently it's not possible to print DDL for such tables. > > I think we should point it out in the FLIP. > > > > Best, > > Jark > > > > > > > > On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > > > > > Hi all, > > > > > > I have a couple questions about the SHOW CREATE TABLE statement. > > > > > > 1) Contrary to the example in the FLIP I think the returned DDL should > > > always have the table identifier fully-qualified. Otherwise the DDL > > depends > > > on the current context (catalog/database), which could be surprising, > > > especially since "the same" table can behave differently if created in > > > different catalogs. > > > 2) How should this handle tables which cannot be fully characterized by > > > properties only? I don't know if there's an example for this yet, but > > > hypothetically this is not currently a requirement, right? This isn't > as > > > much of a problem if this syntax is SQL-client-specific, but if it's > > > general Flink SQL syntax we should consider this (one way or another). > > > > > > > > > Regards > > > Ingo > > > > > > On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> > wrote: > > > > > > > Hi Shengkai, > > > > > > > > thanks for updating the FLIP. > > > > > > > > I have one last comment for the option `table.execution.mode`. Should > > we > > > > already use the global Flink option `execution.runtime-mode` instead? > > > > > > > > We are using Flink's options where possible (e.g. `pipeline.name` > and > > > > `parallism.default`) why not also for batch/streaming mode? > > > > > > > > The description of the option matches to the Blink planner behavior: > > > > > > > > ``` > > > > Among other things, this controls task scheduling, network shuffle > > > > behavior, and time semantics. > > > > ``` > > > > > > > > Regards, > > > > Timo > > > > > > > > On 10.02.21 06:30, Shengkai Fang wrote: > > > > > Hi, guys. > > > > > > > > > > I have updated the FLIP. It seems we have reached agreement. Maybe > > we > > > > can > > > > > start the vote soon. If anyone has other questions, please leave > your > > > > > comments. > > > > > > > > > > Best, > > > > > Shengkai > > > > > > > > > > Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > > > > > > > > >> Hi guys, > > > > >> > > > > >> The conclusion sounds good to me. > > > > >> > > > > >> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> > > > wrote: > > > > >> > > > > >>> Hi, Timo, Jark. > > > > >>> > > > > >>> I am fine with the new option name. > > > > >>> > > > > >>> Best, > > > > >>> Shengkai > > > > >>> > > > > >>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > > > > >>> > > > > >>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > > > > >>>> > > > > >>>> @Rui, Shengkai: Are you also fine with this conclusion? > > > > >>>> > > > > >>>> Thanks, > > > > >>>> Timo > > > > >>>> > > > > >>>> On 09.02.21 10:14, Jark Wu wrote: > > > > >>>>> I'm fine with `table.multi-dml-sync`. > > > > >>>>> > > > > >>>>> My previous concern about "multi" is that DML in CLI looks like > > > > >> single > > > > >>>>> statement. > > > > >>>>> But we can treat CLI as a multi-line accepting statements from > > > > >> opening > > > > >>> to > > > > >>>>> closing. > > > > >>>>> Thus, I'm fine with `table.multi-dml-sync`. > > > > >>>>> > > > > >>>>> So the conclusion is `table.multi-dml-sync` (false by default), > > and > > > > >> we > > > > >>>> will > > > > >>>>> support this config > > > > >>>>> in SQL CLI first, will support it in > > > > >> TableEnvironment#executeMultiSql() > > > > >>>> in > > > > >>>>> the future, right? > > > > >>>>> > > > > >>>>> Best, > > > > >>>>> Jark > > > > >>>>> > > > > >>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> > > > > >> wrote: > > > > >>>>> > > > > >>>>>> Hi everyone, > > > > >>>>>> > > > > >>>>>> I understand Rui's concerns. `table.dml-sync` should not apply > > to > > > > >>>>>> regular `executeSql`. Actually, this option makes only sense > > when > > > > >>>>>> executing multi statements. Once we have a > > > > >>>>>> `TableEnvironment.executeMultiSql()` this config could be > > > > >> considered. > > > > >>>>>> > > > > >>>>>> Maybe we can find a better generic name? Other platforms will > > also > > > > >>> need > > > > >>>>>> to have this config option, which is why I would like to > avoid a > > > SQL > > > > >>>>>> Client specific option. Otherwise every platform has to come > up > > > with > > > > >>>>>> this important config option separately. > > > > >>>>>> > > > > >>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other > > > > >>> opinions? > > > > >>>>>> > > > > >>>>>> Regards, > > > > >>>>>> Timo > > > > >>>>>> > > > > >>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > > > >>>>>>> Hi, all. > > > > >>>>>>> > > > > >>>>>>> I think it may cause user confused. The main problem is we > > have > > > no > > > > >>>> means > > > > >>>>>>> to detect the conflict configuration, e.g. users set the > option > > > > >> true > > > > >>>> and > > > > >>>>>>> use `TableResult#await` together. > > > > >>>>>>> > > > > >>>>>>> Best, > > > > >>>>>>> Shengkai. > > > > >>>>>>> > > > > >>>>>> > > > > >>>>>> > > > > >>>>> > > > > >>>> > > > > >>>> > > > > >>> > > > > >> > > > > >> > > > > >> -- > > > > >> Best regards! > > > > >> Rui Li > > > > >> > > > > > > > > > > > > > > > > > > > |
Hi Kurt,
we can also shorten it to `table.dml-sync` if that would help. Then it would confuse users that do a regular `.executeSql("INSERT INTO")` in a notebook session. In any case users will need to learn the semantics of this option. `table.multi-dml-sync` should be described as "If a you are in a multi statement environment, execute DMLs synchrounous.". I don't have a strong opinion on shortening it to `table.dml-sync`. Just to clarify the implementation: The option should be handled by the SQL Client only, but the name can be shared accross platforms. Regards, Timo On 23.02.21 09:54, Kurt Young wrote: > Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. > > IIUC this config will take effect with 2 use cases: > 1. SQL client, either interactive mode or executing multiple statements via > -f. In most cases, > there will be only one INSERT INTO statement but we are controlling the > sync/async behavior > with "*multi-dml*-sync". I think this will confuse a lot of users. Besides, > > 2. TableEnvironment#executeMultiSql(), but this is future work, we are also > not sure if we will > really introduce this in the future. > > I would prefer to introduce this option for only sql client. For platforms > Timo mentioned which > need to control such behavior, I think it's easy and flexible to introduce > one on their own. > > Best, > Kurt > > > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> wrote: > >> Hi everyone. >> >> Sorry for the late response. >> >> For `execution.runtime-mode`, I think it's much better than >> `table.execution.mode`. Thanks for Timo's suggestions! >> >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify the >> usage of the SHOW CREATE TABLE statements. It should be allowed to specify >> the table that is fully qualified and only works for the table that is >> created by the sql statements. >> >> I have updated the FLIP with suggestions. It seems we have reached a >> consensus, I'd like to start a formal vote for the FLIP. >> >> Please vote +1 to approve the FLIP, or -1 with a comment. >> >> Best, >> Shengkai >> >> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: >> >>> Hi Ingo, >>> >>> 1) I think you are right, the table path should be fully-qualified. >>> >>> 2) I think this is also a good point. The SHOW CREATE TABLE >>> only aims to print DDL for the tables registered using SQL CREATE TABLE >>> DDL. >>> If a table is registered using Table API, e.g. >>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, >>> currently it's not possible to print DDL for such tables. >>> I think we should point it out in the FLIP. >>> >>> Best, >>> Jark >>> >>> >>> >>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: >>> >>>> Hi all, >>>> >>>> I have a couple questions about the SHOW CREATE TABLE statement. >>>> >>>> 1) Contrary to the example in the FLIP I think the returned DDL should >>>> always have the table identifier fully-qualified. Otherwise the DDL >>> depends >>>> on the current context (catalog/database), which could be surprising, >>>> especially since "the same" table can behave differently if created in >>>> different catalogs. >>>> 2) How should this handle tables which cannot be fully characterized by >>>> properties only? I don't know if there's an example for this yet, but >>>> hypothetically this is not currently a requirement, right? This isn't >> as >>>> much of a problem if this syntax is SQL-client-specific, but if it's >>>> general Flink SQL syntax we should consider this (one way or another). >>>> >>>> >>>> Regards >>>> Ingo >>>> >>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> >> wrote: >>>> >>>>> Hi Shengkai, >>>>> >>>>> thanks for updating the FLIP. >>>>> >>>>> I have one last comment for the option `table.execution.mode`. Should >>> we >>>>> already use the global Flink option `execution.runtime-mode` instead? >>>>> >>>>> We are using Flink's options where possible (e.g. `pipeline.name` >> and >>>>> `parallism.default`) why not also for batch/streaming mode? >>>>> >>>>> The description of the option matches to the Blink planner behavior: >>>>> >>>>> ``` >>>>> Among other things, this controls task scheduling, network shuffle >>>>> behavior, and time semantics. >>>>> ``` >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> On 10.02.21 06:30, Shengkai Fang wrote: >>>>>> Hi, guys. >>>>>> >>>>>> I have updated the FLIP. It seems we have reached agreement. Maybe >>> we >>>>> can >>>>>> start the vote soon. If anyone has other questions, please leave >> your >>>>>> comments. >>>>>> >>>>>> Best, >>>>>> Shengkai >>>>>> >>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: >>>>>> >>>>>>> Hi guys, >>>>>>> >>>>>>> The conclusion sounds good to me. >>>>>>> >>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> >>>> wrote: >>>>>>> >>>>>>>> Hi, Timo, Jark. >>>>>>>> >>>>>>>> I am fine with the new option name. >>>>>>>> >>>>>>>> Best, >>>>>>>> Shengkai >>>>>>>> >>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: >>>>>>>> >>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. >>>>>>>>> >>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Timo >>>>>>>>> >>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: >>>>>>>>>> I'm fine with `table.multi-dml-sync`. >>>>>>>>>> >>>>>>>>>> My previous concern about "multi" is that DML in CLI looks like >>>>>>> single >>>>>>>>>> statement. >>>>>>>>>> But we can treat CLI as a multi-line accepting statements from >>>>>>> opening >>>>>>>> to >>>>>>>>>> closing. >>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>>>>>>> >>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by default), >>> and >>>>>>> we >>>>>>>>> will >>>>>>>>>> support this config >>>>>>>>>> in SQL CLI first, will support it in >>>>>>> TableEnvironment#executeMultiSql() >>>>>>>>> in >>>>>>>>>> the future, right? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jark >>>>>>>>>> >>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi everyone, >>>>>>>>>>> >>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not apply >>> to >>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense >>> when >>>>>>>>>>> executing multi statements. Once we have a >>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be >>>>>>> considered. >>>>>>>>>>> >>>>>>>>>>> Maybe we can find a better generic name? Other platforms will >>> also >>>>>>>> need >>>>>>>>>>> to have this config option, which is why I would like to >> avoid a >>>> SQL >>>>>>>>>>> Client specific option. Otherwise every platform has to come >> up >>>> with >>>>>>>>>>> this important config option separately. >>>>>>>>>>> >>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other >>>>>>>> opinions? >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>>>>>>> Hi, all. >>>>>>>>>>>> >>>>>>>>>>>> I think it may cause user confused. The main problem is we >>> have >>>> no >>>>>>>>> means >>>>>>>>>>>> to detect the conflict configuration, e.g. users set the >> option >>>>>>> true >>>>>>>>> and >>>>>>>>>>>> use `TableResult#await` together. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Shengkai. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> Best regards! >>>>>>> Rui Li >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> > |
If we all agree the option should only be handled by sql client, then why
don't we just call it `sql-client.dml-sync`? As you said, calling it `table.dml-sync` but has no affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big confusion for users. The only concern I saw is if we introduce "TableEnvironment.executeMultiSql()" in the future, how do we control the synchronization between statements? TBH I don't really see a strong requirement for such interfaces. Right now, we have a pretty clear semantic of `TableEnv.executeSql`, and it's very convenient for users if they want to execute multiple sql statements. They can simulate either synced or async execution with this building block. This will introduce slight overhead for users, but compared to the confusion we might cause if we introduce such a method of our own, I think it's better to wait for some more feedback. Best, Kurt On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> wrote: > Hi Kurt, > > we can also shorten it to `table.dml-sync` if that would help. Then it > would confuse users that do a regular `.executeSql("INSERT INTO")` in a > notebook session. > > In any case users will need to learn the semantics of this option. > `table.multi-dml-sync` should be described as "If a you are in a multi > statement environment, execute DMLs synchrounous.". I don't have a > strong opinion on shortening it to `table.dml-sync`. > > Just to clarify the implementation: The option should be handled by the > SQL Client only, but the name can be shared accross platforms. > > Regards, > Timo > > > On 23.02.21 09:54, Kurt Young wrote: > > Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. > > > > IIUC this config will take effect with 2 use cases: > > 1. SQL client, either interactive mode or executing multiple statements > via > > -f. In most cases, > > there will be only one INSERT INTO statement but we are controlling the > > sync/async behavior > > with "*multi-dml*-sync". I think this will confuse a lot of users. > Besides, > > > > 2. TableEnvironment#executeMultiSql(), but this is future work, we are > also > > not sure if we will > > really introduce this in the future. > > > > I would prefer to introduce this option for only sql client. For > platforms > > Timo mentioned which > > need to control such behavior, I think it's easy and flexible to > introduce > > one on their own. > > > > Best, > > Kurt > > > > > > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> > wrote: > > > >> Hi everyone. > >> > >> Sorry for the late response. > >> > >> For `execution.runtime-mode`, I think it's much better than > >> `table.execution.mode`. Thanks for Timo's suggestions! > >> > >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should clarify > the > >> usage of the SHOW CREATE TABLE statements. It should be allowed to > specify > >> the table that is fully qualified and only works for the table that is > >> created by the sql statements. > >> > >> I have updated the FLIP with suggestions. It seems we have reached a > >> consensus, I'd like to start a formal vote for the FLIP. > >> > >> Please vote +1 to approve the FLIP, or -1 with a comment. > >> > >> Best, > >> Shengkai > >> > >> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > >> > >>> Hi Ingo, > >>> > >>> 1) I think you are right, the table path should be fully-qualified. > >>> > >>> 2) I think this is also a good point. The SHOW CREATE TABLE > >>> only aims to print DDL for the tables registered using SQL CREATE TABLE > >>> DDL. > >>> If a table is registered using Table API, e.g. > >>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > >>> currently it's not possible to print DDL for such tables. > >>> I think we should point it out in the FLIP. > >>> > >>> Best, > >>> Jark > >>> > >>> > >>> > >>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > >>> > >>>> Hi all, > >>>> > >>>> I have a couple questions about the SHOW CREATE TABLE statement. > >>>> > >>>> 1) Contrary to the example in the FLIP I think the returned DDL should > >>>> always have the table identifier fully-qualified. Otherwise the DDL > >>> depends > >>>> on the current context (catalog/database), which could be surprising, > >>>> especially since "the same" table can behave differently if created in > >>>> different catalogs. > >>>> 2) How should this handle tables which cannot be fully characterized > by > >>>> properties only? I don't know if there's an example for this yet, but > >>>> hypothetically this is not currently a requirement, right? This isn't > >> as > >>>> much of a problem if this syntax is SQL-client-specific, but if it's > >>>> general Flink SQL syntax we should consider this (one way or another). > >>>> > >>>> > >>>> Regards > >>>> Ingo > >>>> > >>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> > >> wrote: > >>>> > >>>>> Hi Shengkai, > >>>>> > >>>>> thanks for updating the FLIP. > >>>>> > >>>>> I have one last comment for the option `table.execution.mode`. Should > >>> we > >>>>> already use the global Flink option `execution.runtime-mode` instead? > >>>>> > >>>>> We are using Flink's options where possible (e.g. `pipeline.name` > >> and > >>>>> `parallism.default`) why not also for batch/streaming mode? > >>>>> > >>>>> The description of the option matches to the Blink planner behavior: > >>>>> > >>>>> ``` > >>>>> Among other things, this controls task scheduling, network shuffle > >>>>> behavior, and time semantics. > >>>>> ``` > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> On 10.02.21 06:30, Shengkai Fang wrote: > >>>>>> Hi, guys. > >>>>>> > >>>>>> I have updated the FLIP. It seems we have reached agreement. Maybe > >>> we > >>>>> can > >>>>>> start the vote soon. If anyone has other questions, please leave > >> your > >>>>>> comments. > >>>>>> > >>>>>> Best, > >>>>>> Shengkai > >>>>>> > >>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > >>>>>> > >>>>>>> Hi guys, > >>>>>>> > >>>>>>> The conclusion sounds good to me. > >>>>>>> > >>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> > >>>> wrote: > >>>>>>> > >>>>>>>> Hi, Timo, Jark. > >>>>>>>> > >>>>>>>> I am fine with the new option name. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Shengkai > >>>>>>>> > >>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > >>>>>>>> > >>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > >>>>>>>>> > >>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > >>>>>>>>> > >>>>>>>>> Thanks, > >>>>>>>>> Timo > >>>>>>>>> > >>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > >>>>>>>>>> I'm fine with `table.multi-dml-sync`. > >>>>>>>>>> > >>>>>>>>>> My previous concern about "multi" is that DML in CLI looks like > >>>>>>> single > >>>>>>>>>> statement. > >>>>>>>>>> But we can treat CLI as a multi-line accepting statements from > >>>>>>> opening > >>>>>>>> to > >>>>>>>>>> closing. > >>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > >>>>>>>>>> > >>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by default), > >>> and > >>>>>>> we > >>>>>>>>> will > >>>>>>>>>> support this config > >>>>>>>>>> in SQL CLI first, will support it in > >>>>>>> TableEnvironment#executeMultiSql() > >>>>>>>>> in > >>>>>>>>>> the future, right? > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jark > >>>>>>>>>> > >>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email]> > >>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi everyone, > >>>>>>>>>>> > >>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not apply > >>> to > >>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense > >>> when > >>>>>>>>>>> executing multi statements. Once we have a > >>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be > >>>>>>> considered. > >>>>>>>>>>> > >>>>>>>>>>> Maybe we can find a better generic name? Other platforms will > >>> also > >>>>>>>> need > >>>>>>>>>>> to have this config option, which is why I would like to > >> avoid a > >>>> SQL > >>>>>>>>>>> Client specific option. Otherwise every platform has to come > >> up > >>>> with > >>>>>>>>>>> this important config option separately. > >>>>>>>>>>> > >>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or other > >>>>>>>> opinions? > >>>>>>>>>>> > >>>>>>>>>>> Regards, > >>>>>>>>>>> Timo > >>>>>>>>>>> > >>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > >>>>>>>>>>>> Hi, all. > >>>>>>>>>>>> > >>>>>>>>>>>> I think it may cause user confused. The main problem is we > >>> have > >>>> no > >>>>>>>>> means > >>>>>>>>>>>> to detect the conflict configuration, e.g. users set the > >> option > >>>>>>> true > >>>>>>>>> and > >>>>>>>>>>>> use `TableResult#await` together. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Shengkai. > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> -- > >>>>>>> Best regards! > >>>>>>> Rui Li > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>> > >> > > > > |
From my point of view, I also prefer "sql-client.dml-sync",
because the behavior of this configuration is very clear. Even if we introduce a new config in the future, e.g. `table.dml-sync`, we can also deprecate the sql-client one. Introducing a "table." configuration without any implementation will confuse users a lot, as they expect it should take effect on the Table API. If we want to introduce an unified "table.dml-sync" option, I prefer it should be implemented on Table API and affect all the DMLs on Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), as I have mentioned before [1]. > It would be very straightforward that it affects all the DMLs on SQL CLI and TableEnvironment (including `executeSql`, `StatementSet`, `Table#executeInsert`, etc.). This can also make SQL CLI easy to support this configuration by passing through to the TableEnv. Best, Jark [1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > If we all agree the option should only be handled by sql client, then why > don't we > just call it `sql-client.dml-sync`? As you said, calling it > `table.dml-sync` but has no > affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big > confusion for > users. > > The only concern I saw is if we introduce > "TableEnvironment.executeMultiSql()" in the > future, how do we control the synchronization between statements? TBH I > don't really > see a strong requirement for such interfaces. Right now, we have a pretty > clear semantic > of `TableEnv.executeSql`, and it's very convenient for users if they want > to execute multiple > sql statements. They can simulate either synced or async execution with > this building block. > > This will introduce slight overhead for users, but compared to the > confusion we might > cause if we introduce such a method of our own, I think it's better to wait > for some more > feedback. > > Best, > Kurt > > > On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> wrote: > > > Hi Kurt, > > > > we can also shorten it to `table.dml-sync` if that would help. Then it > > would confuse users that do a regular `.executeSql("INSERT INTO")` in a > > notebook session. > > > > In any case users will need to learn the semantics of this option. > > `table.multi-dml-sync` should be described as "If a you are in a multi > > statement environment, execute DMLs synchrounous.". I don't have a > > strong opinion on shortening it to `table.dml-sync`. > > > > Just to clarify the implementation: The option should be handled by the > > SQL Client only, but the name can be shared accross platforms. > > > > Regards, > > Timo > > > > > > On 23.02.21 09:54, Kurt Young wrote: > > > Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. > > > > > > IIUC this config will take effect with 2 use cases: > > > 1. SQL client, either interactive mode or executing multiple statements > > via > > > -f. In most cases, > > > there will be only one INSERT INTO statement but we are controlling the > > > sync/async behavior > > > with "*multi-dml*-sync". I think this will confuse a lot of users. > > Besides, > > > > > > 2. TableEnvironment#executeMultiSql(), but this is future work, we are > > also > > > not sure if we will > > > really introduce this in the future. > > > > > > I would prefer to introduce this option for only sql client. For > > platforms > > > Timo mentioned which > > > need to control such behavior, I think it's easy and flexible to > > introduce > > > one on their own. > > > > > > Best, > > > Kurt > > > > > > > > > On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> > > wrote: > > > > > >> Hi everyone. > > >> > > >> Sorry for the late response. > > >> > > >> For `execution.runtime-mode`, I think it's much better than > > >> `table.execution.mode`. Thanks for Timo's suggestions! > > >> > > >> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should > clarify > > the > > >> usage of the SHOW CREATE TABLE statements. It should be allowed to > > specify > > >> the table that is fully qualified and only works for the table that is > > >> created by the sql statements. > > >> > > >> I have updated the FLIP with suggestions. It seems we have reached a > > >> consensus, I'd like to start a formal vote for the FLIP. > > >> > > >> Please vote +1 to approve the FLIP, or -1 with a comment. > > >> > > >> Best, > > >> Shengkai > > >> > > >> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > > >> > > >>> Hi Ingo, > > >>> > > >>> 1) I think you are right, the table path should be fully-qualified. > > >>> > > >>> 2) I think this is also a good point. The SHOW CREATE TABLE > > >>> only aims to print DDL for the tables registered using SQL CREATE > TABLE > > >>> DDL. > > >>> If a table is registered using Table API, e.g. > > >>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > > >>> currently it's not possible to print DDL for such tables. > > >>> I think we should point it out in the FLIP. > > >>> > > >>> Best, > > >>> Jark > > >>> > > >>> > > >>> > > >>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > > >>> > > >>>> Hi all, > > >>>> > > >>>> I have a couple questions about the SHOW CREATE TABLE statement. > > >>>> > > >>>> 1) Contrary to the example in the FLIP I think the returned DDL > should > > >>>> always have the table identifier fully-qualified. Otherwise the DDL > > >>> depends > > >>>> on the current context (catalog/database), which could be > surprising, > > >>>> especially since "the same" table can behave differently if created > in > > >>>> different catalogs. > > >>>> 2) How should this handle tables which cannot be fully characterized > > by > > >>>> properties only? I don't know if there's an example for this yet, > but > > >>>> hypothetically this is not currently a requirement, right? This > isn't > > >> as > > >>>> much of a problem if this syntax is SQL-client-specific, but if it's > > >>>> general Flink SQL syntax we should consider this (one way or > another). > > >>>> > > >>>> > > >>>> Regards > > >>>> Ingo > > >>>> > > >>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> > > >> wrote: > > >>>> > > >>>>> Hi Shengkai, > > >>>>> > > >>>>> thanks for updating the FLIP. > > >>>>> > > >>>>> I have one last comment for the option `table.execution.mode`. > Should > > >>> we > > >>>>> already use the global Flink option `execution.runtime-mode` > instead? > > >>>>> > > >>>>> We are using Flink's options where possible (e.g. `pipeline.name` > > >> and > > >>>>> `parallism.default`) why not also for batch/streaming mode? > > >>>>> > > >>>>> The description of the option matches to the Blink planner > behavior: > > >>>>> > > >>>>> ``` > > >>>>> Among other things, this controls task scheduling, network shuffle > > >>>>> behavior, and time semantics. > > >>>>> ``` > > >>>>> > > >>>>> Regards, > > >>>>> Timo > > >>>>> > > >>>>> On 10.02.21 06:30, Shengkai Fang wrote: > > >>>>>> Hi, guys. > > >>>>>> > > >>>>>> I have updated the FLIP. It seems we have reached agreement. > Maybe > > >>> we > > >>>>> can > > >>>>>> start the vote soon. If anyone has other questions, please leave > > >> your > > >>>>>> comments. > > >>>>>> > > >>>>>> Best, > > >>>>>> Shengkai > > >>>>>> > > >>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > >>>>>> > > >>>>>>> Hi guys, > > >>>>>>> > > >>>>>>> The conclusion sounds good to me. > > >>>>>>> > > >>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> > > >>>> wrote: > > >>>>>>> > > >>>>>>>> Hi, Timo, Jark. > > >>>>>>>> > > >>>>>>>> I am fine with the new option name. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Shengkai > > >>>>>>>> > > >>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > > >>>>>>>> > > >>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > > >>>>>>>>> > > >>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > > >>>>>>>>> > > >>>>>>>>> Thanks, > > >>>>>>>>> Timo > > >>>>>>>>> > > >>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > > >>>>>>>>>> I'm fine with `table.multi-dml-sync`. > > >>>>>>>>>> > > >>>>>>>>>> My previous concern about "multi" is that DML in CLI looks > like > > >>>>>>> single > > >>>>>>>>>> statement. > > >>>>>>>>>> But we can treat CLI as a multi-line accepting statements from > > >>>>>>> opening > > >>>>>>>> to > > >>>>>>>>>> closing. > > >>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > > >>>>>>>>>> > > >>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > default), > > >>> and > > >>>>>>> we > > >>>>>>>>> will > > >>>>>>>>>> support this config > > >>>>>>>>>> in SQL CLI first, will support it in > > >>>>>>> TableEnvironment#executeMultiSql() > > >>>>>>>>> in > > >>>>>>>>>> the future, right? > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Jark > > >>>>>>>>>> > > >>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email] > > > > >>>>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> Hi everyone, > > >>>>>>>>>>> > > >>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not > apply > > >>> to > > >>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense > > >>> when > > >>>>>>>>>>> executing multi statements. Once we have a > > >>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be > > >>>>>>> considered. > > >>>>>>>>>>> > > >>>>>>>>>>> Maybe we can find a better generic name? Other platforms will > > >>> also > > >>>>>>>> need > > >>>>>>>>>>> to have this config option, which is why I would like to > > >> avoid a > > >>>> SQL > > >>>>>>>>>>> Client specific option. Otherwise every platform has to come > > >> up > > >>>> with > > >>>>>>>>>>> this important config option separately. > > >>>>>>>>>>> > > >>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or > other > > >>>>>>>> opinions? > > >>>>>>>>>>> > > >>>>>>>>>>> Regards, > > >>>>>>>>>>> Timo > > >>>>>>>>>>> > > >>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > >>>>>>>>>>>> Hi, all. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I think it may cause user confused. The main problem is we > > >>> have > > >>>> no > > >>>>>>>>> means > > >>>>>>>>>>>> to detect the conflict configuration, e.g. users set the > > >> option > > >>>>>>> true > > >>>>>>>>> and > > >>>>>>>>>>>> use `TableResult#await` together. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Best, > > >>>>>>>>>>>> Shengkai. > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>>> -- > > >>>>>>> Best regards! > > >>>>>>> Rui Li > > >>>>>>> > > >>>>>> > > >>>>> > > >>>>> > > >>>> > > >>> > > >> > > > > > > > > |
The `table.` prefix is meant to be a general option in the table
ecosystem. Not necessarily attached to Table API or SQL Client. That's why SQL Client is also located in the `flink-table` module. My main concern is the SQL script portability. Declaring the sync/async behavior will happen in many SQL scripts. And users should be easily switch from SQL Client to some commercial product without the need of changing the script again. Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later but that would mean introducing future confusion. An app name (what `sql-client` kind of is) should not be part of a config option key if other apps will need the same kind of option. Regards, Timo On 24.02.21 08:59, Jark Wu wrote: >>From my point of view, I also prefer "sql-client.dml-sync", > because the behavior of this configuration is very clear. > Even if we introduce a new config in the future, e.g. `table.dml-sync`, > we can also deprecate the sql-client one. > > Introducing a "table." configuration without any implementation > will confuse users a lot, as they expect it should take effect on > the Table API. > > If we want to introduce an unified "table.dml-sync" option, I prefer > it should be implemented on Table API and affect all the DMLs on > Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), > as I have mentioned before [1]. > >> It would be very straightforward that it affects all the DMLs on SQL CLI > and > TableEnvironment (including `executeSql`, `StatementSet`, > `Table#executeInsert`, etc.). > This can also make SQL CLI easy to support this configuration by passing > through to the TableEnv. > > Best, > Jark > > > [1]: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > > > On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > >> If we all agree the option should only be handled by sql client, then why >> don't we >> just call it `sql-client.dml-sync`? As you said, calling it >> `table.dml-sync` but has no >> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big >> confusion for >> users. >> >> The only concern I saw is if we introduce >> "TableEnvironment.executeMultiSql()" in the >> future, how do we control the synchronization between statements? TBH I >> don't really >> see a strong requirement for such interfaces. Right now, we have a pretty >> clear semantic >> of `TableEnv.executeSql`, and it's very convenient for users if they want >> to execute multiple >> sql statements. They can simulate either synced or async execution with >> this building block. >> >> This will introduce slight overhead for users, but compared to the >> confusion we might >> cause if we introduce such a method of our own, I think it's better to wait >> for some more >> feedback. >> >> Best, >> Kurt >> >> >> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> wrote: >> >>> Hi Kurt, >>> >>> we can also shorten it to `table.dml-sync` if that would help. Then it >>> would confuse users that do a regular `.executeSql("INSERT INTO")` in a >>> notebook session. >>> >>> In any case users will need to learn the semantics of this option. >>> `table.multi-dml-sync` should be described as "If a you are in a multi >>> statement environment, execute DMLs synchrounous.". I don't have a >>> strong opinion on shortening it to `table.dml-sync`. >>> >>> Just to clarify the implementation: The option should be handled by the >>> SQL Client only, but the name can be shared accross platforms. >>> >>> Regards, >>> Timo >>> >>> >>> On 23.02.21 09:54, Kurt Young wrote: >>>> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. >>>> >>>> IIUC this config will take effect with 2 use cases: >>>> 1. SQL client, either interactive mode or executing multiple statements >>> via >>>> -f. In most cases, >>>> there will be only one INSERT INTO statement but we are controlling the >>>> sync/async behavior >>>> with "*multi-dml*-sync". I think this will confuse a lot of users. >>> Besides, >>>> >>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we are >>> also >>>> not sure if we will >>>> really introduce this in the future. >>>> >>>> I would prefer to introduce this option for only sql client. For >>> platforms >>>> Timo mentioned which >>>> need to control such behavior, I think it's easy and flexible to >>> introduce >>>> one on their own. >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> >>> wrote: >>>> >>>>> Hi everyone. >>>>> >>>>> Sorry for the late response. >>>>> >>>>> For `execution.runtime-mode`, I think it's much better than >>>>> `table.execution.mode`. Thanks for Timo's suggestions! >>>>> >>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should >> clarify >>> the >>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to >>> specify >>>>> the table that is fully qualified and only works for the table that is >>>>> created by the sql statements. >>>>> >>>>> I have updated the FLIP with suggestions. It seems we have reached a >>>>> consensus, I'd like to start a formal vote for the FLIP. >>>>> >>>>> Please vote +1 to approve the FLIP, or -1 with a comment. >>>>> >>>>> Best, >>>>> Shengkai >>>>> >>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: >>>>> >>>>>> Hi Ingo, >>>>>> >>>>>> 1) I think you are right, the table path should be fully-qualified. >>>>>> >>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE >>>>>> only aims to print DDL for the tables registered using SQL CREATE >> TABLE >>>>>> DDL. >>>>>> If a table is registered using Table API, e.g. >>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, >>>>>> currently it's not possible to print DDL for such tables. >>>>>> I think we should point it out in the FLIP. >>>>>> >>>>>> Best, >>>>>> Jark >>>>>> >>>>>> >>>>>> >>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I have a couple questions about the SHOW CREATE TABLE statement. >>>>>>> >>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL >> should >>>>>>> always have the table identifier fully-qualified. Otherwise the DDL >>>>>> depends >>>>>>> on the current context (catalog/database), which could be >> surprising, >>>>>>> especially since "the same" table can behave differently if created >> in >>>>>>> different catalogs. >>>>>>> 2) How should this handle tables which cannot be fully characterized >>> by >>>>>>> properties only? I don't know if there's an example for this yet, >> but >>>>>>> hypothetically this is not currently a requirement, right? This >> isn't >>>>> as >>>>>>> much of a problem if this syntax is SQL-client-specific, but if it's >>>>>>> general Flink SQL syntax we should consider this (one way or >> another). >>>>>>> >>>>>>> >>>>>>> Regards >>>>>>> Ingo >>>>>>> >>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> >>>>> wrote: >>>>>>> >>>>>>>> Hi Shengkai, >>>>>>>> >>>>>>>> thanks for updating the FLIP. >>>>>>>> >>>>>>>> I have one last comment for the option `table.execution.mode`. >> Should >>>>>> we >>>>>>>> already use the global Flink option `execution.runtime-mode` >> instead? >>>>>>>> >>>>>>>> We are using Flink's options where possible (e.g. `pipeline.name` >>>>> and >>>>>>>> `parallism.default`) why not also for batch/streaming mode? >>>>>>>> >>>>>>>> The description of the option matches to the Blink planner >> behavior: >>>>>>>> >>>>>>>> ``` >>>>>>>> Among other things, this controls task scheduling, network shuffle >>>>>>>> behavior, and time semantics. >>>>>>>> ``` >>>>>>>> >>>>>>>> Regards, >>>>>>>> Timo >>>>>>>> >>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: >>>>>>>>> Hi, guys. >>>>>>>>> >>>>>>>>> I have updated the FLIP. It seems we have reached agreement. >> Maybe >>>>>> we >>>>>>>> can >>>>>>>>> start the vote soon. If anyone has other questions, please leave >>>>> your >>>>>>>>> comments. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Shengkai >>>>>>>>> >>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: >>>>>>>>> >>>>>>>>>> Hi guys, >>>>>>>>>> >>>>>>>>>> The conclusion sounds good to me. >>>>>>>>>> >>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email]> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi, Timo, Jark. >>>>>>>>>>> >>>>>>>>>>> I am fine with the new option name. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Shengkai >>>>>>>>>>> >>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: >>>>>>>>>>> >>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. >>>>>>>>>>>> >>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: >>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>> >>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks >> like >>>>>>>>>> single >>>>>>>>>>>>> statement. >>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements from >>>>>>>>>> opening >>>>>>>>>>> to >>>>>>>>>>>>> closing. >>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>> >>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by >> default), >>>>>> and >>>>>>>>>> we >>>>>>>>>>>> will >>>>>>>>>>>>> support this config >>>>>>>>>>>>> in SQL CLI first, will support it in >>>>>>>>>> TableEnvironment#executeMultiSql() >>>>>>>>>>>> in >>>>>>>>>>>>> the future, right? >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Jark >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther <[hidden email] >>> >>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not >> apply >>>>>> to >>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense >>>>>> when >>>>>>>>>>>>>> executing multi statements. Once we have a >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be >>>>>>>>>> considered. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms will >>>>>> also >>>>>>>>>>> need >>>>>>>>>>>>>> to have this config option, which is why I would like to >>>>> avoid a >>>>>>> SQL >>>>>>>>>>>>>> Client specific option. Otherwise every platform has to come >>>>> up >>>>>>> with >>>>>>>>>>>>>> this important config option separately. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or >> other >>>>>>>>>>> opinions? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>>>>>>>>>> Hi, all. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I think it may cause user confused. The main problem is we >>>>>> have >>>>>>> no >>>>>>>>>>>> means >>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the >>>>> option >>>>>>>>>> true >>>>>>>>>>>> and >>>>>>>>>>>>>>> use `TableResult#await` together. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Shengkai. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> Best regards! >>>>>>>>>> Rui Li >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >>> >> > |
Hi, everyone.
I do some summaries about the discussion about the option. If the summary has errors, please correct me. `table.dml-sync`: - take effect for `executeMultiSql` and sql client - benefit: SQL script portability. One script for all platforms. - drawback: Don't work for `TableEnvironment#executeSql`. `table.multi-dml-sync`: - take effect for `executeMultiSql` and sql client - benefit: SQL script portability - drawback: It's confused when the sql script has one dml statement but need to set option `table.multi-dml-sync` `client.dml-sync`: - take effect for sql client only - benefit: clear definition. - drawback: Every platform needs to define its own option. Bad SQL script portability. Just as Jark said, I think the `table.dml-sync` is a good choice if we can extend its scope and make this option works for `executeSql`. It's straightforward and users can use this option now in table api. The drawback is the `TableResult#await` plays the same role as the option. I don't think the drawback is really critical because many systems have commands play the same role with the different names. Best, Shengkai Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > The `table.` prefix is meant to be a general option in the table > ecosystem. Not necessarily attached to Table API or SQL Client. That's > why SQL Client is also located in the `flink-table` module. > > My main concern is the SQL script portability. Declaring the sync/async > behavior will happen in many SQL scripts. And users should be easily > switch from SQL Client to some commercial product without the need of > changing the script again. > > Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later > but that would mean introducing future confusion. An app name (what > `sql-client` kind of is) should not be part of a config option key if > other apps will need the same kind of option. > > Regards, > Timo > > > On 24.02.21 08:59, Jark Wu wrote: > >>From my point of view, I also prefer "sql-client.dml-sync", > > because the behavior of this configuration is very clear. > > Even if we introduce a new config in the future, e.g. `table.dml-sync`, > > we can also deprecate the sql-client one. > > > > Introducing a "table." configuration without any implementation > > will confuse users a lot, as they expect it should take effect on > > the Table API. > > > > If we want to introduce an unified "table.dml-sync" option, I prefer > > it should be implemented on Table API and affect all the DMLs on > > Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), > > as I have mentioned before [1]. > > > >> It would be very straightforward that it affects all the DMLs on SQL CLI > > and > > TableEnvironment (including `executeSql`, `StatementSet`, > > `Table#executeInsert`, etc.). > > This can also make SQL CLI easy to support this configuration by passing > > through to the TableEnv. > > > > Best, > > Jark > > > > > > [1]: > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > > > > > > On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > > > >> If we all agree the option should only be handled by sql client, then > why > >> don't we > >> just call it `sql-client.dml-sync`? As you said, calling it > >> `table.dml-sync` but has no > >> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big > >> confusion for > >> users. > >> > >> The only concern I saw is if we introduce > >> "TableEnvironment.executeMultiSql()" in the > >> future, how do we control the synchronization between statements? TBH I > >> don't really > >> see a strong requirement for such interfaces. Right now, we have a > pretty > >> clear semantic > >> of `TableEnv.executeSql`, and it's very convenient for users if they > want > >> to execute multiple > >> sql statements. They can simulate either synced or async execution with > >> this building block. > >> > >> This will introduce slight overhead for users, but compared to the > >> confusion we might > >> cause if we introduce such a method of our own, I think it's better to > wait > >> for some more > >> feedback. > >> > >> Best, > >> Kurt > >> > >> > >> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> > wrote: > >> > >>> Hi Kurt, > >>> > >>> we can also shorten it to `table.dml-sync` if that would help. Then it > >>> would confuse users that do a regular `.executeSql("INSERT INTO")` in a > >>> notebook session. > >>> > >>> In any case users will need to learn the semantics of this option. > >>> `table.multi-dml-sync` should be described as "If a you are in a multi > >>> statement environment, execute DMLs synchrounous.". I don't have a > >>> strong opinion on shortening it to `table.dml-sync`. > >>> > >>> Just to clarify the implementation: The option should be handled by the > >>> SQL Client only, but the name can be shared accross platforms. > >>> > >>> Regards, > >>> Timo > >>> > >>> > >>> On 23.02.21 09:54, Kurt Young wrote: > >>>> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. > >>>> > >>>> IIUC this config will take effect with 2 use cases: > >>>> 1. SQL client, either interactive mode or executing multiple > statements > >>> via > >>>> -f. In most cases, > >>>> there will be only one INSERT INTO statement but we are controlling > the > >>>> sync/async behavior > >>>> with "*multi-dml*-sync". I think this will confuse a lot of users. > >>> Besides, > >>>> > >>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we are > >>> also > >>>> not sure if we will > >>>> really introduce this in the future. > >>>> > >>>> I would prefer to introduce this option for only sql client. For > >>> platforms > >>>> Timo mentioned which > >>>> need to control such behavior, I think it's easy and flexible to > >>> introduce > >>>> one on their own. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> > >>> wrote: > >>>> > >>>>> Hi everyone. > >>>>> > >>>>> Sorry for the late response. > >>>>> > >>>>> For `execution.runtime-mode`, I think it's much better than > >>>>> `table.execution.mode`. Thanks for Timo's suggestions! > >>>>> > >>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should > >> clarify > >>> the > >>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to > >>> specify > >>>>> the table that is fully qualified and only works for the table that > is > >>>>> created by the sql statements. > >>>>> > >>>>> I have updated the FLIP with suggestions. It seems we have reached a > >>>>> consensus, I'd like to start a formal vote for the FLIP. > >>>>> > >>>>> Please vote +1 to approve the FLIP, or -1 with a comment. > >>>>> > >>>>> Best, > >>>>> Shengkai > >>>>> > >>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > >>>>> > >>>>>> Hi Ingo, > >>>>>> > >>>>>> 1) I think you are right, the table path should be fully-qualified. > >>>>>> > >>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE > >>>>>> only aims to print DDL for the tables registered using SQL CREATE > >> TABLE > >>>>>> DDL. > >>>>>> If a table is registered using Table API, e.g. > >>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > >>>>>> currently it's not possible to print DDL for such tables. > >>>>>> I think we should point it out in the FLIP. > >>>>>> > >>>>>> Best, > >>>>>> Jark > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: > >>>>>> > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I have a couple questions about the SHOW CREATE TABLE statement. > >>>>>>> > >>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL > >> should > >>>>>>> always have the table identifier fully-qualified. Otherwise the DDL > >>>>>> depends > >>>>>>> on the current context (catalog/database), which could be > >> surprising, > >>>>>>> especially since "the same" table can behave differently if created > >> in > >>>>>>> different catalogs. > >>>>>>> 2) How should this handle tables which cannot be fully > characterized > >>> by > >>>>>>> properties only? I don't know if there's an example for this yet, > >> but > >>>>>>> hypothetically this is not currently a requirement, right? This > >> isn't > >>>>> as > >>>>>>> much of a problem if this syntax is SQL-client-specific, but if > it's > >>>>>>> general Flink SQL syntax we should consider this (one way or > >> another). > >>>>>>> > >>>>>>> > >>>>>>> Regards > >>>>>>> Ingo > >>>>>>> > >>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> > >>>>> wrote: > >>>>>>> > >>>>>>>> Hi Shengkai, > >>>>>>>> > >>>>>>>> thanks for updating the FLIP. > >>>>>>>> > >>>>>>>> I have one last comment for the option `table.execution.mode`. > >> Should > >>>>>> we > >>>>>>>> already use the global Flink option `execution.runtime-mode` > >> instead? > >>>>>>>> > >>>>>>>> We are using Flink's options where possible (e.g. `pipeline.name` > >>>>> and > >>>>>>>> `parallism.default`) why not also for batch/streaming mode? > >>>>>>>> > >>>>>>>> The description of the option matches to the Blink planner > >> behavior: > >>>>>>>> > >>>>>>>> ``` > >>>>>>>> Among other things, this controls task scheduling, network shuffle > >>>>>>>> behavior, and time semantics. > >>>>>>>> ``` > >>>>>>>> > >>>>>>>> Regards, > >>>>>>>> Timo > >>>>>>>> > >>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: > >>>>>>>>> Hi, guys. > >>>>>>>>> > >>>>>>>>> I have updated the FLIP. It seems we have reached agreement. > >> Maybe > >>>>>> we > >>>>>>>> can > >>>>>>>>> start the vote soon. If anyone has other questions, please leave > >>>>> your > >>>>>>>>> comments. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Shengkai > >>>>>>>>> > >>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > >>>>>>>>> > >>>>>>>>>> Hi guys, > >>>>>>>>>> > >>>>>>>>>> The conclusion sounds good to me. > >>>>>>>>>> > >>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email] > > > >>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi, Timo, Jark. > >>>>>>>>>>> > >>>>>>>>>>> I am fine with the new option name. > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Shengkai > >>>>>>>>>>> > >>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > >>>>>>>>>>> > >>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. > >>>>>>>>>>>> > >>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > >>>>>>>>>>>> > >>>>>>>>>>>> Thanks, > >>>>>>>>>>>> Timo > >>>>>>>>>>>> > >>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > >>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>> > >>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks > >> like > >>>>>>>>>> single > >>>>>>>>>>>>> statement. > >>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements > from > >>>>>>>>>> opening > >>>>>>>>>>> to > >>>>>>>>>>>>> closing. > >>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>> > >>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > >> default), > >>>>>> and > >>>>>>>>>> we > >>>>>>>>>>>> will > >>>>>>>>>>>>> support this config > >>>>>>>>>>>>> in SQL CLI first, will support it in > >>>>>>>>>> TableEnvironment#executeMultiSql() > >>>>>>>>>>>> in > >>>>>>>>>>>>> the future, right? > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Jark > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < > [hidden email] > >>> > >>>>>>>>>> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not > >> apply > >>>>>> to > >>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense > >>>>>> when > >>>>>>>>>>>>>> executing multi statements. Once we have a > >>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be > >>>>>>>>>> considered. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms > will > >>>>>> also > >>>>>>>>>>> need > >>>>>>>>>>>>>> to have this config option, which is why I would like to > >>>>> avoid a > >>>>>>> SQL > >>>>>>>>>>>>>> Client specific option. Otherwise every platform has to come > >>>>> up > >>>>>>> with > >>>>>>>>>>>>>> this important config option separately. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or > >> other > >>>>>>>>>>> opinions? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > >>>>>>>>>>>>>>> Hi, all. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I think it may cause user confused. The main problem is we > >>>>>> have > >>>>>>> no > >>>>>>>>>>>> means > >>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the > >>>>> option > >>>>>>>>>> true > >>>>>>>>>>>> and > >>>>>>>>>>>>>>> use `TableResult#await` together. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Shengkai. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> -- > >>>>>>>>>> Best regards! > >>>>>>>>>> Rui Li > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >>> > >> > > > > |
Hi, all
Look like there’s only one divergence about option [ table | sql-client ].dml-sync in this thread, correct me if I’m wrong. 1. Leaving the context of this thread, from a user's perspective, the table.xx configurations should take effect in Table API & SQL, the sql-client.xx configurations should only take effect in sql-client. In my(the user's) opinion, other explanations are counterintuitive. 2. It should be pointed out that both all existed table.xx configurations like table.exec.state.ttl, table.optimizer.agg-phase-strategy, table.local-time-zone,etc.. and the proposed sql-client.xx configurations like sql-client.verbose, sql-client.execution.max-table-result.rows comply with this convention. 3. Considering the portability to support different CLI tools (sql-client, sql-gateway, etc.), I prefer table.dml-sync. In addition, I think sql-client/sql-gateway/other CLI tools can be placed out of flink-table module even in an external project, this should not affect our conclusion. Hope this can help you. Best, Leonard > 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: > > Hi, everyone. > > I do some summaries about the discussion about the option. If the summary > has errors, please correct me. > > `table.dml-sync`: > - take effect for `executeMultiSql` and sql client > - benefit: SQL script portability. One script for all platforms. > - drawback: Don't work for `TableEnvironment#executeSql`. > > `table.multi-dml-sync`: > - take effect for `executeMultiSql` and sql client > - benefit: SQL script portability > - drawback: It's confused when the sql script has one dml statement but > need to set option `table.multi-dml-sync` > > `client.dml-sync`: > - take effect for sql client only > - benefit: clear definition. > - drawback: Every platform needs to define its own option. Bad SQL script > portability. > > Just as Jark said, I think the `table.dml-sync` is a good choice if we can > extend its scope and make this option works for `executeSql`. > It's straightforward and users can use this option now in table api. The > drawback is the `TableResult#await` plays the same role as the option. I > don't think the drawback is really critical because many systems have > commands play the same role with the different names. > > Best, > Shengkai > > Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > >> The `table.` prefix is meant to be a general option in the table >> ecosystem. Not necessarily attached to Table API or SQL Client. That's >> why SQL Client is also located in the `flink-table` module. >> >> My main concern is the SQL script portability. Declaring the sync/async >> behavior will happen in many SQL scripts. And users should be easily >> switch from SQL Client to some commercial product without the need of >> changing the script again. >> >> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later >> but that would mean introducing future confusion. An app name (what >> `sql-client` kind of is) should not be part of a config option key if >> other apps will need the same kind of option. >> >> Regards, >> Timo >> >> >> On 24.02.21 08:59, Jark Wu wrote: >>>> From my point of view, I also prefer "sql-client.dml-sync", >>> because the behavior of this configuration is very clear. >>> Even if we introduce a new config in the future, e.g. `table.dml-sync`, >>> we can also deprecate the sql-client one. >>> >>> Introducing a "table." configuration without any implementation >>> will confuse users a lot, as they expect it should take effect on >>> the Table API. >>> >>> If we want to introduce an unified "table.dml-sync" option, I prefer >>> it should be implemented on Table API and affect all the DMLs on >>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), >>> as I have mentioned before [1]. >>> >>>> It would be very straightforward that it affects all the DMLs on SQL CLI >>> and >>> TableEnvironment (including `executeSql`, `StatementSet`, >>> `Table#executeInsert`, etc.). >>> This can also make SQL CLI easy to support this configuration by passing >>> through to the TableEnv. >>> >>> Best, >>> Jark >>> >>> >>> [1]: >>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html >>> >>> >>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: >>> >>>> If we all agree the option should only be handled by sql client, then >> why >>>> don't we >>>> just call it `sql-client.dml-sync`? As you said, calling it >>>> `table.dml-sync` but has no >>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a big >>>> confusion for >>>> users. >>>> >>>> The only concern I saw is if we introduce >>>> "TableEnvironment.executeMultiSql()" in the >>>> future, how do we control the synchronization between statements? TBH I >>>> don't really >>>> see a strong requirement for such interfaces. Right now, we have a >> pretty >>>> clear semantic >>>> of `TableEnv.executeSql`, and it's very convenient for users if they >> want >>>> to execute multiple >>>> sql statements. They can simulate either synced or async execution with >>>> this building block. >>>> >>>> This will introduce slight overhead for users, but compared to the >>>> confusion we might >>>> cause if we introduce such a method of our own, I think it's better to >> wait >>>> for some more >>>> feedback. >>>> >>>> Best, >>>> Kurt >>>> >>>> >>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> >> wrote: >>>> >>>>> Hi Kurt, >>>>> >>>>> we can also shorten it to `table.dml-sync` if that would help. Then it >>>>> would confuse users that do a regular `.executeSql("INSERT INTO")` in a >>>>> notebook session. >>>>> >>>>> In any case users will need to learn the semantics of this option. >>>>> `table.multi-dml-sync` should be described as "If a you are in a multi >>>>> statement environment, execute DMLs synchrounous.". I don't have a >>>>> strong opinion on shortening it to `table.dml-sync`. >>>>> >>>>> Just to clarify the implementation: The option should be handled by the >>>>> SQL Client only, but the name can be shared accross platforms. >>>>> >>>>> Regards, >>>>> Timo >>>>> >>>>> >>>>> On 23.02.21 09:54, Kurt Young wrote: >>>>>> Sorry for the late reply, but I'm confused by `table.multi-dml-sync`. >>>>>> >>>>>> IIUC this config will take effect with 2 use cases: >>>>>> 1. SQL client, either interactive mode or executing multiple >> statements >>>>> via >>>>>> -f. In most cases, >>>>>> there will be only one INSERT INTO statement but we are controlling >> the >>>>>> sync/async behavior >>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users. >>>>> Besides, >>>>>> >>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we are >>>>> also >>>>>> not sure if we will >>>>>> really introduce this in the future. >>>>>> >>>>>> I would prefer to introduce this option for only sql client. For >>>>> platforms >>>>>> Timo mentioned which >>>>>> need to control such behavior, I think it's easy and flexible to >>>>> introduce >>>>>> one on their own. >>>>>> >>>>>> Best, >>>>>> Kurt >>>>>> >>>>>> >>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> >>>>> wrote: >>>>>> >>>>>>> Hi everyone. >>>>>>> >>>>>>> Sorry for the late response. >>>>>>> >>>>>>> For `execution.runtime-mode`, I think it's much better than >>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! >>>>>>> >>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should >>>> clarify >>>>> the >>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to >>>>> specify >>>>>>> the table that is fully qualified and only works for the table that >> is >>>>>>> created by the sql statements. >>>>>>> >>>>>>> I have updated the FLIP with suggestions. It seems we have reached a >>>>>>> consensus, I'd like to start a formal vote for the FLIP. >>>>>>> >>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. >>>>>>> >>>>>>> Best, >>>>>>> Shengkai >>>>>>> >>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: >>>>>>> >>>>>>>> Hi Ingo, >>>>>>>> >>>>>>>> 1) I think you are right, the table path should be fully-qualified. >>>>>>>> >>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE >>>>>>>> only aims to print DDL for the tables registered using SQL CREATE >>>> TABLE >>>>>>>> DDL. >>>>>>>> If a table is registered using Table API, e.g. >>>>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, >>>>>>>> currently it's not possible to print DDL for such tables. >>>>>>>> I think we should point it out in the FLIP. >>>>>>>> >>>>>>>> Best, >>>>>>>> Jark >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I have a couple questions about the SHOW CREATE TABLE statement. >>>>>>>>> >>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL >>>> should >>>>>>>>> always have the table identifier fully-qualified. Otherwise the DDL >>>>>>>> depends >>>>>>>>> on the current context (catalog/database), which could be >>>> surprising, >>>>>>>>> especially since "the same" table can behave differently if created >>>> in >>>>>>>>> different catalogs. >>>>>>>>> 2) How should this handle tables which cannot be fully >> characterized >>>>> by >>>>>>>>> properties only? I don't know if there's an example for this yet, >>>> but >>>>>>>>> hypothetically this is not currently a requirement, right? This >>>> isn't >>>>>>> as >>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if >> it's >>>>>>>>> general Flink SQL syntax we should consider this (one way or >>>> another). >>>>>>>>> >>>>>>>>> >>>>>>>>> Regards >>>>>>>>> Ingo >>>>>>>>> >>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email]> >>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Hi Shengkai, >>>>>>>>>> >>>>>>>>>> thanks for updating the FLIP. >>>>>>>>>> >>>>>>>>>> I have one last comment for the option `table.execution.mode`. >>>> Should >>>>>>>> we >>>>>>>>>> already use the global Flink option `execution.runtime-mode` >>>> instead? >>>>>>>>>> >>>>>>>>>> We are using Flink's options where possible (e.g. `pipeline.name` >>>>>>> and >>>>>>>>>> `parallism.default`) why not also for batch/streaming mode? >>>>>>>>>> >>>>>>>>>> The description of the option matches to the Blink planner >>>> behavior: >>>>>>>>>> >>>>>>>>>> ``` >>>>>>>>>> Among other things, this controls task scheduling, network shuffle >>>>>>>>>> behavior, and time semantics. >>>>>>>>>> ``` >>>>>>>>>> >>>>>>>>>> Regards, >>>>>>>>>> Timo >>>>>>>>>> >>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: >>>>>>>>>>> Hi, guys. >>>>>>>>>>> >>>>>>>>>>> I have updated the FLIP. It seems we have reached agreement. >>>> Maybe >>>>>>>> we >>>>>>>>>> can >>>>>>>>>>> start the vote soon. If anyone has other questions, please leave >>>>>>> your >>>>>>>>>>> comments. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Shengkai >>>>>>>>>>> >>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: >>>>>>>>>>> >>>>>>>>>>>> Hi guys, >>>>>>>>>>>> >>>>>>>>>>>> The conclusion sounds good to me. >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang <[hidden email] >>> >>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi, Timo, Jark. >>>>>>>>>>>>> >>>>>>>>>>>>> I am fine with the new option name. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Shengkai >>>>>>>>>>>>> >>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: >>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future work. >>>>>>>>>>>>>> >>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Timo >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: >>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks >>>> like >>>>>>>>>>>> single >>>>>>>>>>>>>>> statement. >>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements >> from >>>>>>>>>>>> opening >>>>>>>>>>>>> to >>>>>>>>>>>>>>> closing. >>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by >>>> default), >>>>>>>> and >>>>>>>>>>>> we >>>>>>>>>>>>>> will >>>>>>>>>>>>>>> support this config >>>>>>>>>>>>>>> in SQL CLI first, will support it in >>>>>>>>>>>> TableEnvironment#executeMultiSql() >>>>>>>>>>>>>> in >>>>>>>>>>>>>>> the future, right? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < >> [hidden email] >>>>> >>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not >>>> apply >>>>>>>> to >>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only sense >>>>>>>> when >>>>>>>>>>>>>>>> executing multi statements. Once we have a >>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be >>>>>>>>>>>> considered. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms >> will >>>>>>>> also >>>>>>>>>>>>> need >>>>>>>>>>>>>>>> to have this config option, which is why I would like to >>>>>>> avoid a >>>>>>>>> SQL >>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to come >>>>>>> up >>>>>>>>> with >>>>>>>>>>>>>>>> this important config option separately. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or >>>> other >>>>>>>>>>>>> opinions? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>>>>>>>>>>>> Hi, all. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is we >>>>>>>> have >>>>>>>>> no >>>>>>>>>>>>>> means >>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the >>>>>>> option >>>>>>>>>>>> true >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>> use `TableResult#await` together. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Shengkai. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -- >>>>>>>>>>>> Best regards! >>>>>>>>>>>> Rui Li >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> >> |
I also asked some users about their opinion that if we introduce some
config prefixed with "table" but doesn't have affection with methods in Table API and SQL. All of them are kind of shocked by such question, asking why we would do anything like this. This kind of reaction actually doesn't surprise me a lot, so I jumped in and challenged this config option even after the FLIP had already been accepted. If we only have to define the execution behavior for multiple statements in SQL client, we should only introduce a config option which would tell users it's affection scope by its name. Prefixing with "table" is definitely not a good idea here. Best, Kurt On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> wrote: > Hi, all > > Look like there’s only one divergence about option [ table | sql-client > ].dml-sync in this thread, correct me if I’m wrong. > > 1. Leaving the context of this thread, from a user's perspective, > the table.xx configurations should take effect in Table API & SQL, > the sql-client.xx configurations should only take effect in sql-client. > In my(the user's) opinion, other explanations are counterintuitive. > > 2. It should be pointed out that both all existed table.xx configurations > like table.exec.state.ttl, table.optimizer.agg-phase-strategy, > table.local-time-zone,etc.. and the proposed sql-client.xx configurations > like sql-client.verbose, sql-client.execution.max-table-result.rows > comply with this convention. > > 3. Considering the portability to support different CLI tools (sql-client, > sql-gateway, etc.), I prefer table.dml-sync. > > In addition, I think sql-client/sql-gateway/other CLI tools can be placed > out of flink-table module even in an external project, this should not > affect our conclusion. > > > Hope this can help you. > > > Best, > Leonard > > > > > 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: > > > > Hi, everyone. > > > > I do some summaries about the discussion about the option. If the summary > > has errors, please correct me. > > > > `table.dml-sync`: > > - take effect for `executeMultiSql` and sql client > > - benefit: SQL script portability. One script for all platforms. > > - drawback: Don't work for `TableEnvironment#executeSql`. > > > > `table.multi-dml-sync`: > > - take effect for `executeMultiSql` and sql client > > - benefit: SQL script portability > > - drawback: It's confused when the sql script has one dml statement but > > need to set option `table.multi-dml-sync` > > > > `client.dml-sync`: > > - take effect for sql client only > > - benefit: clear definition. > > - drawback: Every platform needs to define its own option. Bad SQL script > > portability. > > > > Just as Jark said, I think the `table.dml-sync` is a good choice if we > can > > extend its scope and make this option works for `executeSql`. > > It's straightforward and users can use this option now in table api. The > > drawback is the `TableResult#await` plays the same role as the option. > I > > don't think the drawback is really critical because many systems have > > commands play the same role with the different names. > > > > Best, > > Shengkai > > > > Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > > > >> The `table.` prefix is meant to be a general option in the table > >> ecosystem. Not necessarily attached to Table API or SQL Client. That's > >> why SQL Client is also located in the `flink-table` module. > >> > >> My main concern is the SQL script portability. Declaring the sync/async > >> behavior will happen in many SQL scripts. And users should be easily > >> switch from SQL Client to some commercial product without the need of > >> changing the script again. > >> > >> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later > >> but that would mean introducing future confusion. An app name (what > >> `sql-client` kind of is) should not be part of a config option key if > >> other apps will need the same kind of option. > >> > >> Regards, > >> Timo > >> > >> > >> On 24.02.21 08:59, Jark Wu wrote: > >>>> From my point of view, I also prefer "sql-client.dml-sync", > >>> because the behavior of this configuration is very clear. > >>> Even if we introduce a new config in the future, e.g. `table.dml-sync`, > >>> we can also deprecate the sql-client one. > >>> > >>> Introducing a "table." configuration without any implementation > >>> will confuse users a lot, as they expect it should take effect on > >>> the Table API. > >>> > >>> If we want to introduce an unified "table.dml-sync" option, I prefer > >>> it should be implemented on Table API and affect all the DMLs on > >>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), > >>> as I have mentioned before [1]. > >>> > >>>> It would be very straightforward that it affects all the DMLs on SQL > CLI > >>> and > >>> TableEnvironment (including `executeSql`, `StatementSet`, > >>> `Table#executeInsert`, etc.). > >>> This can also make SQL CLI easy to support this configuration by > passing > >>> through to the TableEnv. > >>> > >>> Best, > >>> Jark > >>> > >>> > >>> [1]: > >>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > >>> > >>> > >>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > >>> > >>>> If we all agree the option should only be handled by sql client, then > >> why > >>>> don't we > >>>> just call it `sql-client.dml-sync`? As you said, calling it > >>>> `table.dml-sync` but has no > >>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a > big > >>>> confusion for > >>>> users. > >>>> > >>>> The only concern I saw is if we introduce > >>>> "TableEnvironment.executeMultiSql()" in the > >>>> future, how do we control the synchronization between statements? TBH > I > >>>> don't really > >>>> see a strong requirement for such interfaces. Right now, we have a > >> pretty > >>>> clear semantic > >>>> of `TableEnv.executeSql`, and it's very convenient for users if they > >> want > >>>> to execute multiple > >>>> sql statements. They can simulate either synced or async execution > with > >>>> this building block. > >>>> > >>>> This will introduce slight overhead for users, but compared to the > >>>> confusion we might > >>>> cause if we introduce such a method of our own, I think it's better to > >> wait > >>>> for some more > >>>> feedback. > >>>> > >>>> Best, > >>>> Kurt > >>>> > >>>> > >>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> > >> wrote: > >>>> > >>>>> Hi Kurt, > >>>>> > >>>>> we can also shorten it to `table.dml-sync` if that would help. Then > it > >>>>> would confuse users that do a regular `.executeSql("INSERT INTO")` > in a > >>>>> notebook session. > >>>>> > >>>>> In any case users will need to learn the semantics of this option. > >>>>> `table.multi-dml-sync` should be described as "If a you are in a > multi > >>>>> statement environment, execute DMLs synchrounous.". I don't have a > >>>>> strong opinion on shortening it to `table.dml-sync`. > >>>>> > >>>>> Just to clarify the implementation: The option should be handled by > the > >>>>> SQL Client only, but the name can be shared accross platforms. > >>>>> > >>>>> Regards, > >>>>> Timo > >>>>> > >>>>> > >>>>> On 23.02.21 09:54, Kurt Young wrote: > >>>>>> Sorry for the late reply, but I'm confused by > `table.multi-dml-sync`. > >>>>>> > >>>>>> IIUC this config will take effect with 2 use cases: > >>>>>> 1. SQL client, either interactive mode or executing multiple > >> statements > >>>>> via > >>>>>> -f. In most cases, > >>>>>> there will be only one INSERT INTO statement but we are controlling > >> the > >>>>>> sync/async behavior > >>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users. > >>>>> Besides, > >>>>>> > >>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we > are > >>>>> also > >>>>>> not sure if we will > >>>>>> really introduce this in the future. > >>>>>> > >>>>>> I would prefer to introduce this option for only sql client. For > >>>>> platforms > >>>>>> Timo mentioned which > >>>>>> need to control such behavior, I think it's easy and flexible to > >>>>> introduce > >>>>>> one on their own. > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> > >>>>> wrote: > >>>>>> > >>>>>>> Hi everyone. > >>>>>>> > >>>>>>> Sorry for the late response. > >>>>>>> > >>>>>>> For `execution.runtime-mode`, I think it's much better than > >>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! > >>>>>>> > >>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should > >>>> clarify > >>>>> the > >>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to > >>>>> specify > >>>>>>> the table that is fully qualified and only works for the table that > >> is > >>>>>>> created by the sql statements. > >>>>>>> > >>>>>>> I have updated the FLIP with suggestions. It seems we have reached > a > >>>>>>> consensus, I'd like to start a formal vote for the FLIP. > >>>>>>> > >>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. > >>>>>>> > >>>>>>> Best, > >>>>>>> Shengkai > >>>>>>> > >>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > >>>>>>> > >>>>>>>> Hi Ingo, > >>>>>>>> > >>>>>>>> 1) I think you are right, the table path should be > fully-qualified. > >>>>>>>> > >>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE > >>>>>>>> only aims to print DDL for the tables registered using SQL CREATE > >>>> TABLE > >>>>>>>> DDL. > >>>>>>>> If a table is registered using Table API, e.g. > >>>>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, > >>>>>>>> currently it's not possible to print DDL for such tables. > >>>>>>>> I think we should point it out in the FLIP. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Jark > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> > wrote: > >>>>>>>> > >>>>>>>>> Hi all, > >>>>>>>>> > >>>>>>>>> I have a couple questions about the SHOW CREATE TABLE statement. > >>>>>>>>> > >>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL > >>>> should > >>>>>>>>> always have the table identifier fully-qualified. Otherwise the > DDL > >>>>>>>> depends > >>>>>>>>> on the current context (catalog/database), which could be > >>>> surprising, > >>>>>>>>> especially since "the same" table can behave differently if > created > >>>> in > >>>>>>>>> different catalogs. > >>>>>>>>> 2) How should this handle tables which cannot be fully > >> characterized > >>>>> by > >>>>>>>>> properties only? I don't know if there's an example for this yet, > >>>> but > >>>>>>>>> hypothetically this is not currently a requirement, right? This > >>>> isn't > >>>>>>> as > >>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if > >> it's > >>>>>>>>> general Flink SQL syntax we should consider this (one way or > >>>> another). > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> Regards > >>>>>>>>> Ingo > >>>>>>>>> > >>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email] > > > >>>>>>> wrote: > >>>>>>>>> > >>>>>>>>>> Hi Shengkai, > >>>>>>>>>> > >>>>>>>>>> thanks for updating the FLIP. > >>>>>>>>>> > >>>>>>>>>> I have one last comment for the option `table.execution.mode`. > >>>> Should > >>>>>>>> we > >>>>>>>>>> already use the global Flink option `execution.runtime-mode` > >>>> instead? > >>>>>>>>>> > >>>>>>>>>> We are using Flink's options where possible (e.g. ` > pipeline.name` > >>>>>>> and > >>>>>>>>>> `parallism.default`) why not also for batch/streaming mode? > >>>>>>>>>> > >>>>>>>>>> The description of the option matches to the Blink planner > >>>> behavior: > >>>>>>>>>> > >>>>>>>>>> ``` > >>>>>>>>>> Among other things, this controls task scheduling, network > shuffle > >>>>>>>>>> behavior, and time semantics. > >>>>>>>>>> ``` > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Timo > >>>>>>>>>> > >>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: > >>>>>>>>>>> Hi, guys. > >>>>>>>>>>> > >>>>>>>>>>> I have updated the FLIP. It seems we have reached agreement. > >>>> Maybe > >>>>>>>> we > >>>>>>>>>> can > >>>>>>>>>>> start the vote soon. If anyone has other questions, please > leave > >>>>>>> your > >>>>>>>>>>> comments. > >>>>>>>>>>> > >>>>>>>>>>> Best, > >>>>>>>>>>> Shengkai > >>>>>>>>>>> > >>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > >>>>>>>>>>> > >>>>>>>>>>>> Hi guys, > >>>>>>>>>>>> > >>>>>>>>>>>> The conclusion sounds good to me. > >>>>>>>>>>>> > >>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < > [hidden email] > >>> > >>>>>>>>> wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi, Timo, Jark. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I am fine with the new option name. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Shengkai > >>>>>>>>>>>>> > >>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future > work. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > >>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks > >>>> like > >>>>>>>>>>>> single > >>>>>>>>>>>>>>> statement. > >>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements > >> from > >>>>>>>>>>>> opening > >>>>>>>>>>>>> to > >>>>>>>>>>>>>>> closing. > >>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > >>>> default), > >>>>>>>> and > >>>>>>>>>>>> we > >>>>>>>>>>>>>> will > >>>>>>>>>>>>>>> support this config > >>>>>>>>>>>>>>> in SQL CLI first, will support it in > >>>>>>>>>>>> TableEnvironment#executeMultiSql() > >>>>>>>>>>>>>> in > >>>>>>>>>>>>>>> the future, right? > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < > >> [hidden email] > >>>>> > >>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not > >>>> apply > >>>>>>>> to > >>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only > sense > >>>>>>>> when > >>>>>>>>>>>>>>>> executing multi statements. Once we have a > >>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be > >>>>>>>>>>>> considered. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms > >> will > >>>>>>>> also > >>>>>>>>>>>>> need > >>>>>>>>>>>>>>>> to have this config option, which is why I would like to > >>>>>>> avoid a > >>>>>>>>> SQL > >>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to > come > >>>>>>> up > >>>>>>>>> with > >>>>>>>>>>>>>>>> this important config option separately. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or > >>>> other > >>>>>>>>>>>>> opinions? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > >>>>>>>>>>>>>>>>> Hi, all. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is > we > >>>>>>>> have > >>>>>>>>> no > >>>>>>>>>>>>>> means > >>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the > >>>>>>> option > >>>>>>>>>>>> true > >>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>> use `TableResult#await` together. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Shengkai. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> -- > >>>>>>>>>>>> Best regards! > >>>>>>>>>>>> Rui Li > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>> > >> > >> > > |
We could also think about reading this config option in Table API. The
effect would be to call `await()` directly in an execute call. I could also imagine this to be useful esp. when you fire a lot of insert into queries. We had the case before that users where confused that the execution happens asynchronously, such an option could prevent this to happen again. Regards, Timo On 01.03.21 05:14, Kurt Young wrote: > I also asked some users about their opinion that if we introduce some > config prefixed with "table" but doesn't > have affection with methods in Table API and SQL. All of them are kind of > shocked by such question, asking > why we would do anything like this. > > This kind of reaction actually doesn't surprise me a lot, so I jumped in > and challenged this config option even > after the FLIP had already been accepted. > > If we only have to define the execution behavior for multiple statements in > SQL client, we should only introduce > a config option which would tell users it's affection scope by its name. > Prefixing with "table" is definitely not a good > idea here. > > Best, > Kurt > > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> wrote: > >> Hi, all >> >> Look like there’s only one divergence about option [ table | sql-client >> ].dml-sync in this thread, correct me if I’m wrong. >> >> 1. Leaving the context of this thread, from a user's perspective, >> the table.xx configurations should take effect in Table API & SQL, >> the sql-client.xx configurations should only take effect in sql-client. >> In my(the user's) opinion, other explanations are counterintuitive. >> >> 2. It should be pointed out that both all existed table.xx configurations >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy, >> table.local-time-zone,etc.. and the proposed sql-client.xx configurations >> like sql-client.verbose, sql-client.execution.max-table-result.rows >> comply with this convention. >> >> 3. Considering the portability to support different CLI tools (sql-client, >> sql-gateway, etc.), I prefer table.dml-sync. >> >> In addition, I think sql-client/sql-gateway/other CLI tools can be placed >> out of flink-table module even in an external project, this should not >> affect our conclusion. >> >> >> Hope this can help you. >> >> >> Best, >> Leonard >> >> >> >>> 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: >>> >>> Hi, everyone. >>> >>> I do some summaries about the discussion about the option. If the summary >>> has errors, please correct me. >>> >>> `table.dml-sync`: >>> - take effect for `executeMultiSql` and sql client >>> - benefit: SQL script portability. One script for all platforms. >>> - drawback: Don't work for `TableEnvironment#executeSql`. >>> >>> `table.multi-dml-sync`: >>> - take effect for `executeMultiSql` and sql client >>> - benefit: SQL script portability >>> - drawback: It's confused when the sql script has one dml statement but >>> need to set option `table.multi-dml-sync` >>> >>> `client.dml-sync`: >>> - take effect for sql client only >>> - benefit: clear definition. >>> - drawback: Every platform needs to define its own option. Bad SQL script >>> portability. >>> >>> Just as Jark said, I think the `table.dml-sync` is a good choice if we >> can >>> extend its scope and make this option works for `executeSql`. >>> It's straightforward and users can use this option now in table api. The >>> drawback is the `TableResult#await` plays the same role as the option. >> I >>> don't think the drawback is really critical because many systems have >>> commands play the same role with the different names. >>> >>> Best, >>> Shengkai >>> >>> Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: >>> >>>> The `table.` prefix is meant to be a general option in the table >>>> ecosystem. Not necessarily attached to Table API or SQL Client. That's >>>> why SQL Client is also located in the `flink-table` module. >>>> >>>> My main concern is the SQL script portability. Declaring the sync/async >>>> behavior will happen in many SQL scripts. And users should be easily >>>> switch from SQL Client to some commercial product without the need of >>>> changing the script again. >>>> >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` later >>>> but that would mean introducing future confusion. An app name (what >>>> `sql-client` kind of is) should not be part of a config option key if >>>> other apps will need the same kind of option. >>>> >>>> Regards, >>>> Timo >>>> >>>> >>>> On 24.02.21 08:59, Jark Wu wrote: >>>>>> From my point of view, I also prefer "sql-client.dml-sync", >>>>> because the behavior of this configuration is very clear. >>>>> Even if we introduce a new config in the future, e.g. `table.dml-sync`, >>>>> we can also deprecate the sql-client one. >>>>> >>>>> Introducing a "table." configuration without any implementation >>>>> will confuse users a lot, as they expect it should take effect on >>>>> the Table API. >>>>> >>>>> If we want to introduce an unified "table.dml-sync" option, I prefer >>>>> it should be implemented on Table API and affect all the DMLs on >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), >>>>> as I have mentioned before [1]. >>>>> >>>>>> It would be very straightforward that it affects all the DMLs on SQL >> CLI >>>>> and >>>>> TableEnvironment (including `executeSql`, `StatementSet`, >>>>> `Table#executeInsert`, etc.). >>>>> This can also make SQL CLI easy to support this configuration by >> passing >>>>> through to the TableEnv. >>>>> >>>>> Best, >>>>> Jark >>>>> >>>>> >>>>> [1]: >>>>> >>>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html >>>>> >>>>> >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: >>>>> >>>>>> If we all agree the option should only be handled by sql client, then >>>> why >>>>>> don't we >>>>>> just call it `sql-client.dml-sync`? As you said, calling it >>>>>> `table.dml-sync` but has no >>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a >> big >>>>>> confusion for >>>>>> users. >>>>>> >>>>>> The only concern I saw is if we introduce >>>>>> "TableEnvironment.executeMultiSql()" in the >>>>>> future, how do we control the synchronization between statements? TBH >> I >>>>>> don't really >>>>>> see a strong requirement for such interfaces. Right now, we have a >>>> pretty >>>>>> clear semantic >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if they >>>> want >>>>>> to execute multiple >>>>>> sql statements. They can simulate either synced or async execution >> with >>>>>> this building block. >>>>>> >>>>>> This will introduce slight overhead for users, but compared to the >>>>>> confusion we might >>>>>> cause if we introduce such a method of our own, I think it's better to >>>> wait >>>>>> for some more >>>>>> feedback. >>>>>> >>>>>> Best, >>>>>> Kurt >>>>>> >>>>>> >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> >>>> wrote: >>>>>> >>>>>>> Hi Kurt, >>>>>>> >>>>>>> we can also shorten it to `table.dml-sync` if that would help. Then >> it >>>>>>> would confuse users that do a regular `.executeSql("INSERT INTO")` >> in a >>>>>>> notebook session. >>>>>>> >>>>>>> In any case users will need to learn the semantics of this option. >>>>>>> `table.multi-dml-sync` should be described as "If a you are in a >> multi >>>>>>> statement environment, execute DMLs synchrounous.". I don't have a >>>>>>> strong opinion on shortening it to `table.dml-sync`. >>>>>>> >>>>>>> Just to clarify the implementation: The option should be handled by >> the >>>>>>> SQL Client only, but the name can be shared accross platforms. >>>>>>> >>>>>>> Regards, >>>>>>> Timo >>>>>>> >>>>>>> >>>>>>> On 23.02.21 09:54, Kurt Young wrote: >>>>>>>> Sorry for the late reply, but I'm confused by >> `table.multi-dml-sync`. >>>>>>>> >>>>>>>> IIUC this config will take effect with 2 use cases: >>>>>>>> 1. SQL client, either interactive mode or executing multiple >>>> statements >>>>>>> via >>>>>>>> -f. In most cases, >>>>>>>> there will be only one INSERT INTO statement but we are controlling >>>> the >>>>>>>> sync/async behavior >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users. >>>>>>> Besides, >>>>>>>> >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we >> are >>>>>>> also >>>>>>>> not sure if we will >>>>>>>> really introduce this in the future. >>>>>>>> >>>>>>>> I would prefer to introduce this option for only sql client. For >>>>>>> platforms >>>>>>>> Timo mentioned which >>>>>>>> need to control such behavior, I think it's easy and flexible to >>>>>>> introduce >>>>>>>> one on their own. >>>>>>>> >>>>>>>> Best, >>>>>>>> Kurt >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email]> >>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi everyone. >>>>>>>>> >>>>>>>>> Sorry for the late response. >>>>>>>>> >>>>>>>>> For `execution.runtime-mode`, I think it's much better than >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! >>>>>>>>> >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should >>>>>> clarify >>>>>>> the >>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed to >>>>>>> specify >>>>>>>>> the table that is fully qualified and only works for the table that >>>> is >>>>>>>>> created by the sql statements. >>>>>>>>> >>>>>>>>> I have updated the FLIP with suggestions. It seems we have reached >> a >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP. >>>>>>>>> >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Shengkai >>>>>>>>> >>>>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: >>>>>>>>> >>>>>>>>>> Hi Ingo, >>>>>>>>>> >>>>>>>>>> 1) I think you are right, the table path should be >> fully-qualified. >>>>>>>>>> >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE >>>>>>>>>> only aims to print DDL for the tables registered using SQL CREATE >>>>>> TABLE >>>>>>>>>> DDL. >>>>>>>>>> If a table is registered using Table API, e.g. >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, DataStream)`, >>>>>>>>>> currently it's not possible to print DDL for such tables. >>>>>>>>>> I think we should point it out in the FLIP. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Jark >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> >> wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE statement. >>>>>>>>>>> >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL >>>>>> should >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise the >> DDL >>>>>>>>>> depends >>>>>>>>>>> on the current context (catalog/database), which could be >>>>>> surprising, >>>>>>>>>>> especially since "the same" table can behave differently if >> created >>>>>> in >>>>>>>>>>> different catalogs. >>>>>>>>>>> 2) How should this handle tables which cannot be fully >>>> characterized >>>>>>> by >>>>>>>>>>> properties only? I don't know if there's an example for this yet, >>>>>> but >>>>>>>>>>> hypothetically this is not currently a requirement, right? This >>>>>> isn't >>>>>>>>> as >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if >>>> it's >>>>>>>>>>> general Flink SQL syntax we should consider this (one way or >>>>>> another). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Regards >>>>>>>>>>> Ingo >>>>>>>>>>> >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther <[hidden email] >>> >>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Shengkai, >>>>>>>>>>>> >>>>>>>>>>>> thanks for updating the FLIP. >>>>>>>>>>>> >>>>>>>>>>>> I have one last comment for the option `table.execution.mode`. >>>>>> Should >>>>>>>>>> we >>>>>>>>>>>> already use the global Flink option `execution.runtime-mode` >>>>>> instead? >>>>>>>>>>>> >>>>>>>>>>>> We are using Flink's options where possible (e.g. ` >> pipeline.name` >>>>>>>>> and >>>>>>>>>>>> `parallism.default`) why not also for batch/streaming mode? >>>>>>>>>>>> >>>>>>>>>>>> The description of the option matches to the Blink planner >>>>>> behavior: >>>>>>>>>>>> >>>>>>>>>>>> ``` >>>>>>>>>>>> Among other things, this controls task scheduling, network >> shuffle >>>>>>>>>>>> behavior, and time semantics. >>>>>>>>>>>> ``` >>>>>>>>>>>> >>>>>>>>>>>> Regards, >>>>>>>>>>>> Timo >>>>>>>>>>>> >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: >>>>>>>>>>>>> Hi, guys. >>>>>>>>>>>>> >>>>>>>>>>>>> I have updated the FLIP. It seems we have reached agreement. >>>>>> Maybe >>>>>>>>>> we >>>>>>>>>>>> can >>>>>>>>>>>>> start the vote soon. If anyone has other questions, please >> leave >>>>>>>>> your >>>>>>>>>>>>> comments. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Shengkai >>>>>>>>>>>>> >>>>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi guys, >>>>>>>>>>>>>> >>>>>>>>>>>>>> The conclusion sounds good to me. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < >> [hidden email] >>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi, Timo, Jark. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I am fine with the new option name. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Shengkai >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future >> work. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI looks >>>>>> like >>>>>>>>>>>>>> single >>>>>>>>>>>>>>>>> statement. >>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements >>>> from >>>>>>>>>>>>>> opening >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>> closing. >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by >>>>>> default), >>>>>>>>>> and >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>> support this config >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in >>>>>>>>>>>>>> TableEnvironment#executeMultiSql() >>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>> the future, right? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < >>>> [hidden email] >>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not >>>>>> apply >>>>>>>>>> to >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only >> sense >>>>>>>>>> when >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could be >>>>>>>>>>>>>> considered. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms >>>> will >>>>>>>>>> also >>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>> to have this config option, which is why I would like to >>>>>>>>> avoid a >>>>>>>>>>> SQL >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to >> come >>>>>>>>> up >>>>>>>>>>> with >>>>>>>>>>>>>>>>>> this important config option separately. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or >>>>>> other >>>>>>>>>>>>>>> opinions? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>>>>>>>>>>>>>> Hi, all. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is >> we >>>>>>>>>> have >>>>>>>>>>> no >>>>>>>>>>>>>>>> means >>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set the >>>>>>>>> option >>>>>>>>>>>>>> true >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>> use `TableResult#await` together. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Shengkai. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> Best regards! >>>>>>>>>>>>>> Rui Li >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>>> >> >> > |
I'm +1 for either:
1. introduce a sql client specific option, or 2. Introduce a table config option and make it apply to both table module & sql client. It would be the FLIP owner's call to decide. Best, Kurt On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <[hidden email]> wrote: > We could also think about reading this config option in Table API. The > effect would be to call `await()` directly in an execute call. I could > also imagine this to be useful esp. when you fire a lot of insert into > queries. We had the case before that users where confused that the > execution happens asynchronously, such an option could prevent this to > happen again. > > Regards, > Timo > > On 01.03.21 05:14, Kurt Young wrote: > > I also asked some users about their opinion that if we introduce some > > config prefixed with "table" but doesn't > > have affection with methods in Table API and SQL. All of them are kind of > > shocked by such question, asking > > why we would do anything like this. > > > > This kind of reaction actually doesn't surprise me a lot, so I jumped in > > and challenged this config option even > > after the FLIP had already been accepted. > > > > If we only have to define the execution behavior for multiple statements > in > > SQL client, we should only introduce > > a config option which would tell users it's affection scope by its name. > > Prefixing with "table" is definitely not a good > > idea here. > > > > Best, > > Kurt > > > > > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> wrote: > > > >> Hi, all > >> > >> Look like there’s only one divergence about option [ table | sql-client > >> ].dml-sync in this thread, correct me if I’m wrong. > >> > >> 1. Leaving the context of this thread, from a user's perspective, > >> the table.xx configurations should take effect in Table API & SQL, > >> the sql-client.xx configurations should only take effect in sql-client. > >> In my(the user's) opinion, other explanations are counterintuitive. > >> > >> 2. It should be pointed out that both all existed table.xx > configurations > >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy, > >> table.local-time-zone,etc.. and the proposed sql-client.xx > configurations > >> like sql-client.verbose, sql-client.execution.max-table-result.rows > >> comply with this convention. > >> > >> 3. Considering the portability to support different CLI tools > (sql-client, > >> sql-gateway, etc.), I prefer table.dml-sync. > >> > >> In addition, I think sql-client/sql-gateway/other CLI tools can be > placed > >> out of flink-table module even in an external project, this should not > >> affect our conclusion. > >> > >> > >> Hope this can help you. > >> > >> > >> Best, > >> Leonard > >> > >> > >> > >>> 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: > >>> > >>> Hi, everyone. > >>> > >>> I do some summaries about the discussion about the option. If the > summary > >>> has errors, please correct me. > >>> > >>> `table.dml-sync`: > >>> - take effect for `executeMultiSql` and sql client > >>> - benefit: SQL script portability. One script for all platforms. > >>> - drawback: Don't work for `TableEnvironment#executeSql`. > >>> > >>> `table.multi-dml-sync`: > >>> - take effect for `executeMultiSql` and sql client > >>> - benefit: SQL script portability > >>> - drawback: It's confused when the sql script has one dml statement but > >>> need to set option `table.multi-dml-sync` > >>> > >>> `client.dml-sync`: > >>> - take effect for sql client only > >>> - benefit: clear definition. > >>> - drawback: Every platform needs to define its own option. Bad SQL > script > >>> portability. > >>> > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if we > >> can > >>> extend its scope and make this option works for `executeSql`. > >>> It's straightforward and users can use this option now in table api. > The > >>> drawback is the `TableResult#await` plays the same role as the option. > >> I > >>> don't think the drawback is really critical because many systems have > >>> commands play the same role with the different names. > >>> > >>> Best, > >>> Shengkai > >>> > >>> Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > >>> > >>>> The `table.` prefix is meant to be a general option in the table > >>>> ecosystem. Not necessarily attached to Table API or SQL Client. That's > >>>> why SQL Client is also located in the `flink-table` module. > >>>> > >>>> My main concern is the SQL script portability. Declaring the > sync/async > >>>> behavior will happen in many SQL scripts. And users should be easily > >>>> switch from SQL Client to some commercial product without the need of > >>>> changing the script again. > >>>> > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` > later > >>>> but that would mean introducing future confusion. An app name (what > >>>> `sql-client` kind of is) should not be part of a config option key if > >>>> other apps will need the same kind of option. > >>>> > >>>> Regards, > >>>> Timo > >>>> > >>>> > >>>> On 24.02.21 08:59, Jark Wu wrote: > >>>>>> From my point of view, I also prefer "sql-client.dml-sync", > >>>>> because the behavior of this configuration is very clear. > >>>>> Even if we introduce a new config in the future, e.g. > `table.dml-sync`, > >>>>> we can also deprecate the sql-client one. > >>>>> > >>>>> Introducing a "table." configuration without any implementation > >>>>> will confuse users a lot, as they expect it should take effect on > >>>>> the Table API. > >>>>> > >>>>> If we want to introduce an unified "table.dml-sync" option, I prefer > >>>>> it should be implemented on Table API and affect all the DMLs on > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, `StatementSet`), > >>>>> as I have mentioned before [1]. > >>>>> > >>>>>> It would be very straightforward that it affects all the DMLs on SQL > >> CLI > >>>>> and > >>>>> TableEnvironment (including `executeSql`, `StatementSet`, > >>>>> `Table#executeInsert`, etc.). > >>>>> This can also make SQL CLI easy to support this configuration by > >> passing > >>>>> through to the TableEnv. > >>>>> > >>>>> Best, > >>>>> Jark > >>>>> > >>>>> > >>>>> [1]: > >>>>> > >>>> > >> > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > >>>>> > >>>>> > >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > >>>>> > >>>>>> If we all agree the option should only be handled by sql client, > then > >>>> why > >>>>>> don't we > >>>>>> just call it `sql-client.dml-sync`? As you said, calling it > >>>>>> `table.dml-sync` but has no > >>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause a > >> big > >>>>>> confusion for > >>>>>> users. > >>>>>> > >>>>>> The only concern I saw is if we introduce > >>>>>> "TableEnvironment.executeMultiSql()" in the > >>>>>> future, how do we control the synchronization between statements? > TBH > >> I > >>>>>> don't really > >>>>>> see a strong requirement for such interfaces. Right now, we have a > >>>> pretty > >>>>>> clear semantic > >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if they > >>>> want > >>>>>> to execute multiple > >>>>>> sql statements. They can simulate either synced or async execution > >> with > >>>>>> this building block. > >>>>>> > >>>>>> This will introduce slight overhead for users, but compared to the > >>>>>> confusion we might > >>>>>> cause if we introduce such a method of our own, I think it's better > to > >>>> wait > >>>>>> for some more > >>>>>> feedback. > >>>>>> > >>>>>> Best, > >>>>>> Kurt > >>>>>> > >>>>>> > >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> > >>>> wrote: > >>>>>> > >>>>>>> Hi Kurt, > >>>>>>> > >>>>>>> we can also shorten it to `table.dml-sync` if that would help. Then > >> it > >>>>>>> would confuse users that do a regular `.executeSql("INSERT INTO")` > >> in a > >>>>>>> notebook session. > >>>>>>> > >>>>>>> In any case users will need to learn the semantics of this option. > >>>>>>> `table.multi-dml-sync` should be described as "If a you are in a > >> multi > >>>>>>> statement environment, execute DMLs synchrounous.". I don't have a > >>>>>>> strong opinion on shortening it to `table.dml-sync`. > >>>>>>> > >>>>>>> Just to clarify the implementation: The option should be handled by > >> the > >>>>>>> SQL Client only, but the name can be shared accross platforms. > >>>>>>> > >>>>>>> Regards, > >>>>>>> Timo > >>>>>>> > >>>>>>> > >>>>>>> On 23.02.21 09:54, Kurt Young wrote: > >>>>>>>> Sorry for the late reply, but I'm confused by > >> `table.multi-dml-sync`. > >>>>>>>> > >>>>>>>> IIUC this config will take effect with 2 use cases: > >>>>>>>> 1. SQL client, either interactive mode or executing multiple > >>>> statements > >>>>>>> via > >>>>>>>> -f. In most cases, > >>>>>>>> there will be only one INSERT INTO statement but we are > controlling > >>>> the > >>>>>>>> sync/async behavior > >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of users. > >>>>>>> Besides, > >>>>>>>> > >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, we > >> are > >>>>>>> also > >>>>>>>> not sure if we will > >>>>>>>> really introduce this in the future. > >>>>>>>> > >>>>>>>> I would prefer to introduce this option for only sql client. For > >>>>>>> platforms > >>>>>>>> Timo mentioned which > >>>>>>>> need to control such behavior, I think it's easy and flexible to > >>>>>>> introduce > >>>>>>>> one on their own. > >>>>>>>> > >>>>>>>> Best, > >>>>>>>> Kurt > >>>>>>>> > >>>>>>>> > >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang <[hidden email] > > > >>>>>>> wrote: > >>>>>>>> > >>>>>>>>> Hi everyone. > >>>>>>>>> > >>>>>>>>> Sorry for the late response. > >>>>>>>>> > >>>>>>>>> For `execution.runtime-mode`, I think it's much better than > >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! > >>>>>>>>> > >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should > >>>>>> clarify > >>>>>>> the > >>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed > to > >>>>>>> specify > >>>>>>>>> the table that is fully qualified and only works for the table > that > >>>> is > >>>>>>>>> created by the sql statements. > >>>>>>>>> > >>>>>>>>> I have updated the FLIP with suggestions. It seems we have > reached > >> a > >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP. > >>>>>>>>> > >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. > >>>>>>>>> > >>>>>>>>> Best, > >>>>>>>>> Shengkai > >>>>>>>>> > >>>>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > >>>>>>>>> > >>>>>>>>>> Hi Ingo, > >>>>>>>>>> > >>>>>>>>>> 1) I think you are right, the table path should be > >> fully-qualified. > >>>>>>>>>> > >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE > >>>>>>>>>> only aims to print DDL for the tables registered using SQL > CREATE > >>>>>> TABLE > >>>>>>>>>> DDL. > >>>>>>>>>> If a table is registered using Table API, e.g. > >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, > DataStream)`, > >>>>>>>>>> currently it's not possible to print DDL for such tables. > >>>>>>>>>> I think we should point it out in the FLIP. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Jark > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> > >> wrote: > >>>>>>>>>> > >>>>>>>>>>> Hi all, > >>>>>>>>>>> > >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE > statement. > >>>>>>>>>>> > >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned DDL > >>>>>> should > >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise the > >> DDL > >>>>>>>>>> depends > >>>>>>>>>>> on the current context (catalog/database), which could be > >>>>>> surprising, > >>>>>>>>>>> especially since "the same" table can behave differently if > >> created > >>>>>> in > >>>>>>>>>>> different catalogs. > >>>>>>>>>>> 2) How should this handle tables which cannot be fully > >>>> characterized > >>>>>>> by > >>>>>>>>>>> properties only? I don't know if there's an example for this > yet, > >>>>>> but > >>>>>>>>>>> hypothetically this is not currently a requirement, right? This > >>>>>> isn't > >>>>>>>>> as > >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but if > >>>> it's > >>>>>>>>>>> general Flink SQL syntax we should consider this (one way or > >>>>>> another). > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Regards > >>>>>>>>>>> Ingo > >>>>>>>>>>> > >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther < > [hidden email] > >>> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>>> Hi Shengkai, > >>>>>>>>>>>> > >>>>>>>>>>>> thanks for updating the FLIP. > >>>>>>>>>>>> > >>>>>>>>>>>> I have one last comment for the option `table.execution.mode`. > >>>>>> Should > >>>>>>>>>> we > >>>>>>>>>>>> already use the global Flink option `execution.runtime-mode` > >>>>>> instead? > >>>>>>>>>>>> > >>>>>>>>>>>> We are using Flink's options where possible (e.g. ` > >> pipeline.name` > >>>>>>>>> and > >>>>>>>>>>>> `parallism.default`) why not also for batch/streaming mode? > >>>>>>>>>>>> > >>>>>>>>>>>> The description of the option matches to the Blink planner > >>>>>> behavior: > >>>>>>>>>>>> > >>>>>>>>>>>> ``` > >>>>>>>>>>>> Among other things, this controls task scheduling, network > >> shuffle > >>>>>>>>>>>> behavior, and time semantics. > >>>>>>>>>>>> ``` > >>>>>>>>>>>> > >>>>>>>>>>>> Regards, > >>>>>>>>>>>> Timo > >>>>>>>>>>>> > >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: > >>>>>>>>>>>>> Hi, guys. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have updated the FLIP. It seems we have reached agreement. > >>>>>> Maybe > >>>>>>>>>> we > >>>>>>>>>>>> can > >>>>>>>>>>>>> start the vote soon. If anyone has other questions, please > >> leave > >>>>>>>>> your > >>>>>>>>>>>>> comments. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Best, > >>>>>>>>>>>>> Shengkai > >>>>>>>>>>>>> > >>>>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > >>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi guys, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The conclusion sounds good to me. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < > >> [hidden email] > >>>>> > >>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi, Timo, Jark. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I am fine with the new option name. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>> Shengkai > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future > >> work. > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI > looks > >>>>>> like > >>>>>>>>>>>>>> single > >>>>>>>>>>>>>>>>> statement. > >>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting statements > >>>> from > >>>>>>>>>>>>>> opening > >>>>>>>>>>>>>>> to > >>>>>>>>>>>>>>>>> closing. > >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > >>>>>> default), > >>>>>>>>>> and > >>>>>>>>>>>>>> we > >>>>>>>>>>>>>>>> will > >>>>>>>>>>>>>>>>> support this config > >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in > >>>>>>>>>>>>>> TableEnvironment#executeMultiSql() > >>>>>>>>>>>>>>>> in > >>>>>>>>>>>>>>>>> the future, right? > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>> Jark > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < > >>>> [hidden email] > >>>>>>> > >>>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Hi everyone, > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should not > >>>>>> apply > >>>>>>>>>> to > >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only > >> sense > >>>>>>>>>> when > >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a > >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could > be > >>>>>>>>>>>>>> considered. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other platforms > >>>> will > >>>>>>>>>> also > >>>>>>>>>>>>>>> need > >>>>>>>>>>>>>>>>>> to have this config option, which is why I would like to > >>>>>>>>> avoid a > >>>>>>>>>>> SQL > >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has to > >> come > >>>>>>>>> up > >>>>>>>>>>> with > >>>>>>>>>>>>>>>>>> this important config option separately. > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? Or > >>>>>> other > >>>>>>>>>>>>>>> opinions? > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> Regards, > >>>>>>>>>>>>>>>>>> Timo > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > >>>>>>>>>>>>>>>>>>> Hi, all. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem is > >> we > >>>>>>>>>> have > >>>>>>>>>>> no > >>>>>>>>>>>>>>>> means > >>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set > the > >>>>>>>>> option > >>>>>>>>>>>>>> true > >>>>>>>>>>>>>>>> and > >>>>>>>>>>>>>>>>>>> use `TableResult#await` together. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>> Best, > >>>>>>>>>>>>>>>>>>> Shengkai. > >>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -- > >>>>>>>>>>>>>> Best regards! > >>>>>>>>>>>>>> Rui Li > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >> > >> > > > > |
Hi, everyone.
After the long discussion, I am fine with both choices. But I prefer the second option that applies to both table modules and sql client. Just as Timo said, the option `table.dml-sync` can improve the SQL script portability. Users don't need to modify the script and execute the script in different platforms e.g gateway. What do you think? CC Timo, Jark, Leonard. Best, Shengkai. Kurt Young <[hidden email]> 于2021年3月1日周一 下午5:11写道: > I'm +1 for either: > 1. introduce a sql client specific option, or > 2. Introduce a table config option and make it apply to both table module & > sql client. > > It would be the FLIP owner's call to decide. > > Best, > Kurt > > > On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <[hidden email]> wrote: > > > We could also think about reading this config option in Table API. The > > effect would be to call `await()` directly in an execute call. I could > > also imagine this to be useful esp. when you fire a lot of insert into > > queries. We had the case before that users where confused that the > > execution happens asynchronously, such an option could prevent this to > > happen again. > > > > Regards, > > Timo > > > > On 01.03.21 05:14, Kurt Young wrote: > > > I also asked some users about their opinion that if we introduce some > > > config prefixed with "table" but doesn't > > > have affection with methods in Table API and SQL. All of them are kind > of > > > shocked by such question, asking > > > why we would do anything like this. > > > > > > This kind of reaction actually doesn't surprise me a lot, so I jumped > in > > > and challenged this config option even > > > after the FLIP had already been accepted. > > > > > > If we only have to define the execution behavior for multiple > statements > > in > > > SQL client, we should only introduce > > > a config option which would tell users it's affection scope by its > name. > > > Prefixing with "table" is definitely not a good > > > idea here. > > > > > > Best, > > > Kurt > > > > > > > > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> wrote: > > > > > >> Hi, all > > >> > > >> Look like there’s only one divergence about option [ table | > sql-client > > >> ].dml-sync in this thread, correct me if I’m wrong. > > >> > > >> 1. Leaving the context of this thread, from a user's perspective, > > >> the table.xx configurations should take effect in Table API & SQL, > > >> the sql-client.xx configurations should only take effect in > sql-client. > > >> In my(the user's) opinion, other explanations are counterintuitive. > > >> > > >> 2. It should be pointed out that both all existed table.xx > > configurations > > >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy, > > >> table.local-time-zone,etc.. and the proposed sql-client.xx > > configurations > > >> like sql-client.verbose, sql-client.execution.max-table-result.rows > > >> comply with this convention. > > >> > > >> 3. Considering the portability to support different CLI tools > > (sql-client, > > >> sql-gateway, etc.), I prefer table.dml-sync. > > >> > > >> In addition, I think sql-client/sql-gateway/other CLI tools can be > > placed > > >> out of flink-table module even in an external project, this should not > > >> affect our conclusion. > > >> > > >> > > >> Hope this can help you. > > >> > > >> > > >> Best, > > >> Leonard > > >> > > >> > > >> > > >>> 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: > > >>> > > >>> Hi, everyone. > > >>> > > >>> I do some summaries about the discussion about the option. If the > > summary > > >>> has errors, please correct me. > > >>> > > >>> `table.dml-sync`: > > >>> - take effect for `executeMultiSql` and sql client > > >>> - benefit: SQL script portability. One script for all platforms. > > >>> - drawback: Don't work for `TableEnvironment#executeSql`. > > >>> > > >>> `table.multi-dml-sync`: > > >>> - take effect for `executeMultiSql` and sql client > > >>> - benefit: SQL script portability > > >>> - drawback: It's confused when the sql script has one dml statement > but > > >>> need to set option `table.multi-dml-sync` > > >>> > > >>> `client.dml-sync`: > > >>> - take effect for sql client only > > >>> - benefit: clear definition. > > >>> - drawback: Every platform needs to define its own option. Bad SQL > > script > > >>> portability. > > >>> > > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if > we > > >> can > > >>> extend its scope and make this option works for `executeSql`. > > >>> It's straightforward and users can use this option now in table api. > > The > > >>> drawback is the `TableResult#await` plays the same role as the > option. > > >> I > > >>> don't think the drawback is really critical because many systems have > > >>> commands play the same role with the different names. > > >>> > > >>> Best, > > >>> Shengkai > > >>> > > >>> Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > > >>> > > >>>> The `table.` prefix is meant to be a general option in the table > > >>>> ecosystem. Not necessarily attached to Table API or SQL Client. > That's > > >>>> why SQL Client is also located in the `flink-table` module. > > >>>> > > >>>> My main concern is the SQL script portability. Declaring the > > sync/async > > >>>> behavior will happen in many SQL scripts. And users should be easily > > >>>> switch from SQL Client to some commercial product without the need > of > > >>>> changing the script again. > > >>>> > > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` > > later > > >>>> but that would mean introducing future confusion. An app name (what > > >>>> `sql-client` kind of is) should not be part of a config option key > if > > >>>> other apps will need the same kind of option. > > >>>> > > >>>> Regards, > > >>>> Timo > > >>>> > > >>>> > > >>>> On 24.02.21 08:59, Jark Wu wrote: > > >>>>>> From my point of view, I also prefer "sql-client.dml-sync", > > >>>>> because the behavior of this configuration is very clear. > > >>>>> Even if we introduce a new config in the future, e.g. > > `table.dml-sync`, > > >>>>> we can also deprecate the sql-client one. > > >>>>> > > >>>>> Introducing a "table." configuration without any implementation > > >>>>> will confuse users a lot, as they expect it should take effect on > > >>>>> the Table API. > > >>>>> > > >>>>> If we want to introduce an unified "table.dml-sync" option, I > prefer > > >>>>> it should be implemented on Table API and affect all the DMLs on > > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, > `StatementSet`), > > >>>>> as I have mentioned before [1]. > > >>>>> > > >>>>>> It would be very straightforward that it affects all the DMLs on > SQL > > >> CLI > > >>>>> and > > >>>>> TableEnvironment (including `executeSql`, `StatementSet`, > > >>>>> `Table#executeInsert`, etc.). > > >>>>> This can also make SQL CLI easy to support this configuration by > > >> passing > > >>>>> through to the TableEnv. > > >>>>> > > >>>>> Best, > > >>>>> Jark > > >>>>> > > >>>>> > > >>>>> [1]: > > >>>>> > > >>>> > > >> > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > > >>>>> > > >>>>> > > >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> wrote: > > >>>>> > > >>>>>> If we all agree the option should only be handled by sql client, > > then > > >>>> why > > >>>>>> don't we > > >>>>>> just call it `sql-client.dml-sync`? As you said, calling it > > >>>>>> `table.dml-sync` but has no > > >>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also cause > a > > >> big > > >>>>>> confusion for > > >>>>>> users. > > >>>>>> > > >>>>>> The only concern I saw is if we introduce > > >>>>>> "TableEnvironment.executeMultiSql()" in the > > >>>>>> future, how do we control the synchronization between statements? > > TBH > > >> I > > >>>>>> don't really > > >>>>>> see a strong requirement for such interfaces. Right now, we have a > > >>>> pretty > > >>>>>> clear semantic > > >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if > they > > >>>> want > > >>>>>> to execute multiple > > >>>>>> sql statements. They can simulate either synced or async execution > > >> with > > >>>>>> this building block. > > >>>>>> > > >>>>>> This will introduce slight overhead for users, but compared to the > > >>>>>> confusion we might > > >>>>>> cause if we introduce such a method of our own, I think it's > better > > to > > >>>> wait > > >>>>>> for some more > > >>>>>> feedback. > > >>>>>> > > >>>>>> Best, > > >>>>>> Kurt > > >>>>>> > > >>>>>> > > >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther <[hidden email]> > > >>>> wrote: > > >>>>>> > > >>>>>>> Hi Kurt, > > >>>>>>> > > >>>>>>> we can also shorten it to `table.dml-sync` if that would help. > Then > > >> it > > >>>>>>> would confuse users that do a regular `.executeSql("INSERT > INTO")` > > >> in a > > >>>>>>> notebook session. > > >>>>>>> > > >>>>>>> In any case users will need to learn the semantics of this > option. > > >>>>>>> `table.multi-dml-sync` should be described as "If a you are in a > > >> multi > > >>>>>>> statement environment, execute DMLs synchrounous.". I don't have > a > > >>>>>>> strong opinion on shortening it to `table.dml-sync`. > > >>>>>>> > > >>>>>>> Just to clarify the implementation: The option should be handled > by > > >> the > > >>>>>>> SQL Client only, but the name can be shared accross platforms. > > >>>>>>> > > >>>>>>> Regards, > > >>>>>>> Timo > > >>>>>>> > > >>>>>>> > > >>>>>>> On 23.02.21 09:54, Kurt Young wrote: > > >>>>>>>> Sorry for the late reply, but I'm confused by > > >> `table.multi-dml-sync`. > > >>>>>>>> > > >>>>>>>> IIUC this config will take effect with 2 use cases: > > >>>>>>>> 1. SQL client, either interactive mode or executing multiple > > >>>> statements > > >>>>>>> via > > >>>>>>>> -f. In most cases, > > >>>>>>>> there will be only one INSERT INTO statement but we are > > controlling > > >>>> the > > >>>>>>>> sync/async behavior > > >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of > users. > > >>>>>>> Besides, > > >>>>>>>> > > >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future work, > we > > >> are > > >>>>>>> also > > >>>>>>>> not sure if we will > > >>>>>>>> really introduce this in the future. > > >>>>>>>> > > >>>>>>>> I would prefer to introduce this option for only sql client. For > > >>>>>>> platforms > > >>>>>>>> Timo mentioned which > > >>>>>>>> need to control such behavior, I think it's easy and flexible to > > >>>>>>> introduce > > >>>>>>>> one on their own. > > >>>>>>>> > > >>>>>>>> Best, > > >>>>>>>> Kurt > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang < > [hidden email] > > > > > >>>>>>> wrote: > > >>>>>>>> > > >>>>>>>>> Hi everyone. > > >>>>>>>>> > > >>>>>>>>> Sorry for the late response. > > >>>>>>>>> > > >>>>>>>>> For `execution.runtime-mode`, I think it's much better than > > >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! > > >>>>>>>>> > > >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We should > > >>>>>> clarify > > >>>>>>> the > > >>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be allowed > > to > > >>>>>>> specify > > >>>>>>>>> the table that is fully qualified and only works for the table > > that > > >>>> is > > >>>>>>>>> created by the sql statements. > > >>>>>>>>> > > >>>>>>>>> I have updated the FLIP with suggestions. It seems we have > > reached > > >> a > > >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP. > > >>>>>>>>> > > >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. > > >>>>>>>>> > > >>>>>>>>> Best, > > >>>>>>>>> Shengkai > > >>>>>>>>> > > >>>>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > > >>>>>>>>> > > >>>>>>>>>> Hi Ingo, > > >>>>>>>>>> > > >>>>>>>>>> 1) I think you are right, the table path should be > > >> fully-qualified. > > >>>>>>>>>> > > >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE > > >>>>>>>>>> only aims to print DDL for the tables registered using SQL > > CREATE > > >>>>>> TABLE > > >>>>>>>>>> DDL. > > >>>>>>>>>> If a table is registered using Table API, e.g. > > >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, > > DataStream)`, > > >>>>>>>>>> currently it's not possible to print DDL for such tables. > > >>>>>>>>>> I think we should point it out in the FLIP. > > >>>>>>>>>> > > >>>>>>>>>> Best, > > >>>>>>>>>> Jark > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email]> > > >> wrote: > > >>>>>>>>>> > > >>>>>>>>>>> Hi all, > > >>>>>>>>>>> > > >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE > > statement. > > >>>>>>>>>>> > > >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned > DDL > > >>>>>> should > > >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise > the > > >> DDL > > >>>>>>>>>> depends > > >>>>>>>>>>> on the current context (catalog/database), which could be > > >>>>>> surprising, > > >>>>>>>>>>> especially since "the same" table can behave differently if > > >> created > > >>>>>> in > > >>>>>>>>>>> different catalogs. > > >>>>>>>>>>> 2) How should this handle tables which cannot be fully > > >>>> characterized > > >>>>>>> by > > >>>>>>>>>>> properties only? I don't know if there's an example for this > > yet, > > >>>>>> but > > >>>>>>>>>>> hypothetically this is not currently a requirement, right? > This > > >>>>>> isn't > > >>>>>>>>> as > > >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, but > if > > >>>> it's > > >>>>>>>>>>> general Flink SQL syntax we should consider this (one way or > > >>>>>> another). > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Regards > > >>>>>>>>>>> Ingo > > >>>>>>>>>>> > > >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther < > > [hidden email] > > >>> > > >>>>>>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Hi Shengkai, > > >>>>>>>>>>>> > > >>>>>>>>>>>> thanks for updating the FLIP. > > >>>>>>>>>>>> > > >>>>>>>>>>>> I have one last comment for the option > `table.execution.mode`. > > >>>>>> Should > > >>>>>>>>>> we > > >>>>>>>>>>>> already use the global Flink option `execution.runtime-mode` > > >>>>>> instead? > > >>>>>>>>>>>> > > >>>>>>>>>>>> We are using Flink's options where possible (e.g. ` > > >> pipeline.name` > > >>>>>>>>> and > > >>>>>>>>>>>> `parallism.default`) why not also for batch/streaming mode? > > >>>>>>>>>>>> > > >>>>>>>>>>>> The description of the option matches to the Blink planner > > >>>>>> behavior: > > >>>>>>>>>>>> > > >>>>>>>>>>>> ``` > > >>>>>>>>>>>> Among other things, this controls task scheduling, network > > >> shuffle > > >>>>>>>>>>>> behavior, and time semantics. > > >>>>>>>>>>>> ``` > > >>>>>>>>>>>> > > >>>>>>>>>>>> Regards, > > >>>>>>>>>>>> Timo > > >>>>>>>>>>>> > > >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: > > >>>>>>>>>>>>> Hi, guys. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I have updated the FLIP. It seems we have reached > agreement. > > >>>>>> Maybe > > >>>>>>>>>> we > > >>>>>>>>>>>> can > > >>>>>>>>>>>>> start the vote soon. If anyone has other questions, please > > >> leave > > >>>>>>>>> your > > >>>>>>>>>>>>> comments. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Best, > > >>>>>>>>>>>>> Shengkai > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>>> Hi guys, > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> The conclusion sounds good to me. > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < > > >> [hidden email] > > >>>>> > > >>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Hi, Timo, Jark. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> I am fine with the new option name. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>> Shengkai > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 周二下午5:35写道: > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be future > > >> work. > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this conclusion? > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> Thanks, > > >>>>>>>>>>>>>>>> Timo > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > > >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI > > looks > > >>>>>> like > > >>>>>>>>>>>>>> single > > >>>>>>>>>>>>>>>>> statement. > > >>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting > statements > > >>>> from > > >>>>>>>>>>>>>> opening > > >>>>>>>>>>>>>>> to > > >>>>>>>>>>>>>>>>> closing. > > >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > > >>>>>> default), > > >>>>>>>>>> and > > >>>>>>>>>>>>>> we > > >>>>>>>>>>>>>>>> will > > >>>>>>>>>>>>>>>>> support this config > > >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in > > >>>>>>>>>>>>>> TableEnvironment#executeMultiSql() > > >>>>>>>>>>>>>>>> in > > >>>>>>>>>>>>>>>>> the future, right? > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>>>> Jark > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < > > >>>> [hidden email] > > >>>>>>> > > >>>>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Hi everyone, > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should > not > > >>>>>> apply > > >>>>>>>>>> to > > >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes only > > >> sense > > >>>>>>>>>> when > > >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a > > >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config could > > be > > >>>>>>>>>>>>>> considered. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other > platforms > > >>>> will > > >>>>>>>>>> also > > >>>>>>>>>>>>>>> need > > >>>>>>>>>>>>>>>>>> to have this config option, which is why I would like > to > > >>>>>>>>> avoid a > > >>>>>>>>>>> SQL > > >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has > to > > >> come > > >>>>>>>>> up > > >>>>>>>>>>> with > > >>>>>>>>>>>>>>>>>> this important config option separately. > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` `table.multi-stmt-sync`? > Or > > >>>>>> other > > >>>>>>>>>>>>>>> opinions? > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> Regards, > > >>>>>>>>>>>>>>>>>> Timo > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > >>>>>>>>>>>>>>>>>>> Hi, all. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main problem > is > > >> we > > >>>>>>>>>> have > > >>>>>>>>>>> no > > >>>>>>>>>>>>>>>> means > > >>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users set > > the > > >>>>>>>>> option > > >>>>>>>>>>>>>> true > > >>>>>>>>>>>>>>>> and > > >>>>>>>>>>>>>>>>>>> use `TableResult#await` together. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>> Best, > > >>>>>>>>>>>>>>>>>>> Shengkai. > > >>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> -- > > >>>>>>>>>>>>>> Best regards! > > >>>>>>>>>>>>>> Rui Li > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>> > > >>>>>>> > > >>>>>> > > >>>>> > > >>>> > > >>>> > > >> > > >> > > > > > > > > |
I prefer option#2 and I think this can make everyone happy.
Best, Jark On Mon, 1 Mar 2021 at 18:22, Shengkai Fang <[hidden email]> wrote: > Hi, everyone. > > After the long discussion, I am fine with both choices. But I prefer the > second option that applies to both table modules and sql client. Just as > Timo said, the option `table.dml-sync` can improve the SQL script > portability. Users don't need to modify the script and execute the script > in different platforms e.g gateway. > > What do you think? CC Timo, Jark, Leonard. > > Best, > Shengkai. > > Kurt Young <[hidden email]> 于2021年3月1日周一 下午5:11写道: > > > I'm +1 for either: > > 1. introduce a sql client specific option, or > > 2. Introduce a table config option and make it apply to both table > module & > > sql client. > > > > It would be the FLIP owner's call to decide. > > > > Best, > > Kurt > > > > > > On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <[hidden email]> wrote: > > > > > We could also think about reading this config option in Table API. The > > > effect would be to call `await()` directly in an execute call. I could > > > also imagine this to be useful esp. when you fire a lot of insert into > > > queries. We had the case before that users where confused that the > > > execution happens asynchronously, such an option could prevent this to > > > happen again. > > > > > > Regards, > > > Timo > > > > > > On 01.03.21 05:14, Kurt Young wrote: > > > > I also asked some users about their opinion that if we introduce some > > > > config prefixed with "table" but doesn't > > > > have affection with methods in Table API and SQL. All of them are > kind > > of > > > > shocked by such question, asking > > > > why we would do anything like this. > > > > > > > > This kind of reaction actually doesn't surprise me a lot, so I jumped > > in > > > > and challenged this config option even > > > > after the FLIP had already been accepted. > > > > > > > > If we only have to define the execution behavior for multiple > > statements > > > in > > > > SQL client, we should only introduce > > > > a config option which would tell users it's affection scope by its > > name. > > > > Prefixing with "table" is definitely not a good > > > > idea here. > > > > > > > > Best, > > > > Kurt > > > > > > > > > > > > On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> > wrote: > > > > > > > >> Hi, all > > > >> > > > >> Look like there’s only one divergence about option [ table | > > sql-client > > > >> ].dml-sync in this thread, correct me if I’m wrong. > > > >> > > > >> 1. Leaving the context of this thread, from a user's perspective, > > > >> the table.xx configurations should take effect in Table API & SQL, > > > >> the sql-client.xx configurations should only take effect in > > sql-client. > > > >> In my(the user's) opinion, other explanations are > counterintuitive. > > > >> > > > >> 2. It should be pointed out that both all existed table.xx > > > configurations > > > >> like table.exec.state.ttl, table.optimizer.agg-phase-strategy, > > > >> table.local-time-zone,etc.. and the proposed sql-client.xx > > > configurations > > > >> like sql-client.verbose, sql-client.execution.max-table-result.rows > > > >> comply with this convention. > > > >> > > > >> 3. Considering the portability to support different CLI tools > > > (sql-client, > > > >> sql-gateway, etc.), I prefer table.dml-sync. > > > >> > > > >> In addition, I think sql-client/sql-gateway/other CLI tools can be > > > placed > > > >> out of flink-table module even in an external project, this should > not > > > >> affect our conclusion. > > > >> > > > >> > > > >> Hope this can help you. > > > >> > > > >> > > > >> Best, > > > >> Leonard > > > >> > > > >> > > > >> > > > >>> 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: > > > >>> > > > >>> Hi, everyone. > > > >>> > > > >>> I do some summaries about the discussion about the option. If the > > > summary > > > >>> has errors, please correct me. > > > >>> > > > >>> `table.dml-sync`: > > > >>> - take effect for `executeMultiSql` and sql client > > > >>> - benefit: SQL script portability. One script for all platforms. > > > >>> - drawback: Don't work for `TableEnvironment#executeSql`. > > > >>> > > > >>> `table.multi-dml-sync`: > > > >>> - take effect for `executeMultiSql` and sql client > > > >>> - benefit: SQL script portability > > > >>> - drawback: It's confused when the sql script has one dml statement > > but > > > >>> need to set option `table.multi-dml-sync` > > > >>> > > > >>> `client.dml-sync`: > > > >>> - take effect for sql client only > > > >>> - benefit: clear definition. > > > >>> - drawback: Every platform needs to define its own option. Bad SQL > > > script > > > >>> portability. > > > >>> > > > >>> Just as Jark said, I think the `table.dml-sync` is a good choice if > > we > > > >> can > > > >>> extend its scope and make this option works for `executeSql`. > > > >>> It's straightforward and users can use this option now in table > api. > > > The > > > >>> drawback is the `TableResult#await` plays the same role as the > > option. > > > >> I > > > >>> don't think the drawback is really critical because many systems > have > > > >>> commands play the same role with the different names. > > > >>> > > > >>> Best, > > > >>> Shengkai > > > >>> > > > >>> Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: > > > >>> > > > >>>> The `table.` prefix is meant to be a general option in the table > > > >>>> ecosystem. Not necessarily attached to Table API or SQL Client. > > That's > > > >>>> why SQL Client is also located in the `flink-table` module. > > > >>>> > > > >>>> My main concern is the SQL script portability. Declaring the > > > sync/async > > > >>>> behavior will happen in many SQL scripts. And users should be > easily > > > >>>> switch from SQL Client to some commercial product without the need > > of > > > >>>> changing the script again. > > > >>>> > > > >>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` > > > later > > > >>>> but that would mean introducing future confusion. An app name > (what > > > >>>> `sql-client` kind of is) should not be part of a config option key > > if > > > >>>> other apps will need the same kind of option. > > > >>>> > > > >>>> Regards, > > > >>>> Timo > > > >>>> > > > >>>> > > > >>>> On 24.02.21 08:59, Jark Wu wrote: > > > >>>>>> From my point of view, I also prefer "sql-client.dml-sync", > > > >>>>> because the behavior of this configuration is very clear. > > > >>>>> Even if we introduce a new config in the future, e.g. > > > `table.dml-sync`, > > > >>>>> we can also deprecate the sql-client one. > > > >>>>> > > > >>>>> Introducing a "table." configuration without any implementation > > > >>>>> will confuse users a lot, as they expect it should take effect on > > > >>>>> the Table API. > > > >>>>> > > > >>>>> If we want to introduce an unified "table.dml-sync" option, I > > prefer > > > >>>>> it should be implemented on Table API and affect all the DMLs on > > > >>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, > > `StatementSet`), > > > >>>>> as I have mentioned before [1]. > > > >>>>> > > > >>>>>> It would be very straightforward that it affects all the DMLs on > > SQL > > > >> CLI > > > >>>>> and > > > >>>>> TableEnvironment (including `executeSql`, `StatementSet`, > > > >>>>> `Table#executeInsert`, etc.). > > > >>>>> This can also make SQL CLI easy to support this configuration by > > > >> passing > > > >>>>> through to the TableEnv. > > > >>>>> > > > >>>>> Best, > > > >>>>> Jark > > > >>>>> > > > >>>>> > > > >>>>> [1]: > > > >>>>> > > > >>>> > > > >> > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html > > > >>>>> > > > >>>>> > > > >>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> > wrote: > > > >>>>> > > > >>>>>> If we all agree the option should only be handled by sql client, > > > then > > > >>>> why > > > >>>>>> don't we > > > >>>>>> just call it `sql-client.dml-sync`? As you said, calling it > > > >>>>>> `table.dml-sync` but has no > > > >>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also > cause > > a > > > >> big > > > >>>>>> confusion for > > > >>>>>> users. > > > >>>>>> > > > >>>>>> The only concern I saw is if we introduce > > > >>>>>> "TableEnvironment.executeMultiSql()" in the > > > >>>>>> future, how do we control the synchronization between > statements? > > > TBH > > > >> I > > > >>>>>> don't really > > > >>>>>> see a strong requirement for such interfaces. Right now, we > have a > > > >>>> pretty > > > >>>>>> clear semantic > > > >>>>>> of `TableEnv.executeSql`, and it's very convenient for users if > > they > > > >>>> want > > > >>>>>> to execute multiple > > > >>>>>> sql statements. They can simulate either synced or async > execution > > > >> with > > > >>>>>> this building block. > > > >>>>>> > > > >>>>>> This will introduce slight overhead for users, but compared to > the > > > >>>>>> confusion we might > > > >>>>>> cause if we introduce such a method of our own, I think it's > > better > > > to > > > >>>> wait > > > >>>>>> for some more > > > >>>>>> feedback. > > > >>>>>> > > > >>>>>> Best, > > > >>>>>> Kurt > > > >>>>>> > > > >>>>>> > > > >>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther < > [hidden email]> > > > >>>> wrote: > > > >>>>>> > > > >>>>>>> Hi Kurt, > > > >>>>>>> > > > >>>>>>> we can also shorten it to `table.dml-sync` if that would help. > > Then > > > >> it > > > >>>>>>> would confuse users that do a regular `.executeSql("INSERT > > INTO")` > > > >> in a > > > >>>>>>> notebook session. > > > >>>>>>> > > > >>>>>>> In any case users will need to learn the semantics of this > > option. > > > >>>>>>> `table.multi-dml-sync` should be described as "If a you are in > a > > > >> multi > > > >>>>>>> statement environment, execute DMLs synchrounous.". I don't > have > > a > > > >>>>>>> strong opinion on shortening it to `table.dml-sync`. > > > >>>>>>> > > > >>>>>>> Just to clarify the implementation: The option should be > handled > > by > > > >> the > > > >>>>>>> SQL Client only, but the name can be shared accross platforms. > > > >>>>>>> > > > >>>>>>> Regards, > > > >>>>>>> Timo > > > >>>>>>> > > > >>>>>>> > > > >>>>>>> On 23.02.21 09:54, Kurt Young wrote: > > > >>>>>>>> Sorry for the late reply, but I'm confused by > > > >> `table.multi-dml-sync`. > > > >>>>>>>> > > > >>>>>>>> IIUC this config will take effect with 2 use cases: > > > >>>>>>>> 1. SQL client, either interactive mode or executing multiple > > > >>>> statements > > > >>>>>>> via > > > >>>>>>>> -f. In most cases, > > > >>>>>>>> there will be only one INSERT INTO statement but we are > > > controlling > > > >>>> the > > > >>>>>>>> sync/async behavior > > > >>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of > > users. > > > >>>>>>> Besides, > > > >>>>>>>> > > > >>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future > work, > > we > > > >> are > > > >>>>>>> also > > > >>>>>>>> not sure if we will > > > >>>>>>>> really introduce this in the future. > > > >>>>>>>> > > > >>>>>>>> I would prefer to introduce this option for only sql client. > For > > > >>>>>>> platforms > > > >>>>>>>> Timo mentioned which > > > >>>>>>>> need to control such behavior, I think it's easy and flexible > to > > > >>>>>>> introduce > > > >>>>>>>> one on their own. > > > >>>>>>>> > > > >>>>>>>> Best, > > > >>>>>>>> Kurt > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang < > > [hidden email] > > > > > > > >>>>>>> wrote: > > > >>>>>>>> > > > >>>>>>>>> Hi everyone. > > > >>>>>>>>> > > > >>>>>>>>> Sorry for the late response. > > > >>>>>>>>> > > > >>>>>>>>> For `execution.runtime-mode`, I think it's much better than > > > >>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! > > > >>>>>>>>> > > > >>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We > should > > > >>>>>> clarify > > > >>>>>>> the > > > >>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be > allowed > > > to > > > >>>>>>> specify > > > >>>>>>>>> the table that is fully qualified and only works for the > table > > > that > > > >>>> is > > > >>>>>>>>> created by the sql statements. > > > >>>>>>>>> > > > >>>>>>>>> I have updated the FLIP with suggestions. It seems we have > > > reached > > > >> a > > > >>>>>>>>> consensus, I'd like to start a formal vote for the FLIP. > > > >>>>>>>>> > > > >>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. > > > >>>>>>>>> > > > >>>>>>>>> Best, > > > >>>>>>>>> Shengkai > > > >>>>>>>>> > > > >>>>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: > > > >>>>>>>>> > > > >>>>>>>>>> Hi Ingo, > > > >>>>>>>>>> > > > >>>>>>>>>> 1) I think you are right, the table path should be > > > >> fully-qualified. > > > >>>>>>>>>> > > > >>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE > > > >>>>>>>>>> only aims to print DDL for the tables registered using SQL > > > CREATE > > > >>>>>> TABLE > > > >>>>>>>>>> DDL. > > > >>>>>>>>>> If a table is registered using Table API, e.g. > > > >>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, > > > DataStream)`, > > > >>>>>>>>>> currently it's not possible to print DDL for such tables. > > > >>>>>>>>>> I think we should point it out in the FLIP. > > > >>>>>>>>>> > > > >>>>>>>>>> Best, > > > >>>>>>>>>> Jark > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email] > > > > > >> wrote: > > > >>>>>>>>>> > > > >>>>>>>>>>> Hi all, > > > >>>>>>>>>>> > > > >>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE > > > statement. > > > >>>>>>>>>>> > > > >>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned > > DDL > > > >>>>>> should > > > >>>>>>>>>>> always have the table identifier fully-qualified. Otherwise > > the > > > >> DDL > > > >>>>>>>>>> depends > > > >>>>>>>>>>> on the current context (catalog/database), which could be > > > >>>>>> surprising, > > > >>>>>>>>>>> especially since "the same" table can behave differently if > > > >> created > > > >>>>>> in > > > >>>>>>>>>>> different catalogs. > > > >>>>>>>>>>> 2) How should this handle tables which cannot be fully > > > >>>> characterized > > > >>>>>>> by > > > >>>>>>>>>>> properties only? I don't know if there's an example for > this > > > yet, > > > >>>>>> but > > > >>>>>>>>>>> hypothetically this is not currently a requirement, right? > > This > > > >>>>>> isn't > > > >>>>>>>>> as > > > >>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, > but > > if > > > >>>> it's > > > >>>>>>>>>>> general Flink SQL syntax we should consider this (one way > or > > > >>>>>> another). > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> Regards > > > >>>>>>>>>>> Ingo > > > >>>>>>>>>>> > > > >>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther < > > > [hidden email] > > > >>> > > > >>>>>>>>> wrote: > > > >>>>>>>>>>> > > > >>>>>>>>>>>> Hi Shengkai, > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> thanks for updating the FLIP. > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> I have one last comment for the option > > `table.execution.mode`. > > > >>>>>> Should > > > >>>>>>>>>> we > > > >>>>>>>>>>>> already use the global Flink option > `execution.runtime-mode` > > > >>>>>> instead? > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> We are using Flink's options where possible (e.g. ` > > > >> pipeline.name` > > > >>>>>>>>> and > > > >>>>>>>>>>>> `parallism.default`) why not also for batch/streaming > mode? > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> The description of the option matches to the Blink planner > > > >>>>>> behavior: > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> ``` > > > >>>>>>>>>>>> Among other things, this controls task scheduling, network > > > >> shuffle > > > >>>>>>>>>>>> behavior, and time semantics. > > > >>>>>>>>>>>> ``` > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> Regards, > > > >>>>>>>>>>>> Timo > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: > > > >>>>>>>>>>>>> Hi, guys. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> I have updated the FLIP. It seems we have reached > > agreement. > > > >>>>>> Maybe > > > >>>>>>>>>> we > > > >>>>>>>>>>>> can > > > >>>>>>>>>>>>> start the vote soon. If anyone has other questions, > please > > > >> leave > > > >>>>>>>>> your > > > >>>>>>>>>>>>> comments. > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>> Shengkai > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>>>> Hi guys, > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> The conclusion sounds good to me. > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < > > > >> [hidden email] > > > >>>>> > > > >>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Hi, Timo, Jark. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> I am fine with the new option name. > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>> Shengkai > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 > 周二下午5:35写道: > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be > future > > > >> work. > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this > conclusion? > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> Thanks, > > > >>>>>>>>>>>>>>>> Timo > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: > > > >>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI > > > looks > > > >>>>>> like > > > >>>>>>>>>>>>>> single > > > >>>>>>>>>>>>>>>>> statement. > > > >>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting > > statements > > > >>>> from > > > >>>>>>>>>>>>>> opening > > > >>>>>>>>>>>>>>> to > > > >>>>>>>>>>>>>>>>> closing. > > > >>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by > > > >>>>>> default), > > > >>>>>>>>>> and > > > >>>>>>>>>>>>>> we > > > >>>>>>>>>>>>>>>> will > > > >>>>>>>>>>>>>>>>> support this config > > > >>>>>>>>>>>>>>>>> in SQL CLI first, will support it in > > > >>>>>>>>>>>>>> TableEnvironment#executeMultiSql() > > > >>>>>>>>>>>>>>>> in > > > >>>>>>>>>>>>>>>>> the future, right? > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>>>> Jark > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < > > > >>>> [hidden email] > > > >>>>>>> > > > >>>>>>>>>>>>>> wrote: > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Hi everyone, > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should > > not > > > >>>>>> apply > > > >>>>>>>>>> to > > > >>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes > only > > > >> sense > > > >>>>>>>>>> when > > > >>>>>>>>>>>>>>>>>> executing multi statements. Once we have a > > > >>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config > could > > > be > > > >>>>>>>>>>>>>> considered. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other > > platforms > > > >>>> will > > > >>>>>>>>>> also > > > >>>>>>>>>>>>>>> need > > > >>>>>>>>>>>>>>>>>> to have this config option, which is why I would > like > > to > > > >>>>>>>>> avoid a > > > >>>>>>>>>>> SQL > > > >>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has > > to > > > >> come > > > >>>>>>>>> up > > > >>>>>>>>>>> with > > > >>>>>>>>>>>>>>>>>> this important config option separately. > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` > `table.multi-stmt-sync`? > > Or > > > >>>>>> other > > > >>>>>>>>>>>>>>> opinions? > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> Regards, > > > >>>>>>>>>>>>>>>>>> Timo > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: > > > >>>>>>>>>>>>>>>>>>> Hi, all. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main > problem > > is > > > >> we > > > >>>>>>>>>> have > > > >>>>>>>>>>> no > > > >>>>>>>>>>>>>>>> means > > > >>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users > set > > > the > > > >>>>>>>>> option > > > >>>>>>>>>>>>>> true > > > >>>>>>>>>>>>>>>> and > > > >>>>>>>>>>>>>>>>>>> use `TableResult#await` together. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>> Best, > > > >>>>>>>>>>>>>>>>>>> Shengkai. > > > >>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> -- > > > >>>>>>>>>>>>>> Best regards! > > > >>>>>>>>>>>>>> Rui Li > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>> > > > >>>>>>> > > > >>>>>>> > > > >>>>>> > > > >>>>> > > > >>>> > > > >>>> > > > >> > > > >> > > > > > > > > > > > > > |
+1 for option 2
Regards, Timo On 02.03.21 03:50, Jark Wu wrote: > I prefer option#2 and I think this can make everyone happy. > > Best, > Jark > > On Mon, 1 Mar 2021 at 18:22, Shengkai Fang <[hidden email]> wrote: > >> Hi, everyone. >> >> After the long discussion, I am fine with both choices. But I prefer the >> second option that applies to both table modules and sql client. Just as >> Timo said, the option `table.dml-sync` can improve the SQL script >> portability. Users don't need to modify the script and execute the script >> in different platforms e.g gateway. >> >> What do you think? CC Timo, Jark, Leonard. >> >> Best, >> Shengkai. >> >> Kurt Young <[hidden email]> 于2021年3月1日周一 下午5:11写道: >> >>> I'm +1 for either: >>> 1. introduce a sql client specific option, or >>> 2. Introduce a table config option and make it apply to both table >> module & >>> sql client. >>> >>> It would be the FLIP owner's call to decide. >>> >>> Best, >>> Kurt >>> >>> >>> On Mon, Mar 1, 2021 at 3:25 PM Timo Walther <[hidden email]> wrote: >>> >>>> We could also think about reading this config option in Table API. The >>>> effect would be to call `await()` directly in an execute call. I could >>>> also imagine this to be useful esp. when you fire a lot of insert into >>>> queries. We had the case before that users where confused that the >>>> execution happens asynchronously, such an option could prevent this to >>>> happen again. >>>> >>>> Regards, >>>> Timo >>>> >>>> On 01.03.21 05:14, Kurt Young wrote: >>>>> I also asked some users about their opinion that if we introduce some >>>>> config prefixed with "table" but doesn't >>>>> have affection with methods in Table API and SQL. All of them are >> kind >>> of >>>>> shocked by such question, asking >>>>> why we would do anything like this. >>>>> >>>>> This kind of reaction actually doesn't surprise me a lot, so I jumped >>> in >>>>> and challenged this config option even >>>>> after the FLIP had already been accepted. >>>>> >>>>> If we only have to define the execution behavior for multiple >>> statements >>>> in >>>>> SQL client, we should only introduce >>>>> a config option which would tell users it's affection scope by its >>> name. >>>>> Prefixing with "table" is definitely not a good >>>>> idea here. >>>>> >>>>> Best, >>>>> Kurt >>>>> >>>>> >>>>> On Fri, Feb 26, 2021 at 9:39 PM Leonard Xu <[hidden email]> >> wrote: >>>>> >>>>>> Hi, all >>>>>> >>>>>> Look like there’s only one divergence about option [ table | >>> sql-client >>>>>> ].dml-sync in this thread, correct me if I’m wrong. >>>>>> >>>>>> 1. Leaving the context of this thread, from a user's perspective, >>>>>> the table.xx configurations should take effect in Table API & SQL, >>>>>> the sql-client.xx configurations should only take effect in >>> sql-client. >>>>>> In my(the user's) opinion, other explanations are >> counterintuitive. >>>>>> >>>>>> 2. It should be pointed out that both all existed table.xx >>>> configurations >>>>>> like table.exec.state.ttl, table.optimizer.agg-phase-strategy, >>>>>> table.local-time-zone,etc.. and the proposed sql-client.xx >>>> configurations >>>>>> like sql-client.verbose, sql-client.execution.max-table-result.rows >>>>>> comply with this convention. >>>>>> >>>>>> 3. Considering the portability to support different CLI tools >>>> (sql-client, >>>>>> sql-gateway, etc.), I prefer table.dml-sync. >>>>>> >>>>>> In addition, I think sql-client/sql-gateway/other CLI tools can be >>>> placed >>>>>> out of flink-table module even in an external project, this should >> not >>>>>> affect our conclusion. >>>>>> >>>>>> >>>>>> Hope this can help you. >>>>>> >>>>>> >>>>>> Best, >>>>>> Leonard >>>>>> >>>>>> >>>>>> >>>>>>> 在 2021年2月25日,18:51,Shengkai Fang <[hidden email]> 写道: >>>>>>> >>>>>>> Hi, everyone. >>>>>>> >>>>>>> I do some summaries about the discussion about the option. If the >>>> summary >>>>>>> has errors, please correct me. >>>>>>> >>>>>>> `table.dml-sync`: >>>>>>> - take effect for `executeMultiSql` and sql client >>>>>>> - benefit: SQL script portability. One script for all platforms. >>>>>>> - drawback: Don't work for `TableEnvironment#executeSql`. >>>>>>> >>>>>>> `table.multi-dml-sync`: >>>>>>> - take effect for `executeMultiSql` and sql client >>>>>>> - benefit: SQL script portability >>>>>>> - drawback: It's confused when the sql script has one dml statement >>> but >>>>>>> need to set option `table.multi-dml-sync` >>>>>>> >>>>>>> `client.dml-sync`: >>>>>>> - take effect for sql client only >>>>>>> - benefit: clear definition. >>>>>>> - drawback: Every platform needs to define its own option. Bad SQL >>>> script >>>>>>> portability. >>>>>>> >>>>>>> Just as Jark said, I think the `table.dml-sync` is a good choice if >>> we >>>>>> can >>>>>>> extend its scope and make this option works for `executeSql`. >>>>>>> It's straightforward and users can use this option now in table >> api. >>>> The >>>>>>> drawback is the `TableResult#await` plays the same role as the >>> option. >>>>>> I >>>>>>> don't think the drawback is really critical because many systems >> have >>>>>>> commands play the same role with the different names. >>>>>>> >>>>>>> Best, >>>>>>> Shengkai >>>>>>> >>>>>>> Timo Walther <[hidden email]> 于2021年2月25日周四 下午4:23写道: >>>>>>> >>>>>>>> The `table.` prefix is meant to be a general option in the table >>>>>>>> ecosystem. Not necessarily attached to Table API or SQL Client. >>> That's >>>>>>>> why SQL Client is also located in the `flink-table` module. >>>>>>>> >>>>>>>> My main concern is the SQL script portability. Declaring the >>>> sync/async >>>>>>>> behavior will happen in many SQL scripts. And users should be >> easily >>>>>>>> switch from SQL Client to some commercial product without the need >>> of >>>>>>>> changing the script again. >>>>>>>> >>>>>>>> Sure, we can change from `sql-client.dml-sync` to `table.dml-sync` >>>> later >>>>>>>> but that would mean introducing future confusion. An app name >> (what >>>>>>>> `sql-client` kind of is) should not be part of a config option key >>> if >>>>>>>> other apps will need the same kind of option. >>>>>>>> >>>>>>>> Regards, >>>>>>>> Timo >>>>>>>> >>>>>>>> >>>>>>>> On 24.02.21 08:59, Jark Wu wrote: >>>>>>>>>> From my point of view, I also prefer "sql-client.dml-sync", >>>>>>>>> because the behavior of this configuration is very clear. >>>>>>>>> Even if we introduce a new config in the future, e.g. >>>> `table.dml-sync`, >>>>>>>>> we can also deprecate the sql-client one. >>>>>>>>> >>>>>>>>> Introducing a "table." configuration without any implementation >>>>>>>>> will confuse users a lot, as they expect it should take effect on >>>>>>>>> the Table API. >>>>>>>>> >>>>>>>>> If we want to introduce an unified "table.dml-sync" option, I >>> prefer >>>>>>>>> it should be implemented on Table API and affect all the DMLs on >>>>>>>>> Table API (`tEnv.executeSql`, `Table.executeInsert`, >>> `StatementSet`), >>>>>>>>> as I have mentioned before [1]. >>>>>>>>> >>>>>>>>>> It would be very straightforward that it affects all the DMLs on >>> SQL >>>>>> CLI >>>>>>>>> and >>>>>>>>> TableEnvironment (including `executeSql`, `StatementSet`, >>>>>>>>> `Table#executeInsert`, etc.). >>>>>>>>> This can also make SQL CLI easy to support this configuration by >>>>>> passing >>>>>>>>> through to the TableEnv. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Jark >>>>>>>>> >>>>>>>>> >>>>>>>>> [1]: >>>>>>>>> >>>>>>>> >>>>>> >>>> >>> >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-163-SQL-Client-Improvements-tp48354p48665.html >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, 24 Feb 2021 at 10:39, Kurt Young <[hidden email]> >> wrote: >>>>>>>>> >>>>>>>>>> If we all agree the option should only be handled by sql client, >>>> then >>>>>>>> why >>>>>>>>>> don't we >>>>>>>>>> just call it `sql-client.dml-sync`? As you said, calling it >>>>>>>>>> `table.dml-sync` but has no >>>>>>>>>> affection in `TableEnv.executeSql("INSERT INTO")` will also >> cause >>> a >>>>>> big >>>>>>>>>> confusion for >>>>>>>>>> users. >>>>>>>>>> >>>>>>>>>> The only concern I saw is if we introduce >>>>>>>>>> "TableEnvironment.executeMultiSql()" in the >>>>>>>>>> future, how do we control the synchronization between >> statements? >>>> TBH >>>>>> I >>>>>>>>>> don't really >>>>>>>>>> see a strong requirement for such interfaces. Right now, we >> have a >>>>>>>> pretty >>>>>>>>>> clear semantic >>>>>>>>>> of `TableEnv.executeSql`, and it's very convenient for users if >>> they >>>>>>>> want >>>>>>>>>> to execute multiple >>>>>>>>>> sql statements. They can simulate either synced or async >> execution >>>>>> with >>>>>>>>>> this building block. >>>>>>>>>> >>>>>>>>>> This will introduce slight overhead for users, but compared to >> the >>>>>>>>>> confusion we might >>>>>>>>>> cause if we introduce such a method of our own, I think it's >>> better >>>> to >>>>>>>> wait >>>>>>>>>> for some more >>>>>>>>>> feedback. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Kurt >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, Feb 23, 2021 at 9:45 PM Timo Walther < >> [hidden email]> >>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Hi Kurt, >>>>>>>>>>> >>>>>>>>>>> we can also shorten it to `table.dml-sync` if that would help. >>> Then >>>>>> it >>>>>>>>>>> would confuse users that do a regular `.executeSql("INSERT >>> INTO")` >>>>>> in a >>>>>>>>>>> notebook session. >>>>>>>>>>> >>>>>>>>>>> In any case users will need to learn the semantics of this >>> option. >>>>>>>>>>> `table.multi-dml-sync` should be described as "If a you are in >> a >>>>>> multi >>>>>>>>>>> statement environment, execute DMLs synchrounous.". I don't >> have >>> a >>>>>>>>>>> strong opinion on shortening it to `table.dml-sync`. >>>>>>>>>>> >>>>>>>>>>> Just to clarify the implementation: The option should be >> handled >>> by >>>>>> the >>>>>>>>>>> SQL Client only, but the name can be shared accross platforms. >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Timo >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 23.02.21 09:54, Kurt Young wrote: >>>>>>>>>>>> Sorry for the late reply, but I'm confused by >>>>>> `table.multi-dml-sync`. >>>>>>>>>>>> >>>>>>>>>>>> IIUC this config will take effect with 2 use cases: >>>>>>>>>>>> 1. SQL client, either interactive mode or executing multiple >>>>>>>> statements >>>>>>>>>>> via >>>>>>>>>>>> -f. In most cases, >>>>>>>>>>>> there will be only one INSERT INTO statement but we are >>>> controlling >>>>>>>> the >>>>>>>>>>>> sync/async behavior >>>>>>>>>>>> with "*multi-dml*-sync". I think this will confuse a lot of >>> users. >>>>>>>>>>> Besides, >>>>>>>>>>>> >>>>>>>>>>>> 2. TableEnvironment#executeMultiSql(), but this is future >> work, >>> we >>>>>> are >>>>>>>>>>> also >>>>>>>>>>>> not sure if we will >>>>>>>>>>>> really introduce this in the future. >>>>>>>>>>>> >>>>>>>>>>>> I would prefer to introduce this option for only sql client. >> For >>>>>>>>>>> platforms >>>>>>>>>>>> Timo mentioned which >>>>>>>>>>>> need to control such behavior, I think it's easy and flexible >> to >>>>>>>>>>> introduce >>>>>>>>>>>> one on their own. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Kurt >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Sat, Feb 20, 2021 at 10:23 AM Shengkai Fang < >>> [hidden email] >>>>> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi everyone. >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry for the late response. >>>>>>>>>>>>> >>>>>>>>>>>>> For `execution.runtime-mode`, I think it's much better than >>>>>>>>>>>>> `table.execution.mode`. Thanks for Timo's suggestions! >>>>>>>>>>>>> >>>>>>>>>>>>> For `SHOW CREATE TABLE`, I'm +1 with Jark's comments. We >> should >>>>>>>>>> clarify >>>>>>>>>>> the >>>>>>>>>>>>> usage of the SHOW CREATE TABLE statements. It should be >> allowed >>>> to >>>>>>>>>>> specify >>>>>>>>>>>>> the table that is fully qualified and only works for the >> table >>>> that >>>>>>>> is >>>>>>>>>>>>> created by the sql statements. >>>>>>>>>>>>> >>>>>>>>>>>>> I have updated the FLIP with suggestions. It seems we have >>>> reached >>>>>> a >>>>>>>>>>>>> consensus, I'd like to start a formal vote for the FLIP. >>>>>>>>>>>>> >>>>>>>>>>>>> Please vote +1 to approve the FLIP, or -1 with a comment. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Shengkai >>>>>>>>>>>>> >>>>>>>>>>>>> Jark Wu <[hidden email]> 于2021年2月15日周一 下午10:50写道: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Ingo, >>>>>>>>>>>>>> >>>>>>>>>>>>>> 1) I think you are right, the table path should be >>>>>> fully-qualified. >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2) I think this is also a good point. The SHOW CREATE TABLE >>>>>>>>>>>>>> only aims to print DDL for the tables registered using SQL >>>> CREATE >>>>>>>>>> TABLE >>>>>>>>>>>>>> DDL. >>>>>>>>>>>>>> If a table is registered using Table API, e.g. >>>>>>>>>>>>>> `StreamTableEnvironment#createTemporaryView(String, >>>> DataStream)`, >>>>>>>>>>>>>> currently it's not possible to print DDL for such tables. >>>>>>>>>>>>>> I think we should point it out in the FLIP. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Jark >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, 15 Feb 2021 at 21:33, Ingo Bürk <[hidden email] >>> >>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have a couple questions about the SHOW CREATE TABLE >>>> statement. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 1) Contrary to the example in the FLIP I think the returned >>> DDL >>>>>>>>>> should >>>>>>>>>>>>>>> always have the table identifier fully-qualified. Otherwise >>> the >>>>>> DDL >>>>>>>>>>>>>> depends >>>>>>>>>>>>>>> on the current context (catalog/database), which could be >>>>>>>>>> surprising, >>>>>>>>>>>>>>> especially since "the same" table can behave differently if >>>>>> created >>>>>>>>>> in >>>>>>>>>>>>>>> different catalogs. >>>>>>>>>>>>>>> 2) How should this handle tables which cannot be fully >>>>>>>> characterized >>>>>>>>>>> by >>>>>>>>>>>>>>> properties only? I don't know if there's an example for >> this >>>> yet, >>>>>>>>>> but >>>>>>>>>>>>>>> hypothetically this is not currently a requirement, right? >>> This >>>>>>>>>> isn't >>>>>>>>>>>>> as >>>>>>>>>>>>>>> much of a problem if this syntax is SQL-client-specific, >> but >>> if >>>>>>>> it's >>>>>>>>>>>>>>> general Flink SQL syntax we should consider this (one way >> or >>>>>>>>>> another). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards >>>>>>>>>>>>>>> Ingo >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Fri, Feb 12, 2021 at 3:53 PM Timo Walther < >>>> [hidden email] >>>>>>> >>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Shengkai, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> thanks for updating the FLIP. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have one last comment for the option >>> `table.execution.mode`. >>>>>>>>>> Should >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> already use the global Flink option >> `execution.runtime-mode` >>>>>>>>>> instead? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We are using Flink's options where possible (e.g. ` >>>>>> pipeline.name` >>>>>>>>>>>>> and >>>>>>>>>>>>>>>> `parallism.default`) why not also for batch/streaming >> mode? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The description of the option matches to the Blink planner >>>>>>>>>> behavior: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> Among other things, this controls task scheduling, network >>>>>> shuffle >>>>>>>>>>>>>>>> behavior, and time semantics. >>>>>>>>>>>>>>>> ``` >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 10.02.21 06:30, Shengkai Fang wrote: >>>>>>>>>>>>>>>>> Hi, guys. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have updated the FLIP. It seems we have reached >>> agreement. >>>>>>>>>> Maybe >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> can >>>>>>>>>>>>>>>>> start the vote soon. If anyone has other questions, >> please >>>>>> leave >>>>>>>>>>>>> your >>>>>>>>>>>>>>>>> comments. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Shengkai >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Rui Li <[hidden email]>于2021年2月9日 周二下午7:52写道: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi guys, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The conclusion sounds good to me. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Feb 9, 2021 at 5:39 PM Shengkai Fang < >>>>>> [hidden email] >>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi, Timo, Jark. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I am fine with the new option name. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>> Shengkai >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Timo Walther <[hidden email]>于2021年2月9日 >> 周二下午5:35写道: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Yes, `TableEnvironment#executeMultiSql()` can be >> future >>>>>> work. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> @Rui, Shengkai: Are you also fine with this >> conclusion? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On 09.02.21 10:14, Jark Wu wrote: >>>>>>>>>>>>>>>>>>>>> I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> My previous concern about "multi" is that DML in CLI >>>> looks >>>>>>>>>> like >>>>>>>>>>>>>>>>>> single >>>>>>>>>>>>>>>>>>>>> statement. >>>>>>>>>>>>>>>>>>>>> But we can treat CLI as a multi-line accepting >>> statements >>>>>>>> from >>>>>>>>>>>>>>>>>> opening >>>>>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>> closing. >>>>>>>>>>>>>>>>>>>>> Thus, I'm fine with `table.multi-dml-sync`. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> So the conclusion is `table.multi-dml-sync` (false by >>>>>>>>>> default), >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>>>>>> will >>>>>>>>>>>>>>>>>>>>> support this config >>>>>>>>>>>>>>>>>>>>> in SQL CLI first, will support it in >>>>>>>>>>>>>>>>>> TableEnvironment#executeMultiSql() >>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>> the future, right? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>> Jark >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Tue, 9 Feb 2021 at 16:37, Timo Walther < >>>>>>>> [hidden email] >>>>>>>>>>> >>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Hi everyone, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I understand Rui's concerns. `table.dml-sync` should >>> not >>>>>>>>>> apply >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>> regular `executeSql`. Actually, this option makes >> only >>>>>> sense >>>>>>>>>>>>>> when >>>>>>>>>>>>>>>>>>>>>> executing multi statements. Once we have a >>>>>>>>>>>>>>>>>>>>>> `TableEnvironment.executeMultiSql()` this config >> could >>>> be >>>>>>>>>>>>>>>>>> considered. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Maybe we can find a better generic name? Other >>> platforms >>>>>>>> will >>>>>>>>>>>>>> also >>>>>>>>>>>>>>>>>>> need >>>>>>>>>>>>>>>>>>>>>> to have this config option, which is why I would >> like >>> to >>>>>>>>>>>>> avoid a >>>>>>>>>>>>>>> SQL >>>>>>>>>>>>>>>>>>>>>> Client specific option. Otherwise every platform has >>> to >>>>>> come >>>>>>>>>>>>> up >>>>>>>>>>>>>>> with >>>>>>>>>>>>>>>>>>>>>> this important config option separately. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Maybe `table.multi-dml-sync` >> `table.multi-stmt-sync`? >>> Or >>>>>>>>>> other >>>>>>>>>>>>>>>>>>> opinions? >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>>>>>>>>> Timo >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On 09.02.21 08:50, Shengkai Fang wrote: >>>>>>>>>>>>>>>>>>>>>>> Hi, all. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I think it may cause user confused. The main >> problem >>> is >>>>>> we >>>>>>>>>>>>>> have >>>>>>>>>>>>>>> no >>>>>>>>>>>>>>>>>>>> means >>>>>>>>>>>>>>>>>>>>>>> to detect the conflict configuration, e.g. users >> set >>>> the >>>>>>>>>>>>> option >>>>>>>>>>>>>>>>>> true >>>>>>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>>>>> use `TableResult#await` together. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>>>>>>>> Shengkai. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>> Best regards! >>>>>>>>>>>>>>>>>> Rui Li >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> > |
Free forum by Nabble | Edit this page |