Hi devs:
Currently, Flink SQL does not support configuring source/sink parallelism, which can waste resources or leave a job short of them in some cases. I think we need to introduce source/sink parallelism configuration in SQL. I have a solution in mind for this feature: add a parallelism option to the 'WITH' properties of the DDL.

Before 1.11, we can read the parallelism and set it in StreamTableSink#consumeDataStream or StreamTableSource#getDataStream.
After 1.11, we can read the parallelism from the CatalogTable and set it on the transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink.

What do you think?
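To make the option-handling half of the proposal concrete, here is a minimal Java sketch. It reads a parallelism value from a table's WITH properties, validates it, and falls back to the job default when the option is absent. The key name `parallelism`, the class `DdlParallelismOption`, and its methods are hypothetical: the proposal does not fix a key name, and the sketch uses a plain `Map<String, String>` rather than any Flink API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class DdlParallelismOption {
    // Hypothetical key name; the proposal only says "add parallelism config in 'with' properties".
    static final String PARALLELISM_KEY = "parallelism";

    // Reads the optional parallelism from the DDL's WITH properties and validates it.
    static Optional<Integer> parseParallelism(Map<String, String> withOptions) {
        String raw = withOptions.get(PARALLELISM_KEY);
        if (raw == null) {
            return Optional.empty(); // not configured: keep the current behavior
        }
        int parallelism;
        try {
            parallelism = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                    "'" + PARALLELISM_KEY + "' must be an integer, got: " + raw, e);
        }
        if (parallelism <= 0) {
            throw new IllegalArgumentException(
                    "'" + PARALLELISM_KEY + "' must be positive, got: " + parallelism);
        }
        return Optional.of(parallelism);
    }

    // Falls back to the job's default parallelism when the option is absent.
    static int effectiveParallelism(Map<String, String> withOptions, int defaultParallelism) {
        return parseParallelism(withOptions).orElse(defaultParallelism);
    }

    public static void main(String[] args) {
        // Properties as they might come from:
        // CREATE TABLE src (...) WITH ('connector' = 'kafka', 'parallelism' = '4')
        Map<String, String> withOptions = new HashMap<>();
        withOptions.put("connector", "kafka");
        withOptions.put("parallelism", "4");

        System.out.println(effectiveParallelism(withOptions, 8));     // 4
        System.out.println(effectiveParallelism(new HashMap<>(), 8)); // 8
    }
}
```

Applying the resolved value to the transformation in the planner, the second step of the proposal, is not covered by this sketch.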
Hi admin,
Thanks for bringing up this discussion. IMHO, it's a valuable feature. We also added this feature to our internal SQL engine, and our approach is very similar to your proposal.

Regarding the implementation, there is one drawback: we would need to modify each connector to support this property.

We can wait for others' opinions on whether this is a valid proposal. If it is, we can then discuss the implementation in detail.

Best,
Benchao Li

(In reply to admin <[hidden email]>, Thursday, 10 September 2020, 1:19 AM.)
+1

> -----Original Message-----
> From: "Benchao Li" <[hidden email]>
> Sent: 2020-09-20 16:28:20 (Sunday)
> To: dev <[hidden email]>
> Subject: Re: [DISCUSS] Support source/sink parallelism config in Flink sql
Since FLIP-95, the parallelism is decoupled from the runtime classes (DataStream/SourceFunction), so we need an API to tell the planner what the parallelism of the source/sink is.

This is indeed the purpose of a previous discussion [1]: "[DISCUSS] Introduce SupportsParallelismReport and SupportsStatisticsReport". We can continue the discussion there.

Best,
Jark

[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-SupportsParallelismReport-and-SupportsStatisticsReport-for-Hive-and-Filesystem-td43531.html

(In reply to 刘大龙 <[hidden email]>, Sunday, 20 September 2020, 23:14.)
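One way to picture the planner-facing API described here is an optional ability interface. A source or sink implements it to report its parallelism, and the planner checks for it during translation. The self-contained Java sketch below assumes a hypothetical signature `Optional<Integer> reportParallelism()`. The thread names `SupportsParallelismReport` but does not define its methods, and `FakeTransformation` and `PlannerSketch` are illustrative stand-ins, not Flink classes.

```java
import java.util.Optional;

// Hypothetical ability interface; the thread names SupportsParallelismReport
// but its real method signature was still under discussion.
interface SupportsParallelismReport {
    // Parallelism the connector asks for, or empty to let the planner decide.
    Optional<Integer> reportParallelism();
}

// Stand-in for a runtime transformation; not Flink's Transformation class.
final class FakeTransformation {
    private final String name;
    private int parallelism = -1; // -1 means "not set yet"

    FakeTransformation(String name) {
        this.name = name;
    }

    void setParallelism(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive: " + parallelism);
        }
        this.parallelism = parallelism;
    }

    int getParallelism() {
        return parallelism;
    }

    String getName() {
        return name;
    }
}

final class PlannerSketch {
    // Roughly what a scan/sink node could do during translation: ask the
    // connector for its parallelism only if it implements the ability interface.
    static FakeTransformation translate(Object connector, String name, int defaultParallelism) {
        int parallelism = defaultParallelism;
        if (connector instanceof SupportsParallelismReport) {
            parallelism = ((SupportsParallelismReport) connector)
                    .reportParallelism()
                    .orElse(defaultParallelism);
        }
        FakeTransformation transformation = new FakeTransformation(name);
        transformation.setParallelism(parallelism);
        return transformation;
    }

    public static void main(String[] args) {
        SupportsParallelismReport reporting = () -> Optional.of(4);
        Object legacy = new Object(); // a connector without the ability

        System.out.println(translate(reporting, "kafka-source", 8).getParallelism()); // 4
        System.out.println(translate(legacy, "legacy-source", 8).getParallelism());   // 8
    }
}
```

In this sketch, a connector that does not implement the interface keeps the planner's default. Only connectors that want a specific parallelism have to opt in, which limits how many connectors must change at once.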
Hi,
I have started a discussion about improving the new TableSource and TableSink interfaces (FLIP-146): http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-146-Improve-new-TableSource-and-TableSink-interfaces-td45161.html

It covers parallelism settings. You are welcome to join the discussion, and I look forward to your comments.

Best,
Jingsong Lee

(In reply to Jark Wu <[hidden email]>, Monday, 21 September 2020, 11:03 AM.)