(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Support source/sink parallelism config in Flink sql

admin

[DISCUSS] Support source/sink parallelism config in Flink sql

Hi devs,
Currently, Flink SQL does not support configuring source/sink parallelism, so in some cases resources are either wasted or insufficient.
I think it is necessary to make source/sink parallelism configurable in SQL.
My proposed solution is to add a parallelism option to the WITH properties of the DDL.

Before 1.11, we can read the parallelism and apply it in StreamTableSink#consumeDataStream or StreamTableSource#getDataStream.
After 1.11, we can read the parallelism from the catalogTable and set it on the transformation in CommonPhysicalTableSourceScan or CommonPhysicalSink.
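A minimal sketch of the proposal, assuming a hypothetical `parallelism` key in the WITH clause (the thread does not fix a key name). The WITH clause is modeled as a plain String-to-String map; the helper validates the option and falls back to the job default when it is absent. The class and method names are illustrative only and are not Flink APIs; the returned value is what would be applied in the pre-1.11 or post-1.11 hooks named above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/*
 * Example DDL under the proposal ('parallelism' is a hypothetical key):
 *
 *   CREATE TABLE kafka_source (id BIGINT, name STRING) WITH (
 *     'connector' = 'kafka',
 *     -- other required connector options omitted
 *     'parallelism' = '4'
 *   )
 */
public class ParallelismOptionSketch {

    // Hypothetical option key proposed for the WITH clause.
    public static final String PARALLELISM_KEY = "parallelism";

    /** Returns the configured parallelism, or empty if the option is not set. */
    public static OptionalInt configuredParallelism(Map<String, String> options) {
        String raw = options.get(PARALLELISM_KEY);
        if (raw == null) {
            return OptionalInt.empty();
        }
        int value;
        try {
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "'" + PARALLELISM_KEY + "' must be an integer, got: " + raw, e);
        }
        if (value <= 0) {
            throw new IllegalArgumentException(
                "'" + PARALLELISM_KEY + "' must be positive, got: " + value);
        }
        return OptionalInt.of(value);
    }

    /** Parallelism to set on the source/sink: the configured value, else the default. */
    public static int effectiveParallelism(Map<String, String> options, int defaultParallelism) {
        return configuredParallelism(options).orElse(defaultParallelism);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put("connector", "kafka");
        options.put(PARALLELISM_KEY, "4");
        System.out.println(effectiveParallelism(options, 8)); // prints 4

        options.remove(PARALLELISM_KEY);
        System.out.println(effectiveParallelism(options, 8)); // prints 8
    }
}
```

Rejecting non-positive values up front keeps a bad DDL from failing later at job submission time.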

What do you think?

Benchao Li-2

Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Hi admin,

Thanks for bringing up this discussion.
IMHO, it's a valuable feature. We have also added it to our internal
SQL engine, and our approach is very similar to your proposal.

Regarding the implementation, one shortcoming is that we would have to
modify each connector to support this property.
Let's wait for others' opinions on whether this is a valid proposal. If it
is, we can then discuss the implementation in detail.

On Thu, 10 Sep 2020 at 01:19, admin <[hidden email]> wrote:

--

Best,
Benchao Li

刘大龙

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

+1

> -----Original Message-----
> From: "Benchao Li" <[hidden email]>
> Sent: 2020-09-20 16:28:20 (Sunday)
> To: dev <[hidden email]>
> Cc:
> Subject: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Jark Wu-2

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Since FLIP-95, the parallelism is decoupled from the runtime class
(DataStream/SourceFunction), so we need an API to tell the planner what the
parallelism of the source/sink is.

This is exactly the purpose of a previous discussion, "[DISCUSS] Introduce
SupportsParallelismReport and SupportsStatisticsReport" [1]. We can continue
the discussion there.
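A rough sketch of why an explicit API is needed: once a FLIP-95 source/sink no longer hands the planner a DataStream whose parallelism it could set, the connector has to report the value itself. The `ReportsParallelism` interface, the connector classes, and `resolveParallelism` below are hypothetical names for illustration; the interface actually under discussion is the SupportsParallelismReport proposal in [1], whose exact shape is not given here.

```java
import java.util.Optional;

public class ParallelismReportSketch {

    /** Hypothetical ability interface a connector implements to report its parallelism. */
    public interface ReportsParallelism {
        /** Empty means "no preference; let the planner decide". */
        Optional<Integer> reportParallelism();
    }

    /** A connector that was updated to read its parallelism from the table options. */
    public static final class ConfiguredSink implements ReportsParallelism {
        private final Integer parallelism; // null when the option is absent

        public ConfiguredSink(Integer parallelism) {
            this.parallelism = parallelism;
        }

        @Override
        public Optional<Integer> reportParallelism() {
            return Optional.ofNullable(parallelism);
        }
    }

    /** A connector that was not modified: it knows nothing about parallelism. */
    public static final class LegacySink {}

    /** Planner-side decision: use the reported value if present, else the default. */
    public static int resolveParallelism(Object connector, int defaultParallelism) {
        if (connector instanceof ReportsParallelism) {
            Optional<Integer> reported = ((ReportsParallelism) connector).reportParallelism();
            if (reported.isPresent()) {
                int p = reported.get();
                if (p <= 0) {
                    throw new IllegalStateException("Reported parallelism must be positive: " + p);
                }
                return p;
            }
        }
        return defaultParallelism;
    }

    public static void main(String[] args) {
        System.out.println(resolveParallelism(new ConfiguredSink(2), 8));    // prints 2
        System.out.println(resolveParallelism(new ConfiguredSink(null), 8)); // prints 8
        System.out.println(resolveParallelism(new LegacySink(), 8));         // prints 8
    }
}
```

The `LegacySink` case mirrors the per-connector cost raised earlier in the thread: a connector that does not implement the interface simply keeps the default parallelism until it is updated.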

Best,
Jark

[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Introduce-SupportsParallelismReport-and-SupportsStatisticsReport-for-Hive-and-Filesystem-td43531.html

On Sun, 20 Sep 2020 at 23:14, 刘大龙 <[hidden email]> wrote:

>
> +1

Jingsong Li

Re: Re: [DISCUSS] Support source/sink parallelism config in Flink sql

Hi,

I have started a discussion about improving the new TableSource and
TableSink interfaces:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-146-Improve-new-TableSource-and-TableSink-interfaces-td45161.html
It includes a parallelism setting. You are welcome to join the discussion; I
look forward to your comments.

Best,
Jingsong

On Mon, Sep 21, 2020 at 11:03 AM Jark Wu <[hidden email]> wrote:

--
Best, Jingsong Lee