(DEPRECATED) Apache Flink Mailing List archive.

[DISCUSS] Adaptive Parallelism of Job Vertex

Classic

List

Threaded

4 messages Options

Bo WANG

[DISCUSS] Adaptive Parallelism of Job Vertex

Hi all,
In distribution computing system, execution parallelism is vital for both
resource efficiency and execution performance. In Flink, execution
parallelism is a pre-specified parameter, which is usually an empirical
value and thus might not be optimal on the various amount of data processed
by each task.

Furthermore, a fixed parallelism cannot scale to varying data size, which
is common in production cluster, since we may not frequently change the
cluster configuration.

Thus, we propose adaptively determine the execution parallelism of each
vertex at runtime based on the actual input data size and an ideal data
size processed by each task. The ideal data size is a pre-specified
parameter according to the property of the operator.

The design doc is ready:
https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing,
any comments are highly appreciated.

Till Rohrmann

Re: [DISCUSS] Adaptive Parallelism of Job Vertex

Hi Bo Wang,

thanks for proposing this design document. I think it is an interesting
idea to improve Flink's execution efficiency.

At the moment, the community is actively working on making Flink's
scheduler pluggable. Once this is possible, we could try this feature out
by implementing a scheduler which supports adaptive parallelism without
affecting the existing code. I think this would be a nice approach to
further evaluate and benchmark the implications of such a strategy. What do
you think?

Cheers,
Till

On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]> wrote:

> Hi all,
> In distribution computing system, execution parallelism is vital for both
> resource efficiency and execution performance. In Flink, execution
> parallelism is a pre-specified parameter, which is usually an empirical
> value and thus might not be optimal on the various amount of data processed
> by each task.
>
> Furthermore, a fixed parallelism cannot scale to varying data size, which
> is common in production cluster, since we may not frequently change the
> cluster configuration.
>
> Thus, we propose adaptively determine the execution parallelism of each
> vertex at runtime based on the actual input data size and an ideal data
> size processed by each task. The ideal data size is a pre-specified
> parameter according to the property of the operator.
>
> The design doc is ready:
>
> https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
> ,
> any comments are highly appreciated.
>

Bo WANG

Re: [DISCUSS] Adaptive Parallelism of Job Vertex

Thanks Till for the comments.
We will implement a new adaptive parallelism supported scheduler in the new
schedule framework. Based on these schedule interfaces, we could do the
work in parallel.

On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <[hidden email]> wrote:

> Hi Bo Wang,
>
> thanks for proposing this design document. I think it is an interesting
> idea to improve Flink's execution efficiency.
>
> At the moment, the community is actively working on making Flink's
> scheduler pluggable. Once this is possible, we could try this feature out
> by implementing a scheduler which supports adaptive parallelism without
> affecting the existing code. I think this would be a nice approach to
> further evaluate and benchmark the implications of such a strategy. What do
> you think?
>
> Cheers,
> Till
>
> On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]> wrote:
>
> > Hi all,
> > In distribution computing system, execution parallelism is vital for both
> > resource efficiency and execution performance. In Flink, execution
> > parallelism is a pre-specified parameter, which is usually an empirical
> > value and thus might not be optimal on the various amount of data
> processed
> > by each task.
> >
> > Furthermore, a fixed parallelism cannot scale to varying data size, which
> > is common in production cluster, since we may not frequently change the
> > cluster configuration.
> >
> > Thus, we propose adaptively determine the execution parallelism of each
> > vertex at runtime based on the actual input data size and an ideal data
> > size processed by each task. The ideal data size is a pre-specified
> > parameter according to the property of the operator.
> >
> > The design doc is ready:
> >
> >
> https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
> > ,
> > any comments are highly appreciated.
> >
>

Till Rohrmann

Re: [DISCUSS] Adaptive Parallelism of Job Vertex

I think this would be a good way to go. Please monitor FLINK-10429 [1] to
see when the Scheduler interface is put into place. The sub tasks to add
the interfaces should be added very soon.

[1] https://issues.apache.org/jira/browse/FLINK-10429

Cheers,
Till

On Wed, Apr 17, 2019 at 1:54 PM Bo WANG <[hidden email]> wrote:

> Thanks Till for the comments.
> We will implement a new adaptive parallelism supported scheduler in the new
> schedule framework. Based on these schedule interfaces, we could do the
> work in parallel.
>
> On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <[hidden email]>
> wrote:
>
> > Hi Bo Wang,
> >
> > thanks for proposing this design document. I think it is an interesting
> > idea to improve Flink's execution efficiency.
> >
> > At the moment, the community is actively working on making Flink's
> > scheduler pluggable. Once this is possible, we could try this feature out
> > by implementing a scheduler which supports adaptive parallelism without
> > affecting the existing code. I think this would be a nice approach to
> > further evaluate and benchmark the implications of such a strategy. What
> do
> > you think?
> >
> > Cheers,
> > Till
> >
> > On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]>
> wrote:
> >
> > > Hi all,
> > > In distribution computing system, execution parallelism is vital for
> both
> > > resource efficiency and execution performance. In Flink, execution
> > > parallelism is a pre-specified parameter, which is usually an empirical
> > > value and thus might not be optimal on the various amount of data
> > processed
> > > by each task.
> > >
> > > Furthermore, a fixed parallelism cannot scale to varying data size,
> which
> > > is common in production cluster, since we may not frequently change the
> > > cluster configuration.
> > >
> > > Thus, we propose adaptively determine the execution parallelism of each
> > > vertex at runtime based on the actual input data size and an ideal data
> > > size processed by each task. The ideal data size is a pre-specified
> > > parameter according to the property of the operator.
> > >
> > > The design doc is ready:
> > >
> > >
> >
> https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing
> > > ,
> > > any comments are highly appreciated.
> > >
> >
>