Hi all,
In distribution computing system, execution parallelism is vital for both resource efficiency and execution performance. In Flink, execution parallelism is a pre-specified parameter, which is usually an empirical value and thus might not be optimal on the various amount of data processed by each task. Furthermore, a fixed parallelism cannot scale to varying data size, which is common in production cluster, since we may not frequently change the cluster configuration. Thus, we propose adaptively determine the execution parallelism of each vertex at runtime based on the actual input data size and an ideal data size processed by each task. The ideal data size is a pre-specified parameter according to the property of the operator. The design doc is ready: https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing, any comments are highly appreciated. |
Hi Bo Wang,
thanks for proposing this design document. I think it is an interesting idea to improve Flink's execution efficiency. At the moment, the community is actively working on making Flink's scheduler pluggable. Once this is possible, we could try this feature out by implementing a scheduler which supports adaptive parallelism without affecting the existing code. I think this would be a nice approach to further evaluate and benchmark the implications of such a strategy. What do you think? Cheers, Till On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]> wrote: > Hi all, > In distribution computing system, execution parallelism is vital for both > resource efficiency and execution performance. In Flink, execution > parallelism is a pre-specified parameter, which is usually an empirical > value and thus might not be optimal on the various amount of data processed > by each task. > > Furthermore, a fixed parallelism cannot scale to varying data size, which > is common in production cluster, since we may not frequently change the > cluster configuration. > > Thus, we propose adaptively determine the execution parallelism of each > vertex at runtime based on the actual input data size and an ideal data > size processed by each task. The ideal data size is a pre-specified > parameter according to the property of the operator. > > The design doc is ready: > > https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing > , > any comments are highly appreciated. > |
Thanks Till for the comments.
We will implement a new adaptive parallelism supported scheduler in the new schedule framework. Based on these schedule interfaces, we could do the work in parallel. On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <[hidden email]> wrote: > Hi Bo Wang, > > thanks for proposing this design document. I think it is an interesting > idea to improve Flink's execution efficiency. > > At the moment, the community is actively working on making Flink's > scheduler pluggable. Once this is possible, we could try this feature out > by implementing a scheduler which supports adaptive parallelism without > affecting the existing code. I think this would be a nice approach to > further evaluate and benchmark the implications of such a strategy. What do > you think? > > Cheers, > Till > > On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]> wrote: > > > Hi all, > > In distribution computing system, execution parallelism is vital for both > > resource efficiency and execution performance. In Flink, execution > > parallelism is a pre-specified parameter, which is usually an empirical > > value and thus might not be optimal on the various amount of data > processed > > by each task. > > > > Furthermore, a fixed parallelism cannot scale to varying data size, which > > is common in production cluster, since we may not frequently change the > > cluster configuration. > > > > Thus, we propose adaptively determine the execution parallelism of each > > vertex at runtime based on the actual input data size and an ideal data > > size processed by each task. The ideal data size is a pre-specified > > parameter according to the property of the operator. > > > > The design doc is ready: > > > > > https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing > > , > > any comments are highly appreciated. > > > |
I think this would be a good way to go. Please monitor FLINK-10429 [1] to
see when the Scheduler interface is put into place. The sub tasks to add the interfaces should be added very soon. [1] https://issues.apache.org/jira/browse/FLINK-10429 Cheers, Till On Wed, Apr 17, 2019 at 1:54 PM Bo WANG <[hidden email]> wrote: > Thanks Till for the comments. > We will implement a new adaptive parallelism supported scheduler in the new > schedule framework. Based on these schedule interfaces, we could do the > work in parallel. > > On Tue, Apr 16, 2019 at 11:18 PM Till Rohrmann <[hidden email]> > wrote: > > > Hi Bo Wang, > > > > thanks for proposing this design document. I think it is an interesting > > idea to improve Flink's execution efficiency. > > > > At the moment, the community is actively working on making Flink's > > scheduler pluggable. Once this is possible, we could try this feature out > > by implementing a scheduler which supports adaptive parallelism without > > affecting the existing code. I think this would be a nice approach to > > further evaluate and benchmark the implications of such a strategy. What > do > > you think? > > > > Cheers, > > Till > > > > On Mon, Apr 8, 2019 at 10:28 AM Bo WANG <[hidden email]> > wrote: > > > > > Hi all, > > > In distribution computing system, execution parallelism is vital for > both > > > resource efficiency and execution performance. In Flink, execution > > > parallelism is a pre-specified parameter, which is usually an empirical > > > value and thus might not be optimal on the various amount of data > > processed > > > by each task. > > > > > > Furthermore, a fixed parallelism cannot scale to varying data size, > which > > > is common in production cluster, since we may not frequently change the > > > cluster configuration. > > > > > > Thus, we propose adaptively determine the execution parallelism of each > > > vertex at runtime based on the actual input data size and an ideal data > > > size processed by each task. The ideal data size is a pre-specified > > > parameter according to the property of the operator. > > > > > > The design doc is ready: > > > > > > > > > https://docs.google.com/document/d/1ZxnoJ3SOxUk1PL2xC1t-kepq28Pg20IL6eVKUUOWSKY/edit?usp=sharing > > > , > > > any comments are highly appreciated. > > > > > > |
Free forum by Nabble | Edit this page |