[jira] [Created] (FLINK-12002) Adaptive Parallelism of Job Vertex Execution


Shang Yuanchun (Jira)
ryantaocer created FLINK-12002:
----------------------------------

             Summary: Adaptive Parallelism of Job Vertex Execution
                 Key: FLINK-12002
                 URL: https://issues.apache.org/jira/browse/FLINK-12002
             Project: Flink
          Issue Type: Improvement
            Reporter: ryantaocer
            Assignee: BoWang


In Flink, the parallelism of a job is a pre-specified parameter. It is usually an empirical value and thus might not be optimal for either performance or resource usage, depending on the amount of data processed by each task.

Furthermore, a fixed parallelism cannot adapt to the varying data sizes common in production clusters, where configurations are not changed often.

We propose to determine the job parallelism adaptively, based on the actual total input data size and an ideal data size to be processed by each task. The ideal size is pre-specified according to the properties of the operator, such as the preparation overhead relative to the data processing time.
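
As a rough illustration only (the design doc is not yet available), the idea could be sketched as dividing the total input size by the ideal per-task size and rounding up. Everything below is an assumption made for the example: the class and method names are hypothetical and not part of any Flink API, and the min/max bounds are an added safeguard that the proposal itself does not mention.

```java
// Hypothetical sketch, NOT a Flink API: derive a vertex parallelism
// from the total input size and an ideal amount of data per task.
final class AdaptiveParallelismSketch {

    /**
     * Returns ceil(totalInputBytes / idealBytesPerTask), clamped to
     * [minParallelism, maxParallelism]. The clamping bounds are an
     * assumption of this sketch, not part of the proposal.
     */
    static int decideParallelism(
            long totalInputBytes,
            long idealBytesPerTask,
            int minParallelism,
            int maxParallelism) {
        if (totalInputBytes < 0 || idealBytesPerTask <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        if (minParallelism < 1 || maxParallelism < minParallelism) {
            throw new IllegalArgumentException("invalid parallelism bounds");
        }
        // Ceiling division written to avoid overflow for very large inputs.
        long tasks = totalInputBytes / idealBytesPerTask
                + (totalInputBytes % idealBytesPerTask == 0 ? 0 : 1);
        return (int) Math.max(minParallelism, Math.min(maxParallelism, tasks));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        // 10,000 MiB of input at 256 MiB per task needs 40 tasks.
        System.out.println(decideParallelism(10_000 * mib, 256 * mib, 1, 128));
    }
}
```

With these example numbers, a larger input would be capped at the assumed maximum of 128, and an empty input would fall back to the assumed minimum of 1.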

A detailed design doc is coming soon.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (FLINK-12002) Adaptive Parallelism of Job Vertex Execution

dachuan.qu@gmail.com


On 2019/03/25 08:30:00, "ryantaocer (JIRA)" <[hidden email]> wrote:

> ryantaocer created FLINK-12002:
> ----------------------------------
>
>              Summary: Adaptive Parallelism of Job Vertex Execution
>                  Key: FLINK-12002
>                  URL: https://issues.apache.org/jira/browse/FLINK-12002
>              Project: Flink
>           Issue Type: Improvement
>             Reporter: ryantaocer
>             Assignee: BoWang
>
>
> In Flink, the parallelism of a job is a pre-specified parameter. It is usually an empirical value and thus might not be optimal for either performance or resource usage, depending on the amount of data processed by each task.
>
> Furthermore, a fixed parallelism cannot adapt to the varying data sizes common in production clusters, where configurations are not changed often.
>
> We propose to determine the job parallelism adaptively, based on the actual total input data size and an ideal data size to be processed by each task. The ideal size is pre-specified according to the properties of the operator, such as the preparation overhead relative to the data processing time.
>
> A detailed design doc is coming soon.
>

Sounds great for maximizing resource usage! Looking forward to a more detailed design doc.