Hi all,

As the subject states, I am proposing to temporarily remove support for changing the parallelism of a job via the following syntax [1]:

    ./bin/flink modify [job-id] -p [new-parallelism]

This is an experimental feature that we introduced with the first rollout of FLIP-6 (Flink 1.5). However, this feature comes with a few caveats:

    * Rescaling does not work with HA enabled [2]
    * New parallelism is not persisted, i.e., after a JobManager restart, the job will be recovered with the initial parallelism

Due to the above-mentioned issues, I believe that currently nobody uses "modify -p" to rescale their jobs in production. Moreover, the rescaling feature stands in the way of our current efforts to rework Flink's scheduling [3]. I therefore propose to remove the rescaling code for the time being. Note that it will still be possible to change the parallelism by taking a savepoint and restoring the job with a different parallelism [4].

Any comments and suggestions will be highly appreciated.

Best,
Gary

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/cli.html
[2] https://issues.apache.org/jira/browse/FLINK-8902
[3] https://issues.apache.org/jira/browse/FLINK-10429
[4] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html#what-happens-when-i-change-the-parallelism-of-my-program-when-restoring
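
For reference, the savepoint-based alternative mentioned above could look roughly like the sketch below. It relies on the `cancel -s` and `run -s ... -p` options described in the Flink 1.8 CLI documentation [1]; the helper names (`run`, `stop_with_savepoint`, `resume_with_parallelism`) and the `FLINK_BIN` and `DRY_RUN` variables are made up for illustration and are not part of Flink. By default the helpers only print the commands, so nothing is executed until `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Sketch only: rescale a job via savepoint + restore instead of "flink modify -p".
# FLINK_BIN, DRY_RUN and the function names are hypothetical helpers, not Flink features.

FLINK_BIN="${FLINK_BIN:-./bin/flink}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually execute them

# Print the command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 1: cancel the job and trigger a savepoint into the given directory.
# Usage: stop_with_savepoint <job-id> <savepoint-directory>
stop_with_savepoint() {
  local job_id="$1" savepoint_dir="$2"
  run "$FLINK_BIN" cancel -s "$savepoint_dir" "$job_id"
}

# Step 2: resubmit the job from the savepoint with a new parallelism.
# The savepoint path is the one reported by step 1.
# Usage: resume_with_parallelism <savepoint-path> <new-parallelism> <jar-file>
resume_with_parallelism() {
  local savepoint_path="$1" parallelism="$2" jar="$3"
  run "$FLINK_BIN" run -s "$savepoint_path" -p "$parallelism" "$jar"
}
```

After sourcing the file, one would call `stop_with_savepoint <job-id> <savepoint-directory>`, note the savepoint path that Flink reports, and then call `resume_with_parallelism <savepoint-path> <new-parallelism> <jar-file>`. The two steps are kept separate because the exact savepoint path is only known once the first command has finished.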

Sounds reasonable to me. If it is a broken feature, then there is not much value in it.

On Tue, Apr 23, 2019 at 7:50 PM Gary Yao <[hidden email]> wrote:
> I therefore propose to remove the rescaling code for the time being.

Hi Gary,

+1 to remove it for now. Actually, some users are not aware that it's still experimental, and ask quite a lot about the problems it causes.

Best,
Paul Lam

On Apr 24, 2019, at 14:49, Stephan Ewen <[hidden email]> wrote:
> Sounds reasonable to me. If it is a broken feature, then there is not much
> value in it.

+1 for temporarily removing support for the modify command.

Eventually, we have to add it again in order to support auto scaling. The next time we add it, we should address the known limitations.

Cheers,
Till

On Wed, Apr 24, 2019 at 9:06 AM Paul Lam <[hidden email]> wrote:
> + 1 to remove it for now.

Will we remove only the command support on the client side, or will the code in the job master also be removed?

On Wed, Apr 24, 2019 at 4:12 PM Till Rohrmann <[hidden email]> wrote:
> +1 for temporarily removing support for the modify command.

The idea is to also remove the rescaling code in the JobMaster. This will make it easier to remove the ExecutionGraph reference from the JobMaster, which is needed for the scheduling rework [1].

[1] https://issues.apache.org/jira/browse/FLINK-12231

On Wed, Apr 24, 2019 at 12:14 PM Shuai Xu <[hidden email]> wrote:
> Will we only remove command support in client side or the code in job
> master will also be removed?

Since there were no objections so far, I will proceed with removing the code [1].

[1] https://issues.apache.org/jira/browse/FLINK-12312

On Wed, Apr 24, 2019 at 1:38 PM Gary Yao <[hidden email]> wrote:
> The idea is to also remove the rescaling code in the JobMaster.