[DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Gary Yao-3
Hi all,

As the subject states, I am proposing to temporarily remove support for
changing the parallelism of a job via the following syntax [1]:

    ./bin/flink modify [job-id] -p [new-parallelism]

This is an experimental feature that we introduced with the first rollout of
FLIP-6 (Flink 1.5). However, this feature comes with a few caveats:

    * Rescaling does not work with HA enabled [2]
    * New parallelism is not persisted, i.e., after a JobManager restart,
      the job will be recovered with the initial parallelism

Due to the above-mentioned issues, I believe that currently nobody uses
"modify -p" to rescale their jobs in production. Moreover, the rescaling
feature stands in the way of our current efforts to rework Flink's
scheduling [3]. I therefore propose to remove the rescaling code for the
time being. Note that it will still be possible to change the parallelism
by taking a savepoint and restoring the job with a different parallelism [4].
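
The savepoint-based route could look roughly like the sketch below. It
assumes the "cancel -s" and "run -s ... -p" CLI actions as documented for
Flink 1.8; the job ID, savepoint directory, savepoint path, and jar name
are placeholders for illustration, and the two helper functions are
hypothetical. The script only prints the commands so they can be reviewed
before being run against a cluster.

```shell
#!/bin/sh
# Sketch: rescale a job by cancelling it with a savepoint and resubmitting
# it with a different parallelism. All concrete values are placeholders.
FLINK_BIN="${FLINK_BIN:-./bin/flink}"

# Print the command that cancels a job and writes a savepoint.
# $1 = job ID, $2 = savepoint target directory
cancel_with_savepoint_cmd() {
    printf '%s cancel -s %s %s\n' "$FLINK_BIN" "$2" "$1"
}

# Print the command that resubmits the job from a savepoint.
# $1 = savepoint path, $2 = new parallelism, $3 = job jar
restore_cmd() {
    printf '%s run -s %s -p %s %s\n' "$FLINK_BIN" "$1" "$2" "$3"
}

# Step 1: stop the running job and take a savepoint.
cancel_with_savepoint_cmd "<job-id>" "hdfs:///flink/savepoints"
# Step 2: restore from the savepoint path reported by step 1,
# here with a new parallelism of 8.
restore_cmd "<savepoint-path>" 8 "my-job.jar"
```

Unlike "modify -p", the new parallelism here is part of the resubmitted
job, so it does not depend on the in-place rescaling code.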

Any comments and suggestions will be highly appreciated.

Best,
Gary

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/cli.html
[2] https://issues.apache.org/jira/browse/FLINK-8902
[3] https://issues.apache.org/jira/browse/FLINK-10429
[4] https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/state/savepoints.html#what-happens-when-i-change-the-parallelism-of-my-program-when-restoring

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Stephan Ewen
Sounds reasonable to me. If it is a broken feature, then there is not much
value in it.

On Tue, Apr 23, 2019 at 7:50 PM Gary Yao <[hidden email]> wrote:

> Hi all,
>
> As the subject states, I am proposing to temporarily remove support for
> changing the parallelism of a job via the following syntax [1]:
>
>     ./bin/flink modify [job-id] -p [new-parallelism]

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

PaulLam
Hi Gary,

+1 to remove it for now. Actually, some users are not aware that it’s still experimental, and they ask quite a lot about the problems it causes.

Best,
Paul Lam

> On Apr 24, 2019, at 14:49, Stephan Ewen <[hidden email]> wrote:
>
> Sounds reasonable to me. If it is a broken feature, then there is not much
> value in it.

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Till Rohrmann
+1 for temporarily removing support for the modify command.

Eventually, we have to add it again in order to support auto scaling. The
next time we add it, we should address the known limitations.

Cheers,
Till

On Wed, Apr 24, 2019 at 9:06 AM Paul Lam <[hidden email]> wrote:

> Hi Gary,
>
> +1 to remove it for now. Actually, some users are not aware that it’s
> still experimental, and they ask quite a lot about the problems it causes.
>
> Best,
> Paul Lam

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Shuai Xu
Will we only remove the command support on the client side, or will the
code in the job master also be removed?

On Wed, Apr 24, 2019 at 4:12 PM Till Rohrmann <[hidden email]> wrote:

> +1 for temporarily removing support for the modify command.
>
> Eventually, we have to add it again in order to support auto scaling. The
> next time we add it, we should address the known limitations.
>
> Cheers,
> Till

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Gary Yao-3
The idea is to also remove the rescaling code in the JobMaster. This will
make it easier to remove the ExecutionGraph reference from the JobMaster,
which is needed for the scheduling rework [1].

[1] https://issues.apache.org/jira/browse/FLINK-12231

On Wed, Apr 24, 2019 at 12:14 PM Shuai Xu <[hidden email]> wrote:

> Will we only remove the command support on the client side, or will the
> code in the job master also be removed?

Re: [DISCUSS] Temporarily remove support for job rescaling via CLI action "modify"

Gary Yao-3
Since there were no objections so far, I will proceed with removing the
code [1].

[1] https://issues.apache.org/jira/browse/FLINK-12312

On Wed, Apr 24, 2019 at 1:38 PM Gary Yao <[hidden email]> wrote:

> The idea is to also remove the rescaling code in the JobMaster. This will
> make it easier to remove the ExecutionGraph reference from the JobMaster,
> which is needed for the scheduling rework [1].
>
> [1] https://issues.apache.org/jira/browse/FLINK-12231