[DISCUSS] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

Till Rohrmann
Hi everyone,

I'd like to discuss changing the default restart delay for FixedDelay- and
FailureRateRestartStrategy to "1 s" [1].

According to a user survey about the default value of the restart delay
[2], it turned out that the current default value of "0 s" is not optimal.
In practice Flink users tend to set it to a non-zero value (e.g. "10 s") in
order to prevent restart storms originating from overloaded external
systems.

I would like to set the default restart delay of the
FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the
FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to
"1 s". This should prevent restart storms originating from causes outside of
Flink (e.g. overloaded external systems) while still being short enough not
to have a noticeable effect on most Flink deployments.
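
To make the restart-storm argument concrete, here is a minimal sketch (not Flink code) that counts how many restart attempts a job would start while an external system is down. The function name `restart_attempts`, the model (every attempt fails after a fixed time, then waits for the delay) and the numbers used are assumptions chosen only for illustration.

```python
def restart_attempts(outage_ms: int, delay_ms: int, time_to_fail_ms: int) -> int:
    """Count restart attempts that start while the external system is down.

    Assumed model, not Flink's scheduler: each attempt fails after
    time_to_fail_ms, then the job waits delay_ms before the next attempt.
    """
    cycle_ms = time_to_fail_ms + delay_ms
    if cycle_ms <= 0:
        raise ValueError("time_to_fail_ms + delay_ms must be positive")
    # Attempts start at t = 0, cycle, 2 * cycle, ... while t < outage_ms,
    # so the count is the ceiling of outage_ms / cycle_ms.
    return -(-outage_ms // cycle_ms)


# Hypothetical numbers: a 60 s outage, each attempt failing after 50 ms.
for delay_ms in (0, 1_000, 10_000):
    attempts = restart_attempts(60_000, delay_ms, 50)
    print(f"delay={delay_ms:>6} ms -> {attempts} attempts")
```

Under these assumed numbers, a zero delay produces restart attempts in the thousands per minute, while a delay of one second already reduces this to a few dozen.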

However, this change will affect all users who currently rely on the
current default restart delay value ("0 s"). The plan is to add a release
note to make these users aware of this change when upgrading Flink.
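
As a rough aid for spotting affected setups, the sketch below checks whether a configuration sets the two delay options explicitly and prints the lines that would pin the old value. It assumes a plain `key: value` file layout; the helper name `unset_delay_keys` and the sample configuration are hypothetical.

```python
from typing import List

# The two options named in this proposal.
DELAY_KEYS = (
    "restart-strategy.fixed-delay.delay",
    "restart-strategy.failure-rate.delay",
)


def unset_delay_keys(conf_text: str) -> List[str]:
    """Return the restart delay keys that conf_text does not set explicitly."""
    configured = set()
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, _value = line.split(":", 1)
            configured.add(key.strip())
    return [key for key in DELAY_KEYS if key not in configured]


# Hypothetical sample configuration that relies on the default delay.
sample = """
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
"""

for key in unset_delay_keys(sample):
    # Setting the old value explicitly would keep today's behaviour.
    print(f"{key}: 0 s")
```

The sketch reports both keys regardless of which restart strategy is active; only the key belonging to the configured strategy matters in practice.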

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
[2]
https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E

Cheers,
Till

Re: [DISCUSS] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

Till Rohrmann
I guess that most things have already been said in the related discussion
thread [1]. Hence, I'm going to start the vote.

[1]
https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E

Cheers,
Till

On Tue, Sep 3, 2019 at 11:41 AM Till Rohrmann <[hidden email]> wrote:

> Hi everyone,
>
> I'd like to discuss changing the default restart delay for FixedDelay- and
> FailureRateRestartStrategy to "1 s" [1].
>
> According to a user survey about the default value of the restart delay
> [2], it turned out that the current default value of "0 s" is not optimal.
> In practice Flink users tend to set it to a non-zero value (e.g. "10 s") in
> order to prevent restart storms originating from overloaded external
> systems.
>
> I would like to set the default restart delay of the
> FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the
> FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to
> "1 s". This should prevent restart storms originating from causes outside of
> Flink (e.g. overloaded external systems) while still being short enough not
> to have a noticeable effect on most Flink deployments.
>
> However, this change will affect all users who currently rely on the
> current default restart delay value ("0 s"). The plan is to add a release
> note to make these users aware of this change when upgrading Flink.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
> [2]
> https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E
>
> Cheers,
> Till
>

Re: [DISCUSS] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

Chesnay Schepler-3
The issue we seem to run into again and again is that we want to
find a value that provides a good experience when trying out Flink, but
is also somewhat usable for production users.
We should look into solutions for this; maybe having a "recommended"
value in the docs would help sufficiently, or even configuration
profiles for Flink "dev"/"production" which influence the default values.
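
As a rough sketch of the profile idea (purely hypothetical; nothing in this thread says Flink has such an option), default values could be looked up per profile, with explicit user settings always taking precedence. The names `PROFILE_DEFAULTS` and `resolve`, and the per-profile values, are made up for illustration.

```python
from typing import Dict, Optional

# Hypothetical per-profile defaults; the values are illustrative only.
PROFILE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "dev": {"restart-strategy.fixed-delay.delay": "0 s"},
    "production": {"restart-strategy.fixed-delay.delay": "10 s"},
}


def resolve(key: str, user_conf: Dict[str, str], profile: str = "dev") -> Optional[str]:
    """Return the user's explicit value if set, else the profile's default."""
    if key in user_conf:
        return user_conf[key]
    return PROFILE_DEFAULTS[profile].get(key)


key = "restart-strategy.fixed-delay.delay"
print(resolve(key, {}, profile="dev"))
print(resolve(key, {}, profile="production"))
print(resolve(key, {key: "1 s"}, profile="production"))
```

With this lookup order, switching the profile changes only the values the user has not set.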

On 03/09/2019 11:41, Till Rohrmann wrote:

> Hi everyone,
>
> I'd like to discuss changing the default restart delay for FixedDelay- and
> FailureRateRestartStrategy to "1 s" [1].
>
> According to a user survey about the default value of the restart delay
> [2], it turned out that the current default value of "0 s" is not optimal.
> In practice Flink users tend to set it to a non-zero value (e.g. "10 s") in
> order to prevent restart storms originating from overloaded external
> systems.
>
> I would like to set the default restart delay of the
> FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the
> FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to
> "1 s". This should prevent restart storms originating from causes outside of
> Flink (e.g. overloaded external systems) while still being short enough not
> to have a noticeable effect on most Flink deployments.
>
> However, this change will affect all users who currently rely on the
> current default restart delay value ("0 s"). The plan is to add a release
> note to make these users aware of this change when upgrading Flink.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
> [2]
> https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E
>
> Cheers,
> Till
>

Re: [DISCUSS] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

Till Rohrmann
Improved documentation can definitely help. I think Arvid suggested
something like this in the linked SURVEY thread and said that Kafka does
something similar.

The idea of different profiles also sounds promising.

I guess something like this deserves a dedicated effort and someone driving
it.

Cheers,
Till

On Wed, Sep 4, 2019 at 12:45 PM Chesnay Schepler <[hidden email]> wrote:

> The issue we seem to run into again and again is that we want to
> find a value that provides a good experience when trying out Flink, but
> is also somewhat usable for production users.
> We should look into solutions for this; maybe having a "recommended"
> value in the docs would help sufficiently, or even configuration
> profiles for Flink "dev"/"production" which influence the default values.
>
> On 03/09/2019 11:41, Till Rohrmann wrote:
> > Hi everyone,
> >
> > I'd like to discuss changing the default restart delay for FixedDelay- and
> > FailureRateRestartStrategy to "1 s" [1].
> >
> > According to a user survey about the default value of the restart delay
> > [2], it turned out that the current default value of "0 s" is not optimal.
> > In practice Flink users tend to set it to a non-zero value (e.g. "10 s") in
> > order to prevent restart storms originating from overloaded external
> > systems.
> >
> > I would like to set the default restart delay of the
> > FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the
> > FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to
> > "1 s". This should prevent restart storms originating from causes outside of
> > Flink (e.g. overloaded external systems) while still being short enough not
> > to have a noticeable effect on most Flink deployments.
> >
> > However, this change will affect all users who currently rely on the
> > current default restart delay value ("0 s"). The plan is to add a release
> > note to make these users aware of this change when upgrading Flink.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
> > [2]
> > https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E
> >
> > Cheers,
> > Till
> >
>
>