Hi everyone,

I'd like to discuss changing the default restart delay for FixedDelay- and FailureRateRestartStrategy to "1 s" [1].

A user survey about the default value of the restart delay [2] showed that the current default value of "0 s" is not optimal. In practice, Flink users tend to set it to a non-zero value (e.g. "10 s") in order to prevent restart storms originating from overloaded external systems.

I would like to set the default restart delay of the FixedDelayRestartStrategy ("restart-strategy.fixed-delay.delay") and of the FailureRateRestartStrategy ("restart-strategy.failure-rate.delay") to "1 s". "1 s" should prevent restart storms originating from causes outside of Flink (e.g. overloaded external systems) and still be short enough to have no noticeable effect on most Flink deployments.

However, this change will affect all users who currently rely on the current default restart delay value ("0 s"). The plan is to add a release note to make these users aware of this change when upgrading Flink.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
[2] https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E

Cheers,
Till
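
For readers who rely on the current behavior, the following is a minimal sketch of a flink-conf.yaml that pins the restart delay explicitly instead of depending on the default. The two delay keys are the ones named in the proposal above. The "restart-strategy" selector key and the choice of "0 s" (which keeps the pre-change behavior) are assumptions added for illustration, not part of the proposal.

```yaml
# flink-conf.yaml (sketch, not a recommended production setup)

# Assumption: the fixed-delay strategy is selected cluster-wide.
restart-strategy: fixed-delay

# Pin the delay explicitly so a changed default does not alter behavior.
# "0 s" reproduces the old default; use e.g. "10 s" to guard against
# restart storms caused by overloaded external systems.
restart-strategy.fixed-delay.delay: 0 s

# Equivalent key when the failure-rate strategy is used instead:
# restart-strategy: failure-rate
# restart-strategy.failure-rate.delay: 0 s
```

Setting the value explicitly in either direction makes a deployment independent of whichever default a given Flink release ships with.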
I guess that most things have already been said on the related discussion thread [1]. Hence, I'm going to start the vote.

[1] https://lists.apache.org/thread.html/107b15de6b8ac849610d99c4754715d2a8a2f32ddfe9f8da02af2ccc@%3Cdev.flink.apache.org%3E

Cheers,
Till

On Tue, Sep 3, 2019 at 11:41 AM Till Rohrmann wrote:
> Hi everyone,
> [...]

In reply to this post by Till Rohrmann
The issue we seem to run into again and again is that we want to find a value that provides a good experience when trying out Flink but is also reasonably usable for production users.

We should look into solutions for this; maybe having a "recommended" value in the docs would help sufficiently, or even configuration profiles for Flink ("dev"/"production") which influence the default values.

On 03/09/2019 11:41, Till Rohrmann wrote:
> Hi everyone,
> [...]

Improved documentation can definitely help. I think Arvid suggested something like this in the linked SURVEY thread and said that Kafka does something similar.

The idea of different profiles also sounds promising. I guess something like this deserves a dedicated effort and someone driving it.

Cheers,
Till

On Wed, Sep 4, 2019 at 12:45 PM Chesnay Schepler wrote:
> The issue we seem to run into again and again is that we want to try to
> find a value that provides a good experience when trying out Flink, but
> also somewhat usable for production users.
> [...]