Flaky tests

Stanislav Kozlovski
Hey there Flink community,

I work on a fellow open-source project - Apache Kafka - and there we have been fighting flaky tests a lot. We run Java 8 and Java 11 builds on every pull request, and due to test flakiness almost all of them turn out red, with one or two tests (completely unrelated to the change in the PR) failing. This has resulted in committers either ignoring the failures and merging the changes or, in the worst case, rerunning the hour-long build until it becomes green.
This test flakiness has also slowed down our releases significantly.

In general, I was just curious to understand whether this is a problem that your project faces as well. Does your project have a lot of intermittently failing tests? Do you have any active process for addressing such tests (during the initial review, after realizing a test is flaky, etc.)? Any pointers will be greatly appreciated!

Thanks,
Stanislav

Re: Flaky tests

Chesnay Schepler
We've been in the same position a while back, with the same effects. We solved it by creating JIRAs for every failing test and cracking down hard on them; I don't think there's any other way to address this. However, to truly solve this, one must look at the original cause to prevent new flaky tests from being added.
From what I remember, many of our tests were flaky because they relied on timings (e.g., let's Thread.sleep for X and assume Y has happened) or had similar race conditions, and committers are nowadays rather observant of these issues.
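
For illustration, here is a minimal sketch of the two patterns (not taken from the Flink or Kafka code bases; the waitUntil helper is made up for this example). The first method sleeps a fixed time and assumes the asynchronous work has finished, which is exactly the timing assumption that breaks on a slow CI machine; the second polls the condition against a deadline instead:

    import java.util.concurrent.CompletableFuture;
    import java.util.function.BooleanSupplier;

    public class TimingExample {

        // Flaky pattern: sleep for a fixed X and assume Y has happened by then.
        static void flakyStyle(CompletableFuture<?> asyncWork) throws InterruptedException {
            Thread.sleep(1_000);               // X = 1s, picked by guessing
            if (!asyncWork.isDone()) {         // Y may simply not be true yet on a slow machine
                throw new AssertionError("async work not finished");
            }
        }

        // More robust pattern: poll the condition with a generous deadline.
        static void robustStyle(CompletableFuture<?> asyncWork) throws InterruptedException {
            waitUntil(asyncWork::isDone, 30_000, "async work not finished");
        }

        // Hypothetical helper; many projects use a library such as Awaitility for this.
        static void waitUntil(BooleanSupplier condition, long timeoutMillis, String message)
                throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            while (!condition.getAsBoolean()) {
                if (System.currentTimeMillis() > deadline) {
                    throw new AssertionError(message);
                }
                Thread.sleep(50);              // short poll interval
            }
        }
    }

With the polling variant, the test only slows down when something is actually wrong, instead of failing whenever the machine happens to be slower than the guessed sleep.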

By now the majority of our builds succeed. We don't do anything like running the builds multiple times before a merge. I know some committers always run a PR at least once against master, but this certainly doesn't apply to everyone.
There are still tests that fail from time to time, but my impression is that people still check which tests are failing to ensure they are unrelated, and track them regardless.

Re: Flaky tests

Aljoscha Krettek
I agree with Chesnay, and I would like to add that the most important step towards fixing flakiness is awareness and willingness. As soon as you accept flakiness and start working around it (as you mentioned), more flakiness will creep in, making it harder to get rid of in the future.

Aljoscha

Re: Flaky tests

Ufuk Celebi
I fully agree with Aljoscha and Chesnay (although my recent PR
experience was still close to what Stanislav describes).

@Robert: Do we have standard labels that we apply to tickets that
report a flaky test? I think this would be helpful to make sure that
we have a good overview of the state of flaky tests.

Best,

Ufuk

Re: Flaky tests

Till Rohrmann
@Ufuk my understanding, though never written down, was to mark test-stability issues as critical and to add the test-stability label. Maybe we should state this somewhere more explicitly.
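
With that convention in place, a JIRA filter along these lines (illustrative only; the exact field values depend on how the project's JIRA is configured) would give the overview Ufuk asked about:

    labels = test-stability AND priority = Critical AND resolution = Unresolved ORDER BY updated DESC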

Re: Flaky tests

Ufuk Celebi
@Till, Robert: +1. That would be helpful.
