Something wrong with travis?

Something wrong with travis?

Kurt Young
Hi dev,

I noticed that all the Travis tests triggered by pull requests are failing
with the same error:

"Cached flink dir /home/travis/flink_cache/xxxxx/flink does not exist.
Exiting build."

Does anyone have a clue what happened and how to fix this?

Best,
Kurt

Re: Something wrong with travis?

Chesnay Schepler-3
This is a (hopefully short-lived) hiccup in the Travis caching
infrastructure.

There's nothing we can do to _fix_ it; if it persists we'll have to
rework our Travis setup again to not rely on caching.

In reply to Kurt Young (18/06/2019 08:34).

Re: Something wrong with travis?

Jeff Zhang
If it is a Travis caching issue, we can file an Apache Infra ticket and ask them
to clean the cache.

In reply to Chesnay Schepler (Tue, 18 Jun 2019, 3:18 PM).

--
Best Regards

Jeff Zhang

Re: Something wrong with travis?

Biao Liu
It has been broken for more than 14 hours. I hope it recovers soon.

In reply to Jeff Zhang (Tue, 18 Jun 2019, 3:21 PM).

Re: Something wrong with travis?

Chesnay Schepler-3
The problem is not that bad stuff is in the cache (which is the only
thing cleaning the cache solves); it is that the test stages don't
download the correct one.

Our compile stage uploads stuff into the cache, and the subsequent test
builds download it again.

Whether the upload from the compile phase is visible to the test phase
is basically a matter of timing; it depends on the visibility guarantees
that the backing infrastructure provides. So far it _usually_ worked, but
these are naturally things that may change over time.

In reply to Jeff Zhang (18/06/2019 09:20).

Re: Something wrong with travis?

jincheng sun
I agree with the explanation from @Chesnay Schepler. This should be a
problem with the Travis infrastructure, because we have not made any big
changes to Flink's Travis logic recently.
At present, most of the failures happen after the compile stage has
completed. The cache size is only 7.7M, which means that the JARs were not
uploaded successfully.

So here is a question:
 - Where can we check the cache storage to see whether there is a problem
with the storage?

To try to find the cause of the CI issue, I ran the following tests:

 - I deleted the other test stages locally and ran the build, to test
whether the cache is uploaded normally during the compilation phase. See
https://travis-ci.org/sunjincheng121/flink/builds/547155029
 - I increased the Travis cache timeout to 1200, to test whether the cache
cannot be downloaded because of a timeout (I think this test will have the
same result). See https://travis-ci.org/apache/flink/builds/547136163
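
The reasoning that a 7.7M cache cannot contain the compiled JARs can be turned into a cheap sanity check. The Python sketch below sums the file sizes under a directory and counts `.jar` files. The function names and the `min_bytes` / `min_jars` thresholds are hypothetical illustrations, not part of Flink's Travis scripts; a real threshold would have to be derived from the size of a known-good cache.

```python
import os
from typing import Tuple


def summarize_dir(path: str) -> Tuple[int, int]:
    """Return (total size in bytes, number of .jar files) for all files under `path`."""
    total_bytes = 0
    jar_count = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total_bytes += os.path.getsize(os.path.join(root, name))
            if name.endswith(".jar"):
                jar_count += 1
    return total_bytes, jar_count


def looks_like_complete_build(path: str, min_bytes: int, min_jars: int = 1) -> bool:
    """Heuristic (hypothetical thresholds): compiled output should be large and contain JARs."""
    total_bytes, jar_count = summarize_dir(path)
    return total_bytes >= min_bytes and jar_count >= min_jars
```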

I will report back here after testing.

Best,
Jincheng

In reply to Chesnay Schepler (Tue, 18 Jun 2019, 3:53 PM).

Re: Something wrong with travis?

jincheng sun
Test results:
 - The compile-stage-only test succeeds (I deleted some old caches); the
cache size is 1146.26M. See https://travis-ci.org/sunjincheng121/flink/caches
 - The test with the timeout set to 1200 fails with the same error. I
suspect a storage problem, so I deleted more old caches and restarted the
CI. See https://travis-ci.org/apache/flink/builds/547136163

So it now looks like the storage size of the cache is limited. If so, we
could add some cleanup logic for old caches (I am not sure; some validation
is needed).
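
As an illustration of the proposed cleanup logic, here is a minimal Python sketch that picks least-recently-used caches to delete until the total fits within a size budget. It assumes the list of caches (name, size, last-used time) has already been obtained somehow; `CacheEntry`, `caches_to_delete` and the budget are hypothetical, and no Travis API is called. Whether a size limit is actually the cause is exactly what still needs validating.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CacheEntry:
    name: str          # e.g. a branch or PR identifier (hypothetical)
    size_mb: float
    last_used: float   # any monotonically increasing timestamp


def caches_to_delete(entries: List[CacheEntry], budget_mb: float) -> List[CacheEntry]:
    """Select least-recently-used caches until the remaining total fits in `budget_mb`."""
    total = sum(e.size_mb for e in entries)
    victims: List[CacheEntry] = []
    # Oldest first, so that recently used caches survive.
    for entry in sorted(entries, key=lambda e: e.last_used):
        if total <= budget_mb:
            break
        victims.append(entry)
        total -= entry.size_mb
    return victims
```

With caches of 500, 400 and 300 MB (oldest first) and an 800 MB budget, only the oldest one is selected.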

Best
Jincheng

In reply to jincheng sun (Tue, 18 Jun 2019, 6:00 PM).

Re: Something wrong with travis?

Chesnay Schepler-3
The compile stage was always passing.

The timeout makes no difference; it only affects how long we wait for
the download to complete.
We already had significantly more data in the cache a while ago (about
twice as much), so I'm skeptical that the amount of cached data is the
problem.

In reply to jincheng sun (18/06/2019 12:47).

Re: Something wrong with travis?

Chesnay Schepler-3
In reply to this post by Kurt Young
Recent builds are passing again.

Re: Something wrong with travis?

Yun Tang
Unfortunately, I ran into this problem again just now: https://api.travis-ci.org/v3/job/549534496/log.txt (build overview: https://travis-ci.org/apache/flink/builds/549534489). Non-committers like me have to close and reopen the PR, or push another commit, to re-trigger the PR check 🙁

Best
Yun Tang

In reply to Chesnay Schepler (Wednesday, June 19, 2019 16:59).

Re: Something wrong with travis?

Yun Tang
I ran into this problem again (log: https://api.travis-ci.com/v3/job/220732163/log.txt). Is there anywhere we could ask Travis for help, or any clues we could use to figure this out?

Best
Yun Tang

In reply to Yun Tang (Monday, June 24, 2019 14:22).

Re: Something wrong with travis?

Chesnay Schepler-3
There is nothing to report; we already know what the problem is, but it
cannot be fixed.

In reply to Yun Tang (30/07/2019 08:46).