Failing Test again

Failing Test again

Matthias J. Sax

Re: Failing Test again

Henry Saputra
Thanks for reporting it, Matthias. Will try to run Travis on the latest Flink.

The Tachyon test is a bit flaky. Maybe updating to the latest release could help.

- Henry

On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
<[hidden email]> wrote:

Re: Failing Test again

Matthias J. Sax
I only report failing tests after a rebase. ;)

-Matthias

On 08/03/2015 11:23 PM, Henry Saputra wrote:

> Thanks for reporting it, Matthias. Will try to run Travis on the latest Flink.
>
> The Tachyon test is a bit flaky. Maybe updating to the latest release could help.
>
> - Henry
>
> On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
> <[hidden email]> wrote:
>> Today, not a single build was completely successful. Please see here:
>>
>> Flink Streaming Core:
>> https://travis-ci.org/mjsax/flink/jobs/73938109
>> https://travis-ci.org/mjsax/flink/jobs/73951362
>> https://travis-ci.org/apache/flink/jobs/73938124
>> https://travis-ci.org/apache/flink/jobs/73899795
>> https://travis-ci.org/apache/flink/jobs/73938122
>> https://travis-ci.org/apache/flink/jobs/73952441
>>
>> Flink Tachyon:
>> https://travis-ci.org/apache/flink/jobs/73938123
>>
>>
>> -Matthias
>>



Re: Failing Test again

Aljoscha Krettek-2
What are the commits that you rebased on? Could you maybe narrow down what
caused the regression?

On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax <[hidden email]>
wrote:


Re: Failing Test again

Matthias J. Sax
Rebased on:

https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

But if the gap between two rebases is large, it's hard to say what the
problem might be...

The old parent commit (i.e., from the rebase before the last one) was
https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

-Matthias

On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:


Re: Failing Test again

Aljoscha Krettek-2
I've also seen this fail in SuccessAfterNetworkBuffersFailureITCase:
https://travis-ci.org/apache/flink/jobs/74025862

The build seems quite flaky recently.

On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax <[hidden email]>
wrote:


Re: Failing Test again

Stephan Ewen
Yes, build stability is a really serious problem right now.

Here are the problems in question, and what we could do about them:

BarrierBuffer:
--------------------
The BarrierBuffer tests fail in Java 6 builds.

I have not yet found a way to diagnose that problem, but if we cannot find
the issue today, I would be willing to revert my latest commits on the
barrier buffer to improve stability.


StreamCheckpointingITCase
-------------------------------------------
This seems to have started with either the barrier buffer or the updated
partitioned state. If fixing/reverting the barrier buffer does not fix it,
and no fix has come up until then, let's revert the latest changes to the
partitioned state and re-add them once they are stable.


Tachyon:
-------------
The Tachyon mini cluster apparently has a problem: the programs exit with
a sysexit or segfault.

Since we have no Tachyon code ourselves, do we need this test as part of
the nightly tests?
Can we make this a "manual" test that we trigger on demand?



Greetings,
Stephan




On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek <[hidden email]>
wrote:


Re: Failing Test again

Gyula Fóra
Honestly, I don't think the partitioned state changes have anything to do
with the instability. The only relevant change is the reworked test case,
which now tests proper exactly-once semantics; that was missing before.

Stephan Ewen <[hidden email]> wrote (on Tue, Aug 4, 2015, 12:12):


Re: Failing Test again

Stephan Ewen
The "StateCheckpointedITCase", which also tests these guarantees
thoroughly, has not failed so far.

But we first need to rule out the BarrierBuffer. The problem is that the
bug occurs only on Java 6 and cannot be reproduced locally...

On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra <[hidden email]> wrote:


Re: Failing Test again

Chesnay Schepler
Aren't we dropping Java 6 support?

On 04.08.2015 12:21, Stephan Ewen wrote:


Re: Failing Test again

Stephan Ewen
Yes.

We should know, though, whether this is a Java 6 bug or a bug in our
system that just happens to occur only with Java 6 (because of different
timings in this other engine).

On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler <[hidden email]> wrote:


Re: Failing Test again

Robert Metzger
I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
Maybe Tachyon 0.7 will fix the issues.

On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen <[hidden email]> wrote:


Re: Failing Test again

Aljoscha Krettek-2
I've also seen the BufferSpillerTest fail:
https://travis-ci.org/apache/flink/jobs/74057503


On Tue, 4 Aug 2015 at 14:10 Robert Metzger <[hidden email]> wrote:

> I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
> Maybe Tachyon 0.7 will fix the issues.
>
> On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen <[hidden email]> wrote:
>
> > Yes.
> >
> > We should know, though, whether this is a Java 6 bug, or a bug in our
> > system that just happens to occur only with Java 6 (because of different
> > timings in this other engine)
> >
> > On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler <
> > [hidden email]> wrote:
> >
> > > Aren't we dropping Java 6 support?
> > >
> > >
> > > On 04.08.2015 12:21, Stephan Ewen wrote:
> > >
> > >> The "StateCheckpointedITCase" has not failed so far, and it also tests
> > >> these guarantees thoroughly.
> > >>
> > >> But we first need to rule out the BarrierBuffer. The problem is that the
> > >> bug occurs only on Java 6 and cannot be reproduced locally...
> > >>
> > >> On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra <[hidden email]>
> > >> wrote:
> > >>
> > >>> Honestly, I don't think the partitioned state changes have anything to
> > >>> do with the stability; only the reworked test case does, which now tests
> > >>> proper exactly-once behavior (something that was missing before).
> > >>>
> > >>> Stephan Ewen <[hidden email]> wrote (Tue, Aug 4, 2015, 12:12):
> > >>>
> > >>>> Yes, the build stability problem is really serious right now.
> > >>>>
> > >>>> Here are the problems in question, and what we could do about them:
> > >>>>
> > >>>>
> > >>>> BarrierBuffer:
> > >>>> --------------------
> > >>>> BarrierBuffer tests fail in Java 6 builds.
> > >>>>
> > >>>> I have not found a way to diagnose the problem yet, but if we cannot
> > >>>> find the issue today, I would be willing to revert my latest commits on
> > >>>> the barrier buffer to increase stability.
> > >>>>
> > >>>>
> > >>>> StreamCheckpointingITCase
> > >>>> -------------------------------------------
> > >>>> This seems to have started with either the barrier buffer or the
> > >>>> updated partitioned state. If fixing/reverting the barrier buffer does
> > >>>> not fix it, and no fix has come up by then, let's revert the latest
> > >>>> changes to the partitioned state and re-add them once they are stable.
> > >>>>
> > >>>>
> > >>>> Tachyon:
> > >>>> -------------
> > >>>> The Tachyon mini cluster apparently has a problem: the programs exit
> > >>>> with a sysexit or segfault.
> > >>>>
> > >>>> Since we have no Tachyon code ourselves, do we need this test as part
> > >>>> of the nightly tests?
> > >>>> Can we make this a "manual" test that we trigger on demand?
> > >>>>
> > >>>>
> > >>>> Greetings,
> > >>>> Stephan
> > >>>>
> > >>>>
> > >>>> On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek <[hidden email]>
> > >>>> wrote:
> > >>>>
> > >>>>> I've also seen this fail in SuccessAfterNetworkBuffersFailureITCase:
> > >>>>> https://travis-ci.org/apache/flink/jobs/74025862
> > >>>>>
> > >>>>> The build has been quite flaky recently.
> > >>>>>
> > >>>>> On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax <[hidden email]>
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Rebased on:
> > >>>>>>
> > >>>>>> https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
> > >>>>>>
> > >>>>>> But if the gap between two rebases is large, it's hard to say what the
> > >>>>>> problem might be...
> > >>>>>>
> > >>>>>> The old parent commit (i.e., the base before the last rebase) was
> > >>>>>>
> > >>>>>> https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>> On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
> > >>>>>>
> > >>>>>>> What are the commits that you rebased on? Could you maybe narrow down
> > >>>>>>> what caused the regression?
> > >>>>>>>
> > >>>>>>> On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax <[hidden email]>
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>>> I only report failing tests after a rebase. ;)
> > >>>>>>>>
> > >>>>>>>> -Matthias
> > >>>>>>>>
> > >>>>>>>> On 08/03/2015 11:23 PM, Henry Saputra wrote:
> > >>>>>>>>> Thanks for reporting it, Matthias. Will try to run Travis for latest Flink.
> > >>>>>>>>>
> > >>>>>>>>> Tachyon test is a bit flaky. Maybe updating to latest release could help.
> > >>>>>>>>>
> > >>>>>>>>> - Henry
> > >>>>>>>>>
> > >>>>>>>>> On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
> > >>>>>>>>> <[hidden email]> wrote:
> > >>>>>>>>>> Today, not a single build was completely successful. Please see here:
> > >>>>>>>>>>
> > >>>>>>>>>> Flink Streaming Core:
> > >>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73938109
> > >>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73951362
> > >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938124
> > >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73899795
> > >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938122
> > >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73952441
> > >>>>>>>>>>
> > >>>>>>>>>> Flink Tachyon:
> > >>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938123
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> -Matthias
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >
> >
>