Today, not a single build completed successfully. Please see here:

Flink Streaming Core:
https://travis-ci.org/mjsax/flink/jobs/73938109
https://travis-ci.org/mjsax/flink/jobs/73951362
https://travis-ci.org/apache/flink/jobs/73938124
https://travis-ci.org/apache/flink/jobs/73899795
https://travis-ci.org/apache/flink/jobs/73938122
https://travis-ci.org/apache/flink/jobs/73952441

Flink Tachyon:
https://travis-ci.org/apache/flink/jobs/73938123

-Matthias

Thanks for reporting it, Matthias. Will try to run Travis for the latest Flink.

Tachyon test is a bit flaky. Maybe updating to the latest release could help.

- Henry

On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax <[hidden email]> wrote:
> Today, not a single build completed successfully. Please see here:
>
> Flink Streaming Core:
> https://travis-ci.org/mjsax/flink/jobs/73938109
> https://travis-ci.org/mjsax/flink/jobs/73951362
> https://travis-ci.org/apache/flink/jobs/73938124
> https://travis-ci.org/apache/flink/jobs/73899795
> https://travis-ci.org/apache/flink/jobs/73938122
> https://travis-ci.org/apache/flink/jobs/73952441
>
> Flink Tachyon:
> https://travis-ci.org/apache/flink/jobs/73938123
>
> -Matthias

I only report failing tests after a rebase. ;)

-Matthias

On 08/03/2015 11:23 PM, Henry Saputra wrote:
> Thanks for reporting it, Matthias. Will try to run Travis for the latest Flink.
>
> Tachyon test is a bit flaky. Maybe updating to the latest release could help.
>
> - Henry

What are the commits that you rebased on? Could you maybe narrow down what caused the regression?

On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax <[hidden email]> wrote:
> I only report failing tests after a rebase. ;)
>
> -Matthias

Rebased on:

https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

But if the gap between two rebases is large, it's hard to say what the problem might be...

The old parent commit (i.e., the base of the rebase before the last one) was

https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

-Matthias

On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
> What are the commits that you rebased on? Could you maybe narrow down what
> caused the regression?

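When a known-good and a known-bad commit are available, as with the two rebase bases above, the regression can be narrowed down by a binary search over the commits in between, which is what `git bisect` automates. The following is only a minimal sketch of that idea in Java: the commit names and the `isBad` predicate are hypothetical placeholders and do not correspond to anything in the Flink repository.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class BisectSketch {

    /**
     * Returns the index of the first "bad" commit.
     * Assumptions: commits are ordered oldest to newest, the first one is
     * good, the last one is bad, and every commit after the first bad one
     * is bad as well.
     */
    static int firstBadIndex(List<String> commits, Predicate<String> isBad) {
        int good = 0;                   // index of a commit known to be good
        int bad = commits.size() - 1;   // index of a commit known to be bad
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (isBad.test(commits.get(mid))) {
                bad = mid;              // regression is at mid or earlier
            } else {
                good = mid;             // regression is after mid
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Hypothetical history; "c3" is pretended to introduce the failure.
        List<String> commits = Arrays.asList("c0", "c1", "c2", "c3", "c4", "c5");
        Predicate<String> isBad = c -> commits.indexOf(c) >= 3;
        System.out.println(commits.get(firstBadIndex(commits, isBad)));
    }
}
```

With flaky tests, a single run per commit is not a reliable predicate, so `isBad` would probably have to repeat the build several times before calling a commit good.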
I've also seen this fail: https://travis-ci.org/apache/flink/jobs/74025862
in SuccessAfterNetworkBuffersFailureITCase

The build seems quite flaky recently.

On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax <[hidden email]> wrote:
> Rebased on:
>
> https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
>
> But if the gap between two rebases is large, it's hard to say what the
> problem might be...
>
> The old parent commit (i.e., the base of the rebase before the last one) was
>
> https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
>
> -Matthias

Yes, the build instability is really serious right now.

Here are the problems in question, and what we could do about them:

BarrierBuffer:
--------------------
Barrier Buffer tests fail in Java 6 builds.

I have not found a way to diagnose that problem yet, but if we cannot find the issue today, I would be willing to revert my latest commits on the barrier buffer to increase the stability.

StreamCheckpointingITCase
-------------------------------------------
This seems to have started with either the barrier buffer or the updated partitioned state. If fixing/reverting the barrier buffer does not fix it, and no fix has come up until then, let's revert the latest changes to the partitioned state and re-add them when they are stable.

Tachyon:
-------------
The Tachyon mini cluster has a problem: apparently, the programs exit with a sysexit or segfault.

Since we have no Tachyon code ourselves, do we need this test as part of the nightly tests?
Can we make this a "manual" test that we trigger on demand?

Greetings,
Stephan

On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek <[hidden email]> wrote:
> I've also seen this fail: https://travis-ci.org/apache/flink/jobs/74025862
> in SuccessAfterNetworkBuffersFailureITCase
>
> The build seems quite flaky recently.

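A "manual" test that only runs on demand is often implemented by gating the test body on an opt-in switch, so that regular and nightly builds skip it. The sketch below shows one possible shape of such a gate using only the Java standard library. The property name `tests.manual.tachyon` and the class are made up for illustration and are not existing Flink or Maven settings.

```java
final class ManualTestGate {

    // Hypothetical opt-in switch; not an existing Flink or Maven property.
    static final String PROPERTY = "tests.manual.tachyon";

    /** True only if the property value is the string "true" (case-insensitive). */
    static boolean enabled(String propertyValue) {
        // Boolean.parseBoolean returns false for null, so the default is "skip".
        return Boolean.parseBoolean(propertyValue);
    }

    /** Runs the gated test body, or reports that it was skipped. */
    static String runGatedTest(String propertyValue) {
        if (!enabled(propertyValue)) {
            return "SKIPPED";
        }
        // The actual Tachyon mini-cluster test would go here.
        return "RAN";
    }

    public static void main(String[] args) {
        // Opt in with: java -Dtests.manual.tachyon=true ManualTestGate
        System.out.println(runGatedTest(System.getProperty(PROPERTY)));
    }
}
```

In a JUnit 4 test, the same check could be placed in an `Assume.assumeTrue(...)` call at the start of the test, so that the test is reported as skipped rather than failed when the switch is not set.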
Honestly, I don't think the partitioned state changes have anything to do with the stability; only the reworked test case does, which now tests proper exactly-once guarantees, something that was missing before.

Stephan Ewen <[hidden email]> wrote (on Tue, Aug 4, 2015, 12:12):
> StreamCheckpointingITCase
> -------------------------------------------
> This seems to have started with either the barrier buffer or the updated
> partitioned state. If fixing/reverting the barrier buffer does not fix it,
> and no fix has come up until then, let's revert the latest changes to the
> partitioned state and re-add them when they are stable.

The "StateCheckpointedITCase", which also tests these guarantees thoroughly, has not failed so far.

But we need to first rule out the BarrierBuffer. The problem is that the bug occurs only on Java 6 and cannot be reproduced locally...

On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra <[hidden email]> wrote:
> Honestly, I don't think the partitioned state changes have anything to do
> with the stability; only the reworked test case does, which now tests proper
> exactly-once guarantees, something that was missing before.

Aren't we dropping Java 6 support?

On 04.08.2015 12:21, Stephan Ewen wrote:
> The "StateCheckpointedITCase", which also tests these guarantees thoroughly,
> has not failed so far.
>
> But we need to first rule out the BarrierBuffer. The problem is that the
> bug occurs only on Java 6 and cannot be reproduced locally...

Yes.

We should know, though, whether this is a Java 6 bug, or a bug in our system that just happens to occur only with Java 6 (because of different timings in this other engine).

On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler <[hidden email]> wrote:
> Aren't we dropping Java 6 support?

I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
Maybe Tachyon 0.7 will fix the issues.

On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen <[hidden email]> wrote:
> Yes.
>
> We should know, though, whether this is a Java 6 bug, or a bug in our
> system that just happens to occur only with Java 6 (because of different
> timings in this other engine).

I've also seen the BufferSpillerTest fail:
https://travis-ci.org/apache/flink/jobs/74057503

On Tue, 4 Aug 2015 at 14:10 Robert Metzger <[hidden email]> wrote:

> I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
> Maybe Tachyon 0.7 will fix the issues.
>
> On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen <[hidden email]> wrote:
>
>> Yes.
>>
>> We should know, though, whether this is a Java 6 bug, or a bug in our
>> system that just happens to occur only with Java 6 (because of different
>> timings in this other engine)
>>
>> On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler <[hidden email]> wrote:
>>
>>> Aren't we dropping java 6 support?
>>>
>>> On 04.08.2015 12:21, Stephan Ewen wrote:
>>>
>>>> The "StateCheckpointedITCase" has not failed so far, which also test these
>>>> guarantees thoroughly.
>>>>
>>>> But we need to first rule out the BarrierBuffer. The problem is that the
>>>> bug occur only on Java 6 and cannot be reproduced locally...
>>>>
>>>> On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra <[hidden email]> wrote:
>>>>
>>>>> Honestly I don't think the partitioned state changes have anything to do
>>>>> with the stability, only the reworked test case, which now test proper
>>>>> exactly-once which was missing before.
>>>>>
>>>>> Stephan Ewen <[hidden email]> wrote (on Tue, Aug 4, 2015, 12:12):
>>>>>
>>>>>> Yes, the build stability is super serious right now.
>>>>>>
>>>>>> Here are the problems in question, and what we could do about this:
>>>>>>
>>>>>> BarrierBuffer:
>>>>>> --------------------
>>>>>> Barrier Buffer tests fail in Java 6 builds.
>>>>>>
>>>>>> I have not found a way to diagnose that problem, yet, but if we cannot find
>>>>>> the issue today, I would be willing to revert my latest commits on the
>>>>>> barrier buffer to increase the stability.
>>>>>>
>>>>>> StreamCheckpointingITCase
>>>>>> -------------------------------------------
>>>>>> This seems to have started with either the barrier buffer, or the updated
>>>>>> partitioned state. If fixing/reverting the barrier buffer does not fix it,
>>>>>> and no fix has come up
>>>>>>
>>>>>> until then, let's revert the latest changes to the partitioned state and
>>>>>> re-add them when they are stable.
>>>>>>
>>>>>> Tachyon:
>>>>>> -------------
>>>>>> The Tachyon mini cluster has a problem, apparently, the programs exit with
>>>>>> a sysexit or segfault.
>>>>>>
>>>>>> Since we have no Tachyon code ourselves, do we need this test as part of
>>>>>> the nightly tests?
>>>>>> Can we make this a "manual" test that we trigger on demand?
>>>>>>
>>>>>> Greetings,
>>>>>> Stephan
>>>>>>
>>>>>> On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek <[hidden email]> wrote:
>>>>>>
>>>>>>> I've also seen this fail:
>>>>>>> https://travis-ci.org/apache/flink/jobs/74025862
>>>>>>> in SuccessAfterNetworkBuffersFailureITCase
>>>>>>>
>>>>>>> Build seems quite flaky recently.
>>>>>>>
>>>>>>> On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax <[hidden email]> wrote:
>>>>>>>
>>>>>>>> Rebased on:
>>>>>>>> https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
>>>>>>>>
>>>>>>>> But if the gap between two rebases is large, it's hard to say what the
>>>>>>>> problem might be...
>>>>>>>>
>>>>>>>> The old parent commit (ie, rebase before last rebase) was
>>>>>>>> https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>> On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
>>>>>>>>
>>>>>>>>> What are the commits that you rebased on? Could you maybe narrow down what
>>>>>>>>> caused the regression?
>>>>>>>>>
>>>>>>>>> On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax <[hidden email]> wrote:
>>>>>>>>>
>>>>>>>>>> I only report failing tests after a rebase. ;)
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>> On 08/03/2015 11:23 PM, Henry Saputra wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for reporting it , Matthias. Will try to run Travis for latest Flink.
>>>>>>>>>>>
>>>>>>>>>>> Tachyon test is a bit flaky. Maybe updating to latest release could help.
>>>>>>>>>>>
>>>>>>>>>>> - Henry
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax <[hidden email]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Today, not a single built was successful completely. Please see here:
>>>>>>>>>>>>
>>>>>>>>>>>> Flink Streaming Core:
>>>>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73938109
>>>>>>>>>>>> https://travis-ci.org/mjsax/flink/jobs/73951362
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938124
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73899795
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938122
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73952441
>>>>>>>>>>>>
>>>>>>>>>>>> Flink Taychon:
>>>>>>>>>>>> https://travis-ci.org/apache/flink/jobs/73938123
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
|