I have observed that a Flink-on-Tez test job stalls in two cases on the
Travis CI server:
https://travis-ci.org/StephanEwen/incubator-flink/jobs/62302207

It looks like a shuffle fetch simply stops making progress and freezes. At first glance, the stack traces suggest that this is a Tez issue rather than a Flink issue (all threads are stuck in Tez methods), but one cannot be sure.

Has anyone observed something similar before?
I think I saw it once, yes. But dismissed it as a fluke.
On Wed, May 13, 2015 at 1:13 AM, Stephan Ewen wrote:
> I have observed that a Flink-on-Tez test job stalls in two cases on the
> Travis CI server.
I have also seen this failure multiple times now. This is another case of it:
https://travis-ci.org/apache/flink/jobs/62767646

I think the Tez community is currently voting on a new release. Maybe we should check whether it fixes the issue; otherwise, we should ask on their list.

On Wed, May 13, 2015 at 9:35 AM, Aljoscha Krettek wrote:
> I think I saw it once, yes. But dismissed it as a fluke.
Tez has just announced the availability of version 0.6.1. Maybe that version is more stable. I've filed a JIRA issue for upgrading the version:
https://issues.apache.org/jira/browse/FLINK-2064

On Sun, May 17, 2015 at 12:04 PM, Robert Metzger wrote:
> This is another case of it:
> https://travis-ci.org/apache/flink/jobs/62767646