Robert Metzger created FLINK-1170:
-------------------------------------

Summary: Localization of InputSplits is not working properly
Key: FLINK-1170
URL: https://issues.apache.org/jira/browse/FLINK-1170
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Reporter: Robert Metzger
Assignee: Robert Metzger

While running some benchmarks, I found that Flink is not properly assigning the InputSplits.

On my testing cluster, ALL splits were assigned to remote HDFS DataNodes, which causes a lot of network I/O.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
This is a critical issue and sounds a bit like a release blocker for 0.7 to me.

Other opinions?

2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[hidden email]>:

> Robert Metzger created FLINK-1170:
> Summary: Localization of InputSplits is not working properly
Did you intentionally post to the mailing list?
I'm investigating the issue.
So far, I have found that the hostname was never passed to the input split assigner. I guess this issue was introduced by the recent JobManager changes.
Secondly, Flink uses the fully qualified hostname, whereas HDFS uses only the short hostname. This causes a string mismatch.

I wouldn't cancel the release, because we are at a point where it is faster to vote on a bugfix release.
The issue is not a show stopper for using Flink. It's just slow on large datasets.

On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[hidden email]> wrote:

> This is a critical issue and sounds a bit like a release blocker for 0.7 to me.
>
> Other opinions?
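The string mismatch described in the message above (a fully qualified hostname on the Flink side, a short hostname on the HDFS side) can be illustrated with a minimal Java sketch. This is not the actual Flink fix and does not use any Flink or HDFS API. The class `HostnameMatch`, the methods `normalize` and `isLocal`, and the example hostnames are all hypothetical, and the sketch assumes that both sides report DNS names rather than IP address literals.

```java
import java.util.Locale;

public class HostnameMatch {

    // Hypothetical helper (not Flink code): lower-case the name and keep only
    // the first DNS label, so "worker1.example.com" becomes "worker1".
    // Assumes DNS names; an IP literal such as "10.0.0.1" would be mangled.
    public static String normalize(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot > 0 ? h.substring(0, dot) : h;
    }

    // Returns true if any host storing the split matches the worker's host
    // once both names have been normalized.
    public static boolean isLocal(String workerHost, String[] splitHosts) {
        String local = normalize(workerHost);
        for (String splitHost : splitHosts) {
            if (normalize(splitHost).equals(local)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String workerHost = "worker1.example.com";    // fully qualified (assumed)
        String[] splitHosts = {"worker1", "worker2"}; // short names (assumed)

        // Naive comparison: the raw strings never match, so every split
        // looks remote to the worker.
        boolean naive = false;
        for (String s : splitHosts) {
            naive |= s.equals(workerHost);
        }
        System.out.println("naive match:      " + naive);
        System.out.println("normalized match: " + isLocal(workerHost, splitHosts));
    }
}
```

With the raw strings compared directly, no split is ever considered local. Comparing normalized names is one possible way to make the two sides agree; a real fix would also need the hostname to reach the input split assigner in the first place, which is the other problem named in the message.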
Yes, that was intentional.
The whole point of using a parallel engine is to process large datasets. Otherwise you could do it in Python on a single box...
Remote reads will severely impact performance and might cause a significant performance regression.

2014-10-17 12:04 GMT+02:00 Robert Metzger <[hidden email]>:

> Did you intentionally post to the mailing list?
> [...]
> The issue is not a show stopper for using Flink. It's just slow on large datasets.
I agree, we should cancel the release, fix this, and make a new release candidate.

Stephan

On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <[hidden email]> wrote:

> Yes, that was intentional.
>
> The whole point of using a parallel engine is to process large datasets.
> Otherwise you could do it in Python on a single box...
> Remote reads will severely impact the performance and might cause
> significant performance regression.
In reply to this post by Fabian Hueske
I agree with Fabian. We need to fix this issue, and releasing now would mean the extra overhead of a 0.7.1 release ASAP, perhaps just for this bug.

I vote to cancel the incubator release thread and vote again here.
In reply to this post by Fabian Hueske
I agree with Fabian on this. Let's cancel the release and create a new RC.
Okay. I see the point.
I'll write to general@incubator to cancel the vote.

On Fri, Oct 17, 2014 at 1:03 PM, Ufuk Celebi <[hidden email]> wrote:

> I agree with Fabian on this. Let's cancel the release and create a new RC.