(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Shang Yuanchun (Jira)

[jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Robert Metzger created FLINK-1170:
-------------------------------------

Summary: Localization of InputSplits is not working properly
Key: FLINK-1170
URL: https://issues.apache.org/jira/browse/FLINK-1170
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Reporter: Robert Metzger
Assignee: Robert Metzger

While running some benchmarks, I found that Flink is not properly assigning the InputSplits.

On my testing cluster, ALL splits were assigned to remote HDFS DataNodes, which causes a lot of network I/O.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Fabian Hueske

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

This is a critical issue and sounds a bit like a release blocker for 0.7 to
me.

Other opinions?

2014-10-17 11:25 GMT+02:00 Robert Metzger (JIRA) <[hidden email]>:

Robert Metzger

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Did you intentionally post to the mailing list?

I'm investigating the issue.
So far, I found that the hostname has never been passed to the input split
assigner. I guess this issue was introduced by the recent jobmanager
changes.
And secondly, Flink is using the fully qualified hostname, whereas HDFS is
using the hostname only. This caused a string mismatch.
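
The mismatch described above can be illustrated with a minimal sketch. It is
not Flink's actual fix: the class and method names (HostnameMatch, shortName,
isLocal) and the example hostnames are hypothetical, and it assumes hosts are
given as DNS names rather than IP literals. It reduces both sides to the
unqualified, lowercased hostname before comparing, and treats a missing
requesting hostname as "not local", which is what happens when the hostname
never reaches the assigner.

```java
import java.util.Locale;

// Hypothetical helper, not part of Flink or HDFS.
class HostnameMatch {

    // Reduce a hostname to its first label, lowercased, so that
    // "worker1.example.com" and "worker1" compare as equal.
    // Assumption: the input is a DNS name, not an IP literal.
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    // True if any host that stores the split matches the requesting host.
    // A null requesting host can never match, so every split looks remote.
    static boolean isLocal(String requestingHost, String[] splitHosts) {
        if (requestingHost == null || splitHosts == null) {
            return false;
        }
        String wanted = shortName(requestingHost);
        for (String candidate : splitHosts) {
            if (candidate != null && shortName(candidate).equals(wanted)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] splitHosts = {"worker1", "worker2"};
        // Naive comparison of a fully qualified name with a short name.
        System.out.println("worker1.example.com".equals(splitHosts[0])); // false
        // Comparison after normalizing both sides.
        System.out.println(isLocal("worker1.example.com", splitHosts)); // true
        System.out.println(isLocal("worker3.example.com", splitHosts)); // false
    }
}
```

Comparing short names can produce false matches when two domains reuse the
same host label, so a real implementation may prefer to resolve both names
to a canonical form instead.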

I wouldn't cancel the release because we are at a point where it is faster
to vote on a bugfix release.
The issue is not a showstopper for using Flink. It's just slow on large
datasets.

On Fri, Oct 17, 2014 at 11:58 AM, Fabian Hueske <[hidden email]> wrote:

Fabian Hueske

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Yes, that was intentional.

The whole point of using a parallel engine is to process large datasets.
Otherwise you could do it in Python on a single box...
Remote reads will severely impact performance and might cause a
significant performance regression.

2014-10-17 12:04 GMT+02:00 Robert Metzger <[hidden email]>:

Stephan Ewen

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

I agree, we should cancel the release, fix this, and make a new release
candidate.

Stephan

On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <[hidden email]> wrote:

Kostas Tzoumas-2

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

In reply to this post by Fabian Hueske

I agree with Fabian. We need to fix this issue, and that would mean the extra
overhead of releasing 0.7.1 ASAP, perhaps just for this bug. I vote to
cancel the incubator release thread and vote again here.

On Fri, Oct 17, 2014 at 12:11 PM, Fabian Hueske <[hidden email]> wrote:

Ufuk Celebi-2

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

In reply to this post by Fabian Hueske

I agree with Fabian on this. Let's cancel the release and create a new RC.

On 17 Oct 2014, at 12:11, Fabian Hueske <[hidden email]> wrote:

Robert Metzger

Re: [jira] [Created] (FLINK-1170) Localization of InputSplits is not working properly

Okay. I see the point.

I'll write to general@incubator to cancel the vote.

On Fri, Oct 17, 2014 at 1:03 PM, Ufuk Celebi <[hidden email]> wrote:
