Exception when running WC

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Exception when running WC

Chesnay Schepler
Hello,

tonight i was running a WordCount job with the Python API, and halfway
through i got the exception below.
the issue did not occur again after ressubmitting the job.
DOP=160
taskslots=8
filesize=100GB

    org.apache.flink.client.program.ProgramInvocationException: The
    program execution failed: java.lang.RuntimeException: An error
    occurred while reading the next record: The channel is erroneous.
         at
    org.apache.flink.runtime.util.KeyGroupedIterator$ValuesIterator.hasNext(KeyGroupedIterator.java:202)
         at
    org.apache.flink.languagebinding.api.java.streaming.Sender.sendRecords(Sender.java:57)
         at
    org.apache.flink.languagebinding.api.java.streaming.Streamer.stream(Streamer.java:106)
         at
    org.apache.flink.languagebinding.api.java.python.functions.PythonGroupReduce.reduce(PythonGroupReduce.java:77)
         at
    org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:108)
         at
    org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:509)
         at
    org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:374)
         at
    org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
         at java.lang.Thread.run(Thread.java:745)
    Caused by: java.io.IOException: The channel is erroneous.
         at
    org.apache.flink.runtime.io.disk.iomanager.ChannelAccess.checkErroneous(ChannelAccess.java:132)
         at
    org.apache.flink.runtime.io.disk.iomanager.BlockChannelReader.readBlock(BlockChannelReader.java:75)
         at
    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.sendReadRequest(ChannelReaderInputView.java:263)
         at
    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.nextSegment(ChannelReaderInputView.java:226)
         at
    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(AbstractPagedInputView.java:159)
         at
    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(AbstractPagedInputView.java:270)
         at
    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readUnsignedByte(AbstractPagedInputView.java:277)
         at
    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readInt(AbstractPagedInputView.java:340)
         at
    org.apache.flink.api.common.typeutils.base.IntSerializer.deserialize(IntSerializer.java:69)
         at
    org.apache.flink.api.common.typeutils.base.IntSerializer.deserialize(IntSerializer.java:28)
         at
    org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:115)
         at
    org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
         at
    org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:86)
         at
    org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:131)
         at
    org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:89)
         at
    org.apache.flink.runtime.util.KeyGroupedIterator$ValuesIterator.hasNext(KeyGroupedIterator.java:177)
         ... 8 more
    Caused by: java.io.IOException: Input/output error
         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
         at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46)
         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149)
         at
    org.apache.flink.runtime.io.disk.iomanager.SegmentReadRequest.read(BlockChannelAccess.java:221)
         at
    org.apache.flink.runtime.io.disk.iomanager.IOManager$ReaderThread.run(IOManager.java:551)
         at org.apache.flink.client.program.Client.run(Client.java:321)
         at org.apache.flink.client.program.Client.run(Client.java:287)
         at org.apache.flink.client.program.Client.run(Client.java:281)
         at
    org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:54)
         at
    org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:519)
         at
    org.apache.flink.languagebinding.api.java.python.PythonExecutor.main(PythonExecutor.java:119)
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
         at
    sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
         at
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
         at java.lang.reflect.Method.invoke(Method.java:606)
         at
    org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:389)
         at
    org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:307)
         at org.apache.flink.client.program.Client.run(Client.java:240)
         at
    org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:332)
         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:319)
         at
    org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:930)
         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:954)

Reply | Threaded
Open this post in threaded view
|

Re: Exception when running WC

Robert Metzger
The error message is not very specific. I was unable to find any
explanation on this in the web.

I would ignore the message for now. I think its not related to the python
interface. I'm running a lot of experiments on the cluster and I have not
seen this error there before.
If we see the issue appearing more frequently, we should start looking into
patterns (which machine?, harddisk?, only a specific job, etc.).


On Wed, Sep 10, 2014 at 2:57 AM, Chesnay Schepler <
[hidden email]> wrote:

> Hello,
>
> tonight i was running a WordCount job with the Python API, and halfway
> through i got the exception below.
> the issue did not occur again after ressubmitting the job.
> DOP=160
> taskslots=8
> filesize=100GB
>
>    org.apache.flink.client.program.ProgramInvocationException: The
>    program execution failed: java.lang.RuntimeException: An error
>    occurred while reading the next record: The channel is erroneous.
>         at
>    org.apache.flink.runtime.util.KeyGroupedIterator$
> ValuesIterator.hasNext(KeyGroupedIterator.java:202)
>         at
>    org.apache.flink.languagebinding.api.java.streaming.Sender.sendRecords(
> Sender.java:57)
>         at
>    org.apache.flink.languagebinding.api.java.streaming.Streamer.stream(
> Streamer.java:106)
>         at
>    org.apache.flink.languagebinding.api.java.python.functions.
> PythonGroupReduce.reduce(PythonGroupReduce.java:77)
>         at
>    org.apache.flink.runtime.operators.GroupReduceDriver.
> run(GroupReduceDriver.java:108)
>         at
>    org.apache.flink.runtime.operators.RegularPactTask.run(
> RegularPactTask.java:509)
>         at
>    org.apache.flink.runtime.operators.RegularPactTask.
> invoke(RegularPactTask.java:374)
>         at
>    org.apache.flink.runtime.execution.RuntimeEnvironment.
> run(RuntimeEnvironment.java:265)
>         at java.lang.Thread.run(Thread.java:745)
>    Caused by: java.io.IOException: The channel is erroneous.
>         at
>    org.apache.flink.runtime.io.disk.iomanager.ChannelAccess.
> checkErroneous(ChannelAccess.java:132)
>         at
>    org.apache.flink.runtime.io.disk.iomanager.
> BlockChannelReader.readBlock(BlockChannelReader.java:75)
>         at
>    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.
> sendReadRequest(ChannelReaderInputView.java:263)
>         at
>    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.
> nextSegment(ChannelReaderInputView.java:226)
>         at
>    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(
> AbstractPagedInputView.java:159)
>         at
>    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(
> AbstractPagedInputView.java:270)
>         at
>    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.
> readUnsignedByte(AbstractPagedInputView.java:277)
>         at
>    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readInt(
> AbstractPagedInputView.java:340)
>         at
>    org.apache.flink.api.common.typeutils.base.IntSerializer.
> deserialize(IntSerializer.java:69)
>         at
>    org.apache.flink.api.common.typeutils.base.IntSerializer.
> deserialize(IntSerializer.java:28)
>         at
>    org.apache.flink.api.java.typeutils.runtime.
> TupleSerializer.deserialize(TupleSerializer.java:115)
>         at
>    org.apache.flink.api.java.typeutils.runtime.
> TupleSerializer.deserialize(TupleSerializer.java:30)
>         at
>    org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(
> ChannelReaderInputViewIterator.java:86)
>         at
>    org.apache.flink.runtime.operators.sort.MergeIterator$
> HeadStream.nextHead(MergeIterator.java:131)
>         at
>    org.apache.flink.runtime.operators.sort.MergeIterator.
> next(MergeIterator.java:89)
>         at
>    org.apache.flink.runtime.util.KeyGroupedIterator$
> ValuesIterator.hasNext(KeyGroupedIterator.java:177)
>         ... 8 more
>    Caused by: java.io.IOException: Input/output error
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46)
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149)
>         at
>    org.apache.flink.runtime.io.disk.iomanager.SegmentReadRequest.read(
> BlockChannelAccess.java:221)
>         at
>    org.apache.flink.runtime.io.disk.iomanager.IOManager$
> ReaderThread.run(IOManager.java:551)
>         at org.apache.flink.client.program.Client.run(Client.java:321)
>         at org.apache.flink.client.program.Client.run(Client.java:287)
>         at org.apache.flink.client.program.Client.run(Client.java:281)
>         at
>    org.apache.flink.client.program.ContextEnvironment.
> execute(ContextEnvironment.java:54)
>         at
>    org.apache.flink.api.java.ExecutionEnvironment.execute(
> ExecutionEnvironment.java:519)
>         at
>    org.apache.flink.languagebinding.api.java.python.PythonExecutor.main(
> PythonExecutor.java:119)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
>    sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:57)
>         at
>    sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at
>    org.apache.flink.client.program.PackagedProgram.callMainMethod(
> PackagedProgram.java:389)
>         at
>    org.apache.flink.client.program.PackagedProgram.
> invokeInteractiveModeForExecution(PackagedProgram.java:307)
>         at org.apache.flink.client.program.Client.run(Client.java:240)
>         at
>    org.apache.flink.client.CliFrontend.executeProgram(
> CliFrontend.java:332)
>         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:319)
>         at
>    org.apache.flink.client.CliFrontend.parseParameters(
> CliFrontend.java:930)
>         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:954)
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Exception when running WC

Stephan Ewen
Looks like a spurious I/O failure during a file read, outside of our code.

I agree with Robert, this looks like a rare OS/disk/java error, and is not
for us to worry about.

On Wed, Sep 10, 2014 at 10:57 AM, Robert Metzger <[hidden email]>
wrote:

> The error message is not very specific. I was unable to find any
> explanation on this in the web.
>
> I would ignore the message for now. I think its not related to the python
> interface. I'm running a lot of experiments on the cluster and I have not
> seen this error there before.
> If we see the issue appearing more frequently, we should start looking into
> patterns (which machine?, harddisk?, only a specific job, etc.).
>
>
> On Wed, Sep 10, 2014 at 2:57 AM, Chesnay Schepler <
> [hidden email]> wrote:
>
> > Hello,
> >
> > tonight i was running a WordCount job with the Python API, and halfway
> > through i got the exception below.
> > the issue did not occur again after ressubmitting the job.
> > DOP=160
> > taskslots=8
> > filesize=100GB
> >
> >    org.apache.flink.client.program.ProgramInvocationException: The
> >    program execution failed: java.lang.RuntimeException: An error
> >    occurred while reading the next record: The channel is erroneous.
> >         at
> >    org.apache.flink.runtime.util.KeyGroupedIterator$
> > ValuesIterator.hasNext(KeyGroupedIterator.java:202)
> >         at
> >
> org.apache.flink.languagebinding.api.java.streaming.Sender.sendRecords(
> > Sender.java:57)
> >         at
> >    org.apache.flink.languagebinding.api.java.streaming.Streamer.stream(
> > Streamer.java:106)
> >         at
> >    org.apache.flink.languagebinding.api.java.python.functions.
> > PythonGroupReduce.reduce(PythonGroupReduce.java:77)
> >         at
> >    org.apache.flink.runtime.operators.GroupReduceDriver.
> > run(GroupReduceDriver.java:108)
> >         at
> >    org.apache.flink.runtime.operators.RegularPactTask.run(
> > RegularPactTask.java:509)
> >         at
> >    org.apache.flink.runtime.operators.RegularPactTask.
> > invoke(RegularPactTask.java:374)
> >         at
> >    org.apache.flink.runtime.execution.RuntimeEnvironment.
> > run(RuntimeEnvironment.java:265)
> >         at java.lang.Thread.run(Thread.java:745)
> >    Caused by: java.io.IOException: The channel is erroneous.
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.ChannelAccess.
> > checkErroneous(ChannelAccess.java:132)
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.
> > BlockChannelReader.readBlock(BlockChannelReader.java:75)
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.
> > sendReadRequest(ChannelReaderInputView.java:263)
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.ChannelReaderInputView.
> > nextSegment(ChannelReaderInputView.java:226)
> >         at
> >    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.advance(
> > AbstractPagedInputView.java:159)
> >         at
> >
> org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readByte(
> > AbstractPagedInputView.java:270)
> >         at
> >    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.
> > readUnsignedByte(AbstractPagedInputView.java:277)
> >         at
> >    org.apache.flink.runtime.memorymanager.AbstractPagedInputView.readInt(
> > AbstractPagedInputView.java:340)
> >         at
> >    org.apache.flink.api.common.typeutils.base.IntSerializer.
> > deserialize(IntSerializer.java:69)
> >         at
> >    org.apache.flink.api.common.typeutils.base.IntSerializer.
> > deserialize(IntSerializer.java:28)
> >         at
> >    org.apache.flink.api.java.typeutils.runtime.
> > TupleSerializer.deserialize(TupleSerializer.java:115)
> >         at
> >    org.apache.flink.api.java.typeutils.runtime.
> > TupleSerializer.deserialize(TupleSerializer.java:30)
> >         at
> >    org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(
> > ChannelReaderInputViewIterator.java:86)
> >         at
> >    org.apache.flink.runtime.operators.sort.MergeIterator$
> > HeadStream.nextHead(MergeIterator.java:131)
> >         at
> >    org.apache.flink.runtime.operators.sort.MergeIterator.
> > next(MergeIterator.java:89)
> >         at
> >    org.apache.flink.runtime.util.KeyGroupedIterator$
> > ValuesIterator.hasNext(KeyGroupedIterator.java:177)
> >         ... 8 more
> >    Caused by: java.io.IOException: Input/output error
> >         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >         at sun.nio.ch.FileDispatcherImpl.read(FileDispatcherImpl.java:46)
> >         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:149)
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.SegmentReadRequest.read(
> > BlockChannelAccess.java:221)
> >         at
> >    org.apache.flink.runtime.io.disk.iomanager.IOManager$
> > ReaderThread.run(IOManager.java:551)
> >         at org.apache.flink.client.program.Client.run(Client.java:321)
> >         at org.apache.flink.client.program.Client.run(Client.java:287)
> >         at org.apache.flink.client.program.Client.run(Client.java:281)
> >         at
> >    org.apache.flink.client.program.ContextEnvironment.
> > execute(ContextEnvironment.java:54)
> >         at
> >    org.apache.flink.api.java.ExecutionEnvironment.execute(
> > ExecutionEnvironment.java:519)
> >         at
> >    org.apache.flink.languagebinding.api.java.python.PythonExecutor.main(
> > PythonExecutor.java:119)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at
> >    sun.reflect.NativeMethodAccessorImpl.invoke(
> > NativeMethodAccessorImpl.java:57)
> >         at
> >    sun.reflect.DelegatingMethodAccessorImpl.invoke(
> > DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:606)
> >         at
> >    org.apache.flink.client.program.PackagedProgram.callMainMethod(
> > PackagedProgram.java:389)
> >         at
> >    org.apache.flink.client.program.PackagedProgram.
> > invokeInteractiveModeForExecution(PackagedProgram.java:307)
> >         at org.apache.flink.client.program.Client.run(Client.java:240)
> >         at
> >    org.apache.flink.client.CliFrontend.executeProgram(
> > CliFrontend.java:332)
> >         at org.apache.flink.client.CliFrontend.run(CliFrontend.java:319)
> >         at
> >    org.apache.flink.client.CliFrontend.parseParameters(
> > CliFrontend.java:930)
> >         at org.apache.flink.client.CliFrontend.main(CliFrontend.java:954)
> >
> >
>