(DEPRECATED) Apache Flink Mailing List archive.

Connection reset by peer


Gyula Fóra-2

Connection reset by peer

Hi guys,

I have a Flink Streaming job running for about a day now without any errors
and then I got this in the job manager log:

15:37:49,905 WARN io.netty.channel.DefaultChannelPipeline
- An exceptionCaught() event was fired, and it reached at
the tail of the pipeline. It usually means the last handler in the
pipeline did not handle the exception.
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)

After this the job did not fail; it kept running. This happened twice.
Can anyone tell me what might cause this exception?

Cheers,

Gyula

Ufuk Celebi-2

Re: Connection reset by peer

Can you please share the complete logs with me? Uce at apache org ;)

In reply to Gyula Fóra's post of Saturday, 14 November 2015.

Ufuk Celebi-2

Re: Connection reset by peer

In reply to this post by Gyula Fóra-2

This exception was not thrown by the data exchange component; the stack trace you shared confirms it. The trace shows the DefaultThreadFactory, which we don't use for data exchange. Any exception thrown in the data exchange component would actually fail the program.

My best guess is that this was thrown by the new web interface. Was it running with your job?

My second best guess is that it was thrown by another component running Netty (maybe a Hadoop client?).

– Ufuk

PS Thanks for sharing the logs with me. :)
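Ufuk's argument rests on reading the frames of the trace: the thread's runnable was wrapped by Netty's `DefaultThreadFactory$DefaultRunnableDecorator`, and no `org.apache.flink` frame appears anywhere. A minimal sketch of that check in plain Java, assuming the frames are rebuilt by hand from the log above rather than captured from a live job; `StackTraceOrigin` and `passesThrough` are hypothetical helpers, not part of Flink or Netty.

```java
import java.io.IOException;

/**
 * Hypothetical helper (not part of Flink or Netty): reports whether any frame
 * of an exception's stack trace belongs to a class whose name starts with the
 * given prefix.
 */
public class StackTraceOrigin {

    public static boolean passesThrough(Throwable t, String classNamePrefix) {
        for (StackTraceElement frame : t.getStackTrace()) {
            if (frame.getClassName().startsWith(classNamePrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Frames copied by hand from the job manager log above (abridged).
        IOException e = new IOException("Connection reset by peer");
        e.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("sun.nio.ch.SocketChannelImpl", "read", "SocketChannelImpl.java", 380),
            new StackTraceElement("io.netty.channel.nio.NioEventLoop", "run", "NioEventLoop.java", 354),
            new StackTraceElement("io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator",
                    "run", "DefaultThreadFactory.java", 137),
            new StackTraceElement("java.lang.Thread", "run", "Thread.java", 745)
        });

        // The runnable was wrapped by Netty's DefaultThreadFactory ...
        System.out.println(passesThrough(e, "io.netty.util.concurrent.DefaultThreadFactory")); // prints true
        // ... and no Flink class appears anywhere in the trace.
        System.out.println(passesThrough(e, "org.apache.flink")); // prints false
    }
}
```

The prefix match is deliberately loose so that nested classes such as `DefaultThreadFactory$DefaultRunnableDecorator` count as the same component.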


Sachin Goel

Re: Connection reset by peer

I used to get a similar exception (I don't remember whether the stack
trace was *exactly* the same, but it came from the web interface and was
caused by the *connection reset by peer*). Currently, the web interface
does not handle exceptionCaught events cleanly.
One of my PRs addresses this by adding an exception handler at the end
of the pipeline: https://github.com/apache/flink/pull/1338

-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
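In Netty, a handler can override `exceptionCaught(ChannelHandlerContext, Throwable)`; if no handler in the pipeline consumes the exception, it reaches the tail and Netty logs the WARN seen in Gyula's log. The sketch below leaves Netty out so it runs on the JDK alone, and shows only the decision such a catch-all handler might make. The class `TailExceptionPolicy`, its two actions, and the message-text match are assumptions for illustration, not the code in PR 1338.

```java
import java.io.IOException;
import java.nio.channels.ClosedChannelException;

/**
 * Hypothetical sketch (not the code from Flink PR 1338): the decision a
 * catch-all exceptionCaught() handler at the tail of a Netty pipeline might
 * make. Netty is left out so the logic runs on the JDK alone.
 */
public class TailExceptionPolicy {

    public enum Action { CLOSE_QUIETLY, LOG_AND_CLOSE }

    public static Action decide(Throwable cause) {
        // Assumption: a peer that drops the connection (e.g. a browser closing
        // the web interface) is routine and not worth a WARN.
        if (cause instanceof ClosedChannelException) {
            return Action.CLOSE_QUIETLY;
        }
        if (cause instanceof IOException) {
            // Matching on message text is fragile: "Connection reset by peer"
            // is the Linux wording; other platforms word it differently.
            String msg = cause.getMessage();
            if (msg != null && (msg.contains("Connection reset by peer") || msg.contains("Broken pipe"))) {
                return Action.CLOSE_QUIETLY;
            }
        }
        // Anything else is unexpected and should be logged before closing.
        return Action.LOG_AND_CLOSE;
    }

    public static void main(String[] args) {
        System.out.println(decide(new IOException("Connection reset by peer"))); // prints CLOSE_QUIETLY
        System.out.println(decide(new IllegalStateException("unexpected request"))); // prints LOG_AND_CLOSE
    }
}
```

In a real handler the quiet branch would typically just call `ctx.close()`, while the other branch would log the cause first; whether a reset should be silenced at all is a judgment call.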

In reply to Ufuk Celebi's message of Sun, Nov 15, 2015 at 5:48 PM.

Gyula Fóra

Re: Connection reset by peer

Thanks guys,
yes, I am running with the new web interface (no Hadoop).

I will deploy the new jars once your PR is merged, Sachin, and we'll see :)

Cheers,
Gyula

In reply to Sachin Goel's message of Sun, 15 Nov 2015, 13:43.