Connection reset by peer

Gyula Fóra-2
Hi guys,

I have a Flink Streaming job that ran for about a day without any errors,
and then I got this in the job manager log:

15:37:49,905 WARN  io.netty.channel.DefaultChannelPipeline - An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
        at java.lang.Thread.run(Thread.java:745)


After this the job did not fail; it kept running. This has happened twice.
Can anyone tell me what might cause this exception?


Cheers,

Gyula

Re: Connection reset by peer

Ufuk Celebi-2
Can you please share the complete logs with me? Uce at apache org ;)

On Saturday, 14 November 2015, Gyula Fóra <[hidden email]> wrote:

> Hi guys,
>
> I have a Flink Streaming job running for about a day now without any errors
> and then I got this in the job manager log:
>
> [...]

Re: Connection reset by peer

Ufuk Celebi-2
In reply to this post by Gyula Fóra-2
This exception was not thrown by the data exchange component. The stack trace you shared confirms this: it shows the DefaultThreadFactory, which we don’t use for the data exchange. Any exception thrown in the data exchange would actually fail the program.

My best guess is that this was thrown by the new web interface. Was it running with your job?

My second best guess is that it was thrown by another component running Netty (maybe a Hadoop client?).
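
For context on the message itself: a "connection reset" is what the reading side sees when the remote end aborts a TCP connection instead of closing it in an orderly way. The sketch below is plain JDK code, not Flink or Netty code, and the class and method names are made up for illustration. It assumes a loopback interface is available, and the exact exception text can differ between JDKs and platforms.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class ConnectionResetDemo {

    /**
     * Opens a loopback connection, aborts it from the client side, and returns
     * the IOException the accepting side gets on its next read (or null if the
     * read did not fail).
     */
    static IOException provokeReset() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // SO_LINGER with a zero timeout makes close() abort the connection
            // with a TCP RST instead of the usual orderly FIN handshake.
            client.setSoLinger(true, 0);
            client.close();
            try {
                accepted.getInputStream().read();
                return null; // the read did not fail
            } catch (IOException e) {
                return e; // the reset as seen by the reading side
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(provokeReset());
    }
}
```

The point of the sketch is that the reset is triggered entirely by the peer; the reading side did nothing wrong, which fits a job that keeps running after the warning.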

– Ufuk

PS Thanks for sharing the logs with me. :)

> On 14 Nov 2015, at 18:14, Gyula Fóra <[hidden email]> wrote:
>
> Hi guys,
>
> I have a Flink Streaming job running for about a day now without any errors
> and then I got this in the job manager log:
>
> [...]


Re: Connection reset by peer

Sachin Goel
I used to get a similar exception [I do not remember if the stack trace was
*exactly* the same, but it was from the web interface, and was due to the
*connection reset by peer*]. Currently, the web interface does not handle
exceptionCaught events cleanly.
One of my PRs has addressed this by adding an exception handler at the end
of the pipeline. https://github.com/apache/flink/pull/1338
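
To illustrate the idea of a catch-all handler in last position, here is a toy model in plain Java. It is not Netty's API and not the code from the PR; every name in it is hypothetical. In Netty 4 itself the corresponding hook is `exceptionCaught(ChannelHandlerContext, Throwable)` on an inbound handler that is added last to the pipeline.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TailHandlerSketch {

    /** Toy handler: returns true if it consumed the exception, false to pass it on. */
    interface Handler {
        boolean exceptionCaught(Throwable cause, List<String> log);
    }

    /** Passes the exception along the handlers; warns if it falls off the end. */
    static List<String> fireExceptionCaught(List<Handler> pipeline, Throwable cause) {
        List<String> log = new ArrayList<>();
        for (Handler handler : pipeline) {
            if (handler.exceptionCaught(cause, log)) {
                return log; // consumed, nothing reaches the tail
            }
        }
        log.add("WARN unhandled exception reached the tail: " + cause.getMessage());
        return log;
    }

    /** Builds a pipeline with or without a catch-all handler in last position. */
    static List<Handler> pipeline(boolean withTailHandler) {
        // Stands in for a request handler that only cares about messages.
        Handler requestHandler = (cause, log) -> false;
        // Catch-all: record the failure quietly and stop propagation.
        Handler tailHandler = (cause, log) -> {
            log.add("DEBUG closing connection after: " + cause.getMessage());
            return true;
        };
        return withTailHandler
                ? Arrays.asList(requestHandler, tailHandler)
                : Arrays.asList(requestHandler);
    }

    public static void main(String[] args) {
        Throwable reset = new IOException("Connection reset by peer");
        System.out.println(fireExceptionCaught(pipeline(false), reset));
        System.out.println(fireExceptionCaught(pipeline(true), reset));
    }
}
```

Without the last handler the exception falls through and produces the warning; with it, the reset is recorded at a lower level and stops there.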



-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685

On Sun, Nov 15, 2015 at 5:48 PM, Ufuk Celebi <[hidden email]> wrote:

> This Exception was not thrown by the data exchange component. This is
> confirmed by the stack trace you have shared. It shows the
> DefaultThreadFactory, which we don’t use for the data exchange. Any
> Exception thrown there will actually fail the program.
>
> My best guess is that this was thrown by the new web interface. Was it
> running with your job?
>
> My second best guess is that it was thrown by another component running
> Netty (maybe a Hadoop client?).
>
> – Ufuk
>
> [...]

Re: Connection reset by peer

Gyula Fóra
Thanks guys,
Yes, I am running with the new web interface (no Hadoop).

I will deploy the new jars once your PR is merged, Sachin, and we'll see :)

Cheers,
Gyula

Sachin Goel <[hidden email]> wrote (on Sun, 15 Nov 2015, 13:43):

> I used to get a similar exception [I do not remember if the stack trace was
> *exactly* the same, but it was from the web interface, and was due to the
> *connection reset by peer*]. Currently, the web interface does not handle
> exceptionCaught events cleanly.
> One of my PRs has addressed this by adding an exception handler at the end
> of the pipeline. https://github.com/apache/flink/pull/1338
>
> [...]