[jira] [Created] (FLINK-17645) REAPER_THREAD in SafetyNetCloseableRegistry start() failed, causing the repeated failover.

Shang Yuanchun (Jira)
Zakelly Lan created FLINK-17645:
-----------------------------------

             Summary: REAPER_THREAD in SafetyNetCloseableRegistry start() failed, causing the repeated failover.
                 Key: FLINK-17645
                 URL: https://issues.apache.org/jira/browse/FLINK-17645
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Task
    Affects Versions: 1.6.3
            Reporter: Zakelly Lan


I'm running a modified version of Flink and encountered the following exception when a task started:

{code:java}
2020-05-12 00:46:19,037 ERROR [***] org.apache.flink.runtime.taskmanager.Task   - Encountered an unexpected exception
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:802)
        at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
        at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
        at java.lang.Thread.run(Thread.java:834)
2020-05-12 00:46:19,038 INFO  [***] org.apache.flink.runtime.taskmanager.Task
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:802)
        at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
        at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
        at java.lang.Thread.run(Thread.java:834)
{code}

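REAPER_THREAD.start() fails with an OutOfMemoryError, but REAPER_THREAD has already been assigned at that point and is never reset to null. From then on, every later SafetyNetCloseableRegistry construction in this JVM takes the same path, fails the checkState precondition (line 71, just before the start() call at line 73), and throws an IllegalStateException: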

{code:java}
java.lang.IllegalStateException
        at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
        at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:71)
        at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
        at java.lang.Thread.run(Thread.java:834){code}
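The failure sequence can be modeled in a few lines. What follows is a hypothetical, simplified sketch, not Flink's source. The class ReaperThreadSketch, its methods, and the simulateStartFailure hook are invented for illustration. The sketch assumes, as the two stack traces suggest (checkState at line 71, start() at line 73), that the constructor checks and assigns a static reaper-thread field before calling start(), inside a branch that only the first live registry takes. It contrasts that pattern with one possible fix.

{code:java}
/**
 * Hypothetical, simplified model of the static reaper-thread bookkeeping
 * described in this issue. Not Flink source code.
 */
class ReaperThreadSketch {

    private static final Object LOCK = new Object();
    private static Thread reaperThread;   // one reaper shared by every registry in the JVM
    private static int registryCount;     // number of live registries

    // Hypothetical test hook: makes start() throw, as Thread.start0() does
    // when the OS refuses to create another native thread.
    static boolean simulateStartFailure = false;

    private static void start(Thread t) {
        if (simulateStartFailure) {
            throw new OutOfMemoryError("unable to create new native thread");
        }
        t.start();
    }

    private static Thread newReaper() {
        Thread t = new Thread(() -> { }); // placeholder body for the sketch
        t.setDaemon(true);
        return t;
    }

    /** Pattern from the report: the static field is assigned before start() can fail. */
    static void registerBuggy() {
        synchronized (LOCK) {
            if (registryCount == 0) {
                if (reaperThread != null) {
                    throw new IllegalStateException(); // stands in for Preconditions.checkState
                }
                reaperThread = newReaper();
                start(reaperThread); // on failure: field stays non-null, count stays 0
            }
            ++registryCount;
        }
    }

    /** One possible fix: publish the field only after start() has succeeded. */
    static void registerFixed() {
        synchronized (LOCK) {
            if (registryCount == 0) {
                if (reaperThread != null) {
                    throw new IllegalStateException();
                }
                Thread t = newReaper();
                start(t);          // if this throws, the static state is untouched
                reaperThread = t;
            }
            ++registryCount;
        }
    }

    /** Counterpart of closing a registry: the last one stops and clears the reaper. */
    static void release() {
        synchronized (LOCK) {
            if (--registryCount == 0) {
                reaperThread.interrupt();
                reaperThread = null;
            }
        }
    }

    /** Clears all static state so both variants can be exercised in one JVM. */
    static void reset() {
        synchronized (LOCK) {
            if (reaperThread != null) {
                reaperThread.interrupt();
            }
            reaperThread = null;
            registryCount = 0;
        }
    }

    /** Returns true if running r throws IllegalStateException. */
    static boolean failsWithIllegalState(Runnable r) {
        try {
            r.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }
}
{code}

With registerBuggy(), one simulated start() failure leaves the field set and the count at zero, so every later call fails the check, matching the second stack trace. With registerFixed(), the next call after a failure simply retries. Wrapping start() in a try/catch that resets the field to null before rethrowing would be an equally valid way to keep the static state consistent.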

This may also happen in much older versions of Flink.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)