Zakelly Lan created FLINK-17645:
-----------------------------------
Summary: REAPER_THREAD in SafetyNetCloseableRegistry start() failed, causing the repeated failover.
Key: FLINK-17645
URL: https://issues.apache.org/jira/browse/FLINK-17645
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.6.3
Reporter: Zakelly Lan
I'm running a modified version of Flink and encountered the exception below when a task starts:
{code:java}
2020-05-12 00:46:19,037 ERROR [***] org.apache.flink.runtime.taskmanager.Task - Encountered an unexpected exception
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:802)
at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
at java.lang.Thread.run(Thread.java:834)
2020-05-12 00:46:19,038 INFO [***] org.apache.flink.runtime.taskmanager.Task
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:802)
at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:73)
at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
at java.lang.Thread.run(Thread.java:834)
{code}
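REAPER_THREAD.start() fails with the OutOfMemoryError, but REAPER_THREAD has already been assigned by that point, so it is never null again. From then on, every SafetyNetCloseableRegistry initialization in this JVM fails the checkState and throws an IllegalStateException, which causes the repeated failover: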
{code:java}
java.lang.IllegalStateException
at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179)
at org.apache.flink.core.fs.SafetyNetCloseableRegistry.<init>(SafetyNetCloseableRegistry.java:71)
at org.apache.flink.core.fs.FileSystemSafetyNet.initializeSafetyNetForThread(FileSystemSafetyNet.java:89)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:586)
at java.lang.Thread.run(Thread.java:834)
{code}
This may affect very old versions of Flink as well.
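To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is a simplified model and not Flink's actual source: the class ReaperInitSketch, its methods (openUnsafe, openSafe, failingFactory, workingFactory, attempt, reset) and the simulated OutOfMemoryError are all hypothetical names introduced for illustration. openUnsafe assigns the static field before calling start(), so a failed start() leaves the field set. openSafe shows one possible remedy, assuming it is acceptable to publish the thread only after start() succeeds.

```java
import java.util.function.Supplier;

// Hypothetical model of the issue; not Flink's SafetyNetCloseableRegistry.
class ReaperInitSketch {
    private static final Object LOCK = new Object();
    private static Thread reaperThread;   // stands in for REAPER_THREAD
    private static int registryCount;     // number of successfully opened registries

    // Mirrors the reported behavior: the field is set before start() is called.
    public static void openUnsafe(Supplier<Thread> factory) {
        synchronized (LOCK) {
            if (registryCount == 0) {
                checkNoReaper();
                reaperThread = factory.get();
                reaperThread.start();     // if this throws, reaperThread stays non-null
            }
            ++registryCount;
        }
    }

    // One possible remedy (an assumption): assign the field only after start() succeeds.
    public static void openSafe(Supplier<Thread> factory) {
        synchronized (LOCK) {
            if (registryCount == 0) {
                checkNoReaper();
                Thread candidate = factory.get();
                candidate.start();        // if this throws, the static field is untouched
                reaperThread = candidate;
            }
            ++registryCount;
        }
    }

    private static void checkNoReaper() {
        if (reaperThread != null) {
            throw new IllegalStateException("reaper thread is already set");
        }
    }

    // Produces a thread whose start() always fails, simulating the native-thread OOM.
    public static Supplier<Thread> failingFactory() {
        return () -> new Thread() {
            @Override
            public void start() {
                throw new OutOfMemoryError("simulated: unable to create new native thread");
            }
        };
    }

    // Produces an ordinary daemon thread that starts and exits immediately.
    public static Supplier<Thread> workingFactory() {
        return () -> {
            Thread t = new Thread(() -> { });
            t.setDaemon(true);
            return t;
        };
    }

    // Runs the action and reports "ok" or the simple class name of what it threw.
    public static String attempt(Runnable action) {
        try {
            action.run();
            return "ok";
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    // Clears the static state so each scenario starts fresh.
    public static void reset() {
        synchronized (LOCK) {
            reaperThread = null;
            registryCount = 0;
        }
    }

    public static void main(String[] args) {
        reset();
        System.out.println("unsafe: " + attempt(() -> openUnsafe(failingFactory()))
                + ", then " + attempt(() -> openUnsafe(workingFactory())));
        reset();
        System.out.println("safe:   " + attempt(() -> openSafe(failingFactory()))
                + ", then " + attempt(() -> openSafe(workingFactory())));
    }
}
```

In the unsafe variant the second attempt fails with IllegalStateException even though thread creation would now succeed, which matches the repeated failover I observed. In the safe variant the second attempt goes through. Resetting the field in a catch block around start() would be another option with the same effect.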