Marcos Klein created FLINK-18352:
------------------------------------

             Summary: org.apache.flink.core.execution.DefaultExecutorServiceLoader not thread safe
                 Key: FLINK-18352
                 URL: https://issues.apache.org/jira/browse/FLINK-18352
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Marcos Klein

The singleton *org.apache.flink.core.execution.DefaultExecutorServiceLoader* is not thread-safe, because the *java.util.ServiceLoader* class it relies on is not safe for use by multiple concurrent threads.

[https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/ServiceLoader.html#Concurrency]

This can leave the *ServiceLoader* instance in an inconsistent state that processes which attempt to self-heal cannot recover from. The process/container then has to be bounced in the hope that the race condition does not recur.

[https://stackoverflow.com/questions/60391499/apache-flink-cannot-find-compatible-factory-for-specified-execution-target-lo]

Additionally, the following stack traces have been seen when using *org.apache.flink.streaming.api.environment.RemoteStreamEnvironment* instances:
{code:java}
java.lang.ArrayIndexOutOfBoundsException: 2
    at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:61)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
{code}

{code:java}
java.util.NoSuchElementException: null
    at sun.misc.CompoundEnumeration.nextElement(CompoundEnumeration.java:59)
    at java.util.ServiceLoader$LazyIterator.hasNextService(ServiceLoader.java:357)
    at java.util.ServiceLoader$LazyIterator.hasNext(ServiceLoader.java:393)
    at java.util.ServiceLoader$1.hasNext(ServiceLoader.java:474)
    at org.apache.flink.core.execution.DefaultExecutorServiceLoader.getExecutorFactory(DefaultExecutorServiceLoader.java:60)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1724)
    at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1706)
{code}

The workaround when using the *StreamExecutionEnvironment* implementations is to write a custom, thread-safe implementation of *DefaultExecutorServiceLoader* and pass it to the *StreamExecutionEnvironment* constructors.
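As a rough illustration of the workaround, the sketch below shows one way a shared *java.util.ServiceLoader* can be guarded so that only one thread iterates it at a time. It deliberately avoids Flink types: *TargetFactory* and *SynchronizedServiceLookup* are hypothetical names invented for this example, and the wiring into an actual executor service loader and the *StreamExecutionEnvironment* constructors is not shown.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.function.Predicate;

/** Hypothetical service interface, standing in for an executor factory. */
interface TargetFactory {
    boolean supports(String target);
}

/**
 * Hypothetical helper (not part of Flink): serializes all access to a
 * shared ServiceLoader, whose lazy iterator is not thread-safe.
 */
final class SynchronizedServiceLookup<S> {

    private final ServiceLoader<S> loader;

    SynchronizedServiceLookup(Class<S> serviceType) {
        // Uses the thread context class loader, like ServiceLoader.load(Class).
        this.loader = ServiceLoader.load(serviceType);
    }

    /**
     * Returns the first provider accepted by the filter, if any.
     * The whole iteration runs under the lock, because every call to
     * hasNext()/next() may mutate the loader's internal lookup state.
     */
    synchronized Optional<S> findFirst(Predicate<? super S> filter) {
        Iterator<S> it = loader.iterator();
        while (it.hasNext()) {
            S candidate = it.next();
            if (filter.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A caller would keep a single *SynchronizedServiceLookup* instance and call something like `lookup.findFirst(f -> f.supports("remote"))` from any thread. An alternative that avoids the lock is to call *ServiceLoader.load* on every lookup so that no loader instance is shared, at the cost of repeating provider discovery each time.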