Roman Khachatryan created FLINK-21053:
-----------------------------------------
Summary: Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM
Key: FLINK-21053
URL: https://issues.apache.org/jira/browse/FLINK-21053
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.13.0
In the past, there were multiple bugs caused by throwing/handling RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).
I think such bugs are still possible, because there are many places where an executor is passed to CompletableFuture.xxxAsync calls even though it may already have been shut down.
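A minimal standalone reproduction of the failure mode (plain JDK, not Flink code; the class and method names are made up for illustration): once an executor has been shut down, CompletableFuture.runAsync throws RejectedExecutionException straight back into the calling thread, so any call site that does not expect it sees an unhandled exception.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Standalone reproduction (not Flink code): an executor that has already been shut down
// is passed to a CompletableFuture *Async call.
class ShutdownExecutorRepro {

    /** Returns true if runAsync on a shut-down executor threw RejectedExecutionException. */
    static boolean runAsyncOnShutDownExecutorIsRejected() {
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        // Simulates another thread shutting the executor down before the call below.
        ioExecutor.shutdown();
        try {
            CompletableFuture.runAsync(() -> System.out.println("never runs"), ioExecutor);
            return false;
        } catch (RejectedExecutionException e) {
            // The rejection surfaces synchronously in the calling thread.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(runAsyncOnShutDownExecutorIsRejected()); // prints "true"
    }
}
```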
In FLINK-20992 we discussed two approaches to fix this.
The first approach is to check the executor state inside a synchronized block every time the executor is used.
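For comparison, the first approach might look like the following Java sketch (hypothetical class and method names, not Flink's API). It assumes the executor is shut down only through the coordinator's own synchronized shutdown(); under that assumption, checking isShutdown() and submitting under the same lock means no task can slip in between the check and the shutdown.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of approach 1; names are illustrative, not Flink's API.
// Every use of the executor re-checks its state under the same lock that shutdown() holds.
class SynchronizedCheckCoordinator {
    private final Object lock = new Object();
    private final ExecutorService ioExecutor = Executors.newSingleThreadExecutor();

    /** Submits the task, or returns Optional.empty() if the executor is already shut down. */
    Optional<CompletableFuture<Void>> tryRunAsync(Runnable task) {
        synchronized (lock) {
            if (ioExecutor.isShutdown()) {
                return Optional.empty();
            }
            // Safe only because ioExecutor is never shut down outside this lock.
            return Optional.of(CompletableFuture.runAsync(task, ioExecutor));
        }
    }

    void shutdown() {
        synchronized (lock) {
            ioExecutor.shutdown();
        }
    }
}
```

The task itself runs on the executor thread, outside the lock; the lock only covers the check and the hand-off. The drawback is that every call site that passes the executor to a *Async method has to repeat this pattern.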
The second approach is to:
# Create the executors (both the IO and the timer thread pools) inside CheckpointCoordinator
# Check isShutdown() in their error handlers: if the executor is shut down and the error is a RejectedExecutionException, just log it; otherwise, delegate to FatalExitExceptionHandler
# (this will make it possible to remove such RejectedExecutionException checks from the coordinator code)
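The second approach could look roughly like the Java sketch below. It assumes the coordinator builds its own IO pool and installs a shutdown-aware Thread.UncaughtExceptionHandler on the pool's threads. ShutdownAwareHandler and CoordinatorExecutors are hypothetical names; the fatal handler is injected as a stand-in for Flink's FatalExitExceptionHandler so the sketch runs without Flink on the classpath, java.util.logging replaces Flink's logger, and the timer pool is omitted for brevity.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.logging.Logger;

// Hypothetical sketch of approach 2; names are illustrative, not Flink's API.
// Tolerates RejectedExecutionException only once the owning executor is shut down.
class ShutdownAwareHandler implements Thread.UncaughtExceptionHandler {
    private static final Logger LOG = Logger.getLogger(ShutdownAwareHandler.class.getName());

    private final BooleanSupplier isShutdown;
    // Stand-in for Flink's FatalExitExceptionHandler, injected so the sketch runs without Flink.
    private final Thread.UncaughtExceptionHandler fatalHandler;

    ShutdownAwareHandler(BooleanSupplier isShutdown, Thread.UncaughtExceptionHandler fatalHandler) {
        this.isShutdown = isShutdown;
        this.fatalHandler = fatalHandler;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isShutdown.getAsBoolean() && error instanceof RejectedExecutionException) {
            // Expected while shutting down: log it instead of failing the JobManager.
            LOG.info("Ignoring " + error + " in " + thread.getName() + ": executor is shut down");
        } else {
            // Anything else, or a rejection while still running, is treated as fatal.
            fatalHandler.uncaughtException(thread, error);
        }
    }
}

// The coordinator creates (and therefore owns) its IO pool, so the handler can query that pool.
class CoordinatorExecutors {
    final ShutdownAwareHandler handler;
    final ThreadPoolExecutor ioExecutor;

    CoordinatorExecutors(Thread.UncaughtExceptionHandler fatalHandler) {
        // isIoExecutorShutdown() is only invoked when an error is handled,
        // by which time ioExecutor has been assigned.
        ShutdownAwareHandler h = new ShutdownAwareHandler(this::isIoExecutorShutdown, fatalHandler);
        this.handler = h;
        this.ioExecutor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(),
                runnable -> {
                    Thread thread = new Thread(runnable, "checkpoint-io");
                    thread.setDaemon(true);
                    thread.setUncaughtExceptionHandler(h);
                    return thread;
                });
    }

    private boolean isIoExecutorShutdown() {
        return ioExecutor.isShutdown();
    }
}
```

Note that in this sketch the handler only sees exceptions that escape a task run via execute(); exceptions inside tasks passed to submit() or to CompletableFuture *Async methods are captured in the returned future instead.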
--
This message was sent by Atlassian Jira
(v8.3.4#803005)