(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-21053) Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM

Shang Yuanchun (Jira)

Roman Khachatryan created FLINK-21053:
-----------------------------------------

Summary: Prevent further RejectedExecutionExceptions in CheckpointCoordinator failing JM
Key: FLINK-21053
URL: https://issues.apache.org/jira/browse/FLINK-21053
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
Fix For: 1.13.0

In the past, there were multiple bugs caused by throwing/handling RejectedExecutionException in CheckpointCoordinator (FLINK-18290, FLINK-20992).

I think this is still possible, because there are many places where an executor is passed to CompletableFuture.xxxAsync calls even though it may already have been shut down.
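
A minimal sketch of the failure mode, using a plain JDK executor rather than Flink's own thread pools (the class and method names are made up for illustration). It only shows that CompletableFuture.runAsync throws RejectedExecutionException to the caller once the executor has been shut down:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical demo class, not part of Flink.
class RejectedAfterShutdownDemo {

    /** Returns true if handing a task to an already shut-down executor is rejected. */
    static boolean isRejectedAfterShutdown() {
        // Default rejection policy of this JDK executor is to throw (AbortPolicy).
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.shutdown();
        try {
            // runAsync calls executor.execute(...) directly, so the rejection
            // surfaces as an exception in the calling thread.
            CompletableFuture.runAsync(() -> { }, executor);
            return false;
        } catch (RejectedExecutionException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + isRejectedAfterShutdown());
    }
}
```

If the caller is itself a task running on another pool thread and does not catch the exception, it ends up in that thread's uncaught exception handler.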

In FLINK-20992 we discussed two approaches to fix this.

One approach is to check the executor state inside a synchronized block every time it is used.
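
A rough sketch of what this first approach could look like, assuming the executor is only ever shut down through the same guard. GuardedExecutor, tryExecute and the lock field are hypothetical names, not taken from the Flink code base:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical wrapper, not part of Flink.
class GuardedExecutor {
    private final Object lock = new Object();
    private final ExecutorService executor;
    private boolean shutdown; // guarded by lock

    GuardedExecutor(ExecutorService executor) {
        this.executor = executor;
    }

    /** Runs the task unless shutdown has started; returns whether it was accepted. */
    boolean tryExecute(Runnable task) {
        synchronized (lock) {
            if (shutdown) {
                return false; // skip silently instead of risking a rejection
            }
            // Check and submission happen under the same lock as shutdown(),
            // so the executor cannot be shut down in between.
            executor.execute(task);
            return true;
        }
    }

    void shutdown() {
        synchronized (lock) {
            shutdown = true;
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        GuardedExecutor guarded = new GuardedExecutor(Executors.newSingleThreadExecutor());
        System.out.println("before shutdown: " + guarded.tryExecute(() -> { }));
        guarded.shutdown();
        System.out.println("after shutdown: " + guarded.tryExecute(() -> { }));
    }
}
```

The cost of this variant is that every call site that uses the executor has to go through the guard.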

The second approach is to:
# Create the executors inside CheckpointCoordinator (both the io and timer thread pools)
# Check isShutdown() in their error handlers (if the executor is shut down and the error is a RejectedExecutionException, just log it; otherwise delegate to FatalExitExceptionHandler)
# (this would allow removing such RejectedExecutionException checks from the coordinator code)
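
The handler described in the second approach might be sketched as follows. This assumes that "error handler" means the Thread.UncaughtExceptionHandler of the pool threads; ShutdownAwareExceptionHandler is a hypothetical name, the fatalHandler parameter stands in for FatalExitExceptionHandler, and a counter stands in for the log statement:

```java
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical handler, not part of Flink.
class ShutdownAwareExceptionHandler implements Thread.UncaughtExceptionHandler {
    private final BooleanSupplier isShutdown; // e.g. executor::isShutdown
    private final Thread.UncaughtExceptionHandler fatalHandler; // stand-in for FatalExitExceptionHandler
    private final AtomicInteger ignored = new AtomicInteger(); // stand-in for a log statement

    ShutdownAwareExceptionHandler(
            BooleanSupplier isShutdown, Thread.UncaughtExceptionHandler fatalHandler) {
        this.isShutdown = isShutdown;
        this.fatalHandler = fatalHandler;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof RejectedExecutionException && isShutdown.getAsBoolean()) {
            // Expected during shutdown: record it and carry on.
            ignored.incrementAndGet();
        } else {
            // Anything else is still treated as fatal.
            fatalHandler.uncaughtException(t, e);
        }
    }

    int ignoredCount() {
        return ignored.get();
    }

    public static void main(String[] args) {
        ShutdownAwareExceptionHandler handler = new ShutdownAwareExceptionHandler(
                () -> true, (t, e) -> System.out.println("fatal: " + e));
        handler.uncaughtException(Thread.currentThread(), new RejectedExecutionException("late task"));
        handler.uncaughtException(Thread.currentThread(), new IllegalStateException("real bug"));
        System.out.println("ignored: " + handler.ignoredCount());
    }
}
```

Such a handler would be installed on the threads of the io and timer pools through their thread factory. Because the coordinator owns the executors in this approach, it can supply their isShutdown() state to the handler.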

--
This message was sent by Atlassian Jira
(v8.3.4#803005)