[jira] [Created] (FLINK-20993) Cleaning up checkpoint during shutdown may fail JM

Shang Yuanchun (Jira)
Roman Khachatryan created FLINK-20993:
-----------------------------------------

             Summary: Cleaning up checkpoint during shutdown may fail JM
                 Key: FLINK-20993
                 URL: https://issues.apache.org/jira/browse/FLINK-20993
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.12.0, 1.13.0
            Reporter: Roman Khachatryan


As reported in http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Uncaught-exception-in-FatalExitExceptionHandler-causing-JM-crash-while-canceling-job-td40627.html

{code}
stack: java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask@41554407 rejected from java.util.concurrent.ScheduledThreadPoolExecutor@5d0ec6f7[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 25977]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:326)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622)
at java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:668)
at org.apache.flink.runtime.concurrent.ScheduledExecutorServiceAdapter.execute(ScheduledExecutorServiceAdapter.java:62)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.scheduleTriggerRequest(CheckpointCoordinator.java:1152)
at org.apache.flink.runtime.checkpoint.CheckpointsCleaner.lambda$cleanCheckpoint$0(CheckpointsCleaner.java:58)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
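For illustration, here is a minimal JDK-only reproduction of the rejection seen in the trace. It is a sketch under assumptions: `Executors.newSingleThreadScheduledExecutor()` stands in for the executor behind `ScheduledExecutorServiceAdapter`, `RejectionRepro` is a hypothetical name, and no Flink classes are involved.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;

// JDK-only reproduction of the failure mode in the trace: a task submitted to a
// scheduled executor that has already been shut down is rejected.
class RejectionRepro {

    /** Returns true if execute() on a shut-down scheduled executor throws RejectedExecutionException. */
    static boolean submitAfterShutdownIsRejected() {
        // Assumption: stands in for the executor wrapped by Flink's ScheduledExecutorServiceAdapter.
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // shutdown() alone makes new submissions fail; in the reported trace the pool
        // had already progressed to the Terminated state.
        timer.shutdown();
        try {
            // Plays the role of the trigger request issued after checkpoint cleanup.
            timer.execute(() -> System.out.println("never runs"));
            return false;
        } catch (RejectedExecutionException e) {
            // The default AbortPolicy throws, as in ThreadPoolExecutor$AbortPolicy above.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitAfterShutdownIsRejected()); // prints "rejected: true"
    }
}
```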

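Supposedly, after a checkpoint is cleaned up, CheckpointsCleaner (through CheckpointCoordinator.scheduleTriggerRequest) enqueues a runnable to an executor while that executor is being shut down; the trace shows it already in the Terminated state. This causes a RejectedExecutionException, which propagates to the FatalErrorHandler and can fail the JobManager.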
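One possible mitigation, sketched under assumptions: the post-cleanup callback could treat a rejection from an already shut-down executor as benign instead of letting the exception escape. `ShutdownTolerantSubmit` and `executeIgnoringShutdown` are hypothetical names, and this is not necessarily the fix adopted in Flink.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical guard, not Flink's actual fix: a callback that tolerates the target
// executor having been shut down, instead of propagating RejectedExecutionException.
class ShutdownTolerantSubmit {

    /**
     * Submits task to executor. Returns true if accepted, false if rejected because the
     * executor is already shut down. Rejections for any other reason are rethrown.
     */
    static boolean executeIgnoringShutdown(ExecutorService executor, Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            if (executor.isShutdown()) {
                // The owner is shutting down; scheduling further work is pointless.
                return false;
            }
            throw e; // e.g. a saturated bounded queue: still a genuine error
        }
    }

    public static void main(String[] args) {
        ExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        System.out.println(executeIgnoringShutdown(timer, () -> {})); // prints true (accepted)
        timer.shutdown();
        System.out.println(executeIgnoringShutdown(timer, () -> {})); // prints false (rejection swallowed)
    }
}
```

Checking `isShutdown()` only after catching the exception keeps the common path unchanged and still surfaces rejections that happen for other reasons.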



--
This message was sent by Atlassian Jira
(v8.3.4#803005)