Brian Zhou created FLINK-18641:
----------------------------------
Summary: "Failure to finalize checkpoint" error in MasterTriggerRestoreHook
Key: FLINK-18641
URL: https://issues.apache.org/jira/browse/FLINK-18641
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.11.0
Reporter: Brian Zhou

https://github.com/pravega/flink-connectors is a Pravega connector for Flink. The ReaderCheckpointHook[1] class uses the Flink `MasterTriggerRestoreHook` interface to trigger a Pravega checkpoint during each Flink checkpoint, so that data can be recovered. The checkpoint recovery tests run fine with Flink 1.10, but with Flink 1.11 they hit the error below, which causes the tests to time out.

Error stacktrace:
{code}
2020-07-09 15:39:39,999 30945 [jobmanager-future-thread-5] WARN o.a.f.runtime.jobmaster.JobMaster - Error while processing checkpoint acknowledgement message
org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 3. Failure reason: Failure to finalize checkpoint.
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1033)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:948)
    at org.apache.flink.runtime.scheduler.SchedulerBase.lambda$acknowledgeCheckpoint$4(SchedulerBase.java:802)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flink.util.SerializedThrowable: Pending checkpoint has not been fully acknowledged yet
    at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
    at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:298)
    at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1021)
    ... 9 common frames omitted
{code}

More detail in this mailing list thread: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Pravega-connector-cannot-recover-from-the-checkpoint-due-to-quot-Failure-to-finalize-checkpoint-quot-td36652.html

Also in https://github.com/pravega/flink-connectors/issues/387

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
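For readers unfamiliar with the failing precondition: the trace shows `CheckpointCoordinator.completePendingCheckpoint` being reached from `receiveAcknowledgeMessage`, after which `PendingCheckpoint.finalizeCheckpoint` rejected checkpoint 3 because it was "not fully acknowledged yet". The Java sketch here is a hypothetical, simplified model, not Flink code: the class `PendingCheckpointModel` and all of its methods are invented for illustration. It assumes a pending checkpoint tracks task acknowledgements and master-hook acknowledgements (such as a `MasterTriggerRestoreHook`) separately, and shows how finalizing once only the task acknowledgements have arrived trips that kind of check. It does not reproduce Flink's actual code path or establish the root cause of this issue.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of a pending checkpoint (NOT Flink's PendingCheckpoint).
 * It only illustrates the precondition reported in the stack trace.
 */
class PendingCheckpointModel {
    private final long checkpointId;
    // Tasks that have not yet acknowledged this checkpoint.
    private final Set<String> pendingTasks;
    // Master hooks (assumed here to be tracked separately) whose state is still outstanding.
    private final Set<String> pendingMasterHooks;

    PendingCheckpointModel(long checkpointId, Set<String> tasks, Set<String> masterHooks) {
        this.checkpointId = checkpointId;
        this.pendingTasks = new HashSet<>(tasks);
        this.pendingMasterHooks = new HashSet<>(masterHooks);
    }

    void acknowledgeTask(String task) {
        pendingTasks.remove(task);
    }

    void acknowledgeMasterHook(String hook) {
        pendingMasterHooks.remove(hook);
    }

    boolean areTasksFullyAcknowledged() {
        return pendingTasks.isEmpty();
    }

    // "Fully acknowledged" requires both the tasks and the master hooks.
    boolean isFullyAcknowledged() {
        return pendingTasks.isEmpty() && pendingMasterHooks.isEmpty();
    }

    // Mimics a checkState-style guard: refuse to finalize unless every acknowledgement arrived.
    String finalizeCheckpoint() {
        if (!isFullyAcknowledged()) {
            throw new IllegalStateException("Pending checkpoint has not been fully acknowledged yet");
        }
        return "completed checkpoint " + checkpointId;
    }

    public static void main(String[] args) {
        PendingCheckpointModel cp = new PendingCheckpointModel(
                3, Set.of("source", "sink"), Set.of("readerCheckpointHook"));
        cp.acknowledgeTask("source");
        cp.acknowledgeTask("sink");

        // A hypothetical coordinator that finalizes as soon as all *tasks* have
        // acknowledged hits the guard while the master hook is still pending.
        if (cp.areTasksFullyAcknowledged()) {
            try {
                cp.finalizeCheckpoint();
            } catch (IllegalStateException e) {
                System.out.println("finalize failed: " + e.getMessage());
            }
        }

        // Once the master hook's state arrives, finalization succeeds.
        cp.acknowledgeMasterHook("readerCheckpointHook");
        System.out.println(cp.finalizeCheckpoint());
    }
}
```

Running `main` prints `finalize failed: Pending checkpoint has not been fully acknowledged yet` followed by `completed checkpoint 3`. The design point being illustrated is only that the "ready to complete" decision and the finalize guard must agree on which acknowledgements count.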