[jira] [Created] (FLINK-14339) The checkpoint ID count wrong on restore savepoint log

Shang Yuanchun (Jira)
king's uncle created FLINK-14339:
------------------------------------

             Summary: The checkpoint ID count wrong on restore savepoint log
                 Key: FLINK-14339
                 URL: https://issues.apache.org/jira/browse/FLINK-14339
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.8.0
            Reporter: king's uncle


I saw the log below when testing a Flink restore from a savepoint.
{code:java}
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 0 checkpoints in ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 0 checkpoints from storage.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Starting job 00000000000000000000000000000000 from savepoint /nfsdata/ecs/flink-savepoints/flink-savepoint-test/00000000000000000000000000000000/201910080158/savepoint-000000-003c9b080832 (allowing non restored state)
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Reset the checkpoint ID of job 00000000000000000000000000000000 to 12285.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Recovering checkpoints from ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1 checkpoints in ZooKeeper.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to fetch 1 checkpoints from storage.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying to retrieve checkpoint 12284.
[flink-akka.actor.default-dispatcher-2] INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring job 00000000000000000000000000000000 from latest valid checkpoint: Checkpoint 12284 @ 0 for 00000000000000000000000000000000.
{code}
The checkpoint that is finally restored has ID 12284, yet the log prints "Reset the checkpoint ID of job 00000000000000000000000000000000 to 12285". So I checked the source code:
{code:java}
// Reset the checkpoint ID counter
long nextCheckpointId = savepoint.getCheckpointID() + 1;
checkpointIdCounter.setCount(nextCheckpointId);

LOG.info("Reset the checkpoint ID of job {} to {}.", job, nextCheckpointId);
{code}
I think the message should print the restored checkpoint ID rather than the next checkpoint ID:
{code:java}
LOG.info("Reset the checkpoint ID of job {} to {}.", job, savepoint.getCheckpointID());
{code}
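
For illustration, here is a minimal, self-contained sketch of why the two log lines differ by one. It is not Flink code: `CheckpointIdLogSketch`, `resetCounter`, and `describeReset` are hypothetical names, and an `AtomicLong` stands in for the real checkpoint ID counter. It only assumes, as in the quoted snippet, that the counter is set to the restored ID plus one. The alternative message that names both IDs is just one possible wording, not a proposed patch.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CheckpointIdLogSketch {

    // Hypothetical stand-in for the checkpoint ID counter (not the Flink class).
    static final AtomicLong checkpointIdCounter = new AtomicLong();

    // Mirrors the quoted logic: the counter is set to the ID that the
    // next checkpoint will receive, i.e. the restored ID plus one.
    static long resetCounter(long restoredCheckpointId) {
        long nextCheckpointId = restoredCheckpointId + 1;
        checkpointIdCounter.set(nextCheckpointId);
        return nextCheckpointId;
    }

    // Hypothetical log text that names both values, so a reader cannot
    // mistake the counter value for the restored checkpoint ID.
    static String describeReset(String job, long restoredCheckpointId, long nextCheckpointId) {
        return String.format(
                "Restored job %s from checkpoint %d; next checkpoint ID is %d.",
                job, restoredCheckpointId, nextCheckpointId);
    }

    public static void main(String[] args) {
        long restored = 12284L; // ID reported by "Trying to retrieve checkpoint 12284."
        long next = resetCounter(restored);
        System.out.println(describeReset("00000000000000000000000000000000", restored, next));
    }
}
```

With the values from the log above, the counter ends up at 12285 while the restored checkpoint is 12284, which matches the two messages in question.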



--
This message was sent by Atlassian Jira
(v8.3.4#803005)