(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-19778) Failed job reinitiated with wrong checkpoint after a ZK reconnection

Shang Yuanchun (Jira)

Paul Lin created FLINK-19778:
--------------------------------

Summary: Failed job reinitiated with wrong checkpoint after a ZK reconnection
Key: FLINK-19778
URL: https://issues.apache.org/jira/browse/FLINK-19778
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.11.0
Reporter: Paul Lin
Attachments: jm_log

We have a Flink 1.11.0 job running on YARN that reached the FAILED state because its JobManager lost leadership during a ZK full GC. After the ZK connection recovered, the job was somehow reinitiated, but no checkpoints were found in ZK, so it was restored from an earlier savepoint, which rewound the job unexpectedly.

For details, please see the JobManager logs in the attachment (jm_log).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)