(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-18748) Savepoint would be queued unexpected

Shang Yuanchun (Jira)

Congxian Qiu(klion26) created FLINK-18748:
---------------------------------------------

Summary: Savepoint would be queued unexpected
Key: FLINK-18748
URL: https://issues.apache.org/jira/browse/FLINK-18748
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.11.1, 1.11.0
Reporter: Congxian Qiu(klion26)

After FLINK-17342, when a checkpoint or savepoint is triggered, {{CheckpointRequestDecider#chooseRequestToExecute}} checks whether the request can be executed. The logic is as follows:
{code:java}
Preconditions.checkState(Thread.holdsLock(lock));
// 1. a trigger is already in progress, or nothing is queued
if (isTriggering || queuedRequests.isEmpty()) {
    return Optional.empty();
}

// 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
if (pendingCheckpointsSizeSupplier.get() >= maxConcurrentCheckpointAttempts) {
    return Optional.of(queuedRequests.first())
        .filter(CheckpointTriggerRequest::isForce)
        .map(unused -> queuedRequests.pollFirst());
}

// 3. check the timestamp of the last completed checkpoint
long nextTriggerDelayMillis = nextTriggerDelayMillis(lastCompletionMs);
if (nextTriggerDelayMillis > 0) {
    return onTooEarly(nextTriggerDelayMillis);
}

return Optional.of(queuedRequests.pollFirst());
{code}
However, even if {{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}, a savepoint request still has to wait in step 3, which checks the timestamp of the last completed checkpoint.

I think we should trigger the savepoint immediately if {{pendingCheckpointsSizeSupplier.get()}} < {{maxConcurrentCheckpointAttempts}}.
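
To make the proposal concrete, here is a minimal, self-contained sketch of how step 3 might be skipped for savepoints. It is not Flink's actual code or the committed fix. The class {{DeciderSketch}}, the {{Request}} type with its {{isSavepoint}} flag, the {{minPauseMillis}} parameter, and the explicit {{nowMs}} argument are all hypothetical names introduced for illustration. It also assumes a too-early request simply stays queued, which may differ from what {{onTooEarly}} really does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

// Hypothetical, simplified stand-in for the decider; NOT Flink's real class.
final class DeciderSketch {

    // Hypothetical request type; isSavepoint and isForce are illustrative flags.
    static final class Request {
        final String name;
        final boolean isSavepoint;
        final boolean isForce;

        Request(String name, boolean isSavepoint, boolean isForce) {
            this.name = name;
            this.isSavepoint = isSavepoint;
            this.isForce = isForce;
        }
    }

    // Assumption: a plain FIFO queue instead of the sorted set used in the issue.
    private final Deque<Request> queuedRequests = new ArrayDeque<>();
    private final int maxConcurrentCheckpointAttempts;
    private final long minPauseMillis; // assumed minimum pause after the last completion

    DeciderSketch(int maxConcurrentCheckpointAttempts, long minPauseMillis) {
        this.maxConcurrentCheckpointAttempts = maxConcurrentCheckpointAttempts;
        this.minPauseMillis = minPauseMillis;
    }

    void enqueue(Request request) {
        queuedRequests.addLast(request);
    }

    Optional<Request> chooseRequestToExecute(
            boolean isTriggering, int pendingCheckpoints, long lastCompletionMs, long nowMs) {
        // 1. a trigger is already in progress, or nothing is queued
        if (isTriggering || queuedRequests.isEmpty()) {
            return Optional.empty();
        }
        Request first = queuedRequests.peekFirst();

        // 2. too many ongoing checkpoints/savepoints: only a forced request may proceed
        if (pendingCheckpoints >= maxConcurrentCheckpointAttempts) {
            return first.isForce ? Optional.of(queuedRequests.pollFirst()) : Optional.empty();
        }

        // 3. proposed change: only non-savepoint requests wait for the pause to elapse
        if (!first.isSavepoint) {
            long nextTriggerDelayMillis = lastCompletionMs + minPauseMillis - nowMs;
            if (nextTriggerDelayMillis > 0) {
                return Optional.empty(); // assumption: the request stays queued
            }
        }
        return Optional.of(queuedRequests.pollFirst());
    }
}
```

With this shape, a savepoint at the head of the queue is returned as soon as there is a free slot, while a regular checkpoint in the same situation is still held back until the pause has elapsed.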
