Stephan Ewen created FLINK-18429:
------------------------------------
Summary: Add default method for CheckpointListener.notifyCheckpointAborted(checkpointId)
Key: FLINK-18429
URL: https://issues.apache.org/jira/browse/FLINK-18429
Project: Flink
Issue Type: Bug
Components: API / DataStream
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Fix For: 1.11.0
The {{CheckpointListener}} interface is implemented by many users. Adding a new method {{notifyCheckpointAborted(long)}} to the interface without a default method breaks many user programs.
We should turn this method into a default method, for two reasons:
- It avoids breaking existing user programs.
- In practice, it is less relevant for programs to react to checkpoints being aborted than to checkpoints being completed. On completion you often want to commit side effects, while on abortion you frequently do nothing and let the next successful checkpoint commit all changes up to that point.
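A minimal sketch of the proposed change, using a simplified stand-in for the interface rather than the actual Flink source. {{CommittingSink}} and {{DefaultMethodDemo}} are hypothetical names used only for illustration; the point is that a class written before the new method existed still compiles and runs once the method has a no-op default:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the interface discussed in this issue (assumption: not the real Flink source).
interface CheckpointListener {
    void notifyCheckpointComplete(long checkpointId) throws Exception;

    // No-op default, so implementations written before this method existed keep compiling.
    default void notifyCheckpointAborted(long checkpointId) throws Exception {}
}

// Hypothetical user class that only knows about the original method.
class CommittingSink implements CheckpointListener {
    final List<Long> committed = new ArrayList<>();

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // Commit side effects for the completed checkpoint.
        committed.add(checkpointId);
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) throws Exception {
        CommittingSink sink = new CommittingSink();
        sink.notifyCheckpointAborted(1L);   // falls through to the no-op default
        sink.notifyCheckpointComplete(2L);  // commits as before
        System.out.println(sink.committed); // prints [2]
    }
}
```

Implementations that do care about aborted checkpoints can still override {{notifyCheckpointAborted(long)}} explicitly.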
*Original Confusion*
There was confusion about this originally, going back to a comment of mine suggesting this should not be a default method, which incorrectly treated it as an internal interface:
https://github.com/apache/flink/pull/8693#issuecomment-542834147
See also the clarification email on the mailing list:
{noformat}
About the "notifyCheckpointAborted()":
When I wrote that comment, I was (apparently wrongly) assuming we were talking about an internal interface here, because the "abort" signal was originally only intended to cancel the async part of state backend checkpoints.
I just realized that this is exposed to users - and I am actually with Thomas on this one. The "CheckpointListener" is a very public interface that many users implement. The fact that it is tagged "@PublicEvolving" is somehow not aligned with reality. So adding the method here will in reality break lots and lots of user programs.
I think also in practice it is much less relevant for user applications to react to aborted checkpoints. Since the notifications there can not be relied upon (if there is a task failure concurrently) users always have to follow the "newer checkpoint subsumes older checkpoint" contract, so the abort method is probably rarely relevant.
This is something we should change, in my opinion.
{noformat}
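The "newer checkpoint subsumes older checkpoint" contract mentioned in the email could look roughly like the sketch below. {{SubsumingCommitter}} and its {{snapshot}} method are hypothetical and not part of Flink. The committer keeps pending changes per checkpoint and, on each completion notification, commits everything up to and including that checkpoint ID, so it never needs to hear about aborted or missed checkpoints:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical committer illustrating the subsumption contract (assumption: not a Flink class).
class SubsumingCommitter {
    // Pending changes keyed by the checkpoint in which they were captured.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    final List<String> committed = new ArrayList<>();

    void snapshot(long checkpointId, List<String> changes) {
        pending.put(checkpointId, new ArrayList<>(changes));
    }

    void notifyCheckpointComplete(long checkpointId) {
        // Live view of all pending entries with an ID <= the completed checkpoint.
        NavigableMap<Long, List<String>> upTo = pending.headMap(checkpointId, true);
        for (List<String> changes : upTo.values()) {
            committed.addAll(changes);
        }
        upTo.clear(); // also removes these entries from 'pending'
    }
}

class SubsumptionDemo {
    public static void main(String[] args) {
        SubsumingCommitter committer = new SubsumingCommitter();
        committer.snapshot(1L, Arrays.asList("a"));
        committer.snapshot(2L, Arrays.asList("b"));
        // Checkpoint 1 is aborted, or its notification is lost: nothing is called.
        committer.notifyCheckpointComplete(2L);
        System.out.println(committer.committed); // prints [a, b]
    }
}
```

Because completion of checkpoint 2 also commits the changes pending from checkpoint 1, no abort handler is required for correctness in this sketch.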