http://deprecated-apache-flink-mailing-list-archive.368.s1.nabble.com/DISCUSS-FLIP-170-Adding-Checkpoint-Rejection-Mechanism-tp51212p51269.html
Thanks for the proposal. I have a couple of questions.
to trigger/initiate a checkpoint, instead of declining it. Could it be made
> Here is some brief context about the new feature.
>
> 1. Active checkpoint rejection by the operator. On top of the current
> checkpoint mechanism, one preliminary step is added so that the operator
> can determine whether it is able to take a snapshot. The preliminary step
> is a new API exposed to users/developers. It will be implemented in the
> Source API (the new one based on FLIP-27) for the CDC implementation, and
> it can also be implemented in other operators if necessary.
>
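To check that I read point 1 correctly, here is a minimal sketch of the preliminary step. All names (`SnapshotReadiness`, `BusySource`, `CheckpointAttempt`) are hypothetical and not part of Flink or the FLIP, and the "busy" condition is only an assumed stand-in for whatever prevents a CDC source from snapshotting.

```java
import java.util.Optional;

// Hypothetical sketch: none of these types exist in Flink.
interface SnapshotReadiness {
    /** Preliminary step: can this operator take a snapshot right now? */
    boolean canSnapshot(long checkpointId);
}

/** Stand-in source that is temporarily unable to snapshot (assumed condition). */
final class BusySource implements SnapshotReadiness {
    private boolean busy = true;

    void markIdle() {
        busy = false;
    }

    @Override
    public boolean canSnapshot(long checkpointId) {
        return !busy;
    }

    String snapshotState(long checkpointId) {
        return "state@" + checkpointId;
    }
}

final class CheckpointAttempt {
    /** Returns the snapshot, or empty if the operator rejected the checkpoint. */
    static Optional<String> tryCheckpoint(BusySource source, long checkpointId) {
        if (!source.canSnapshot(checkpointId)) {
            return Optional.empty(); // rejected before any snapshot work starts
        }
        return Optional.of(source.snapshotState(checkpointId));
    }
}
```

In this reading the rejection happens before any state is written, so a rejected checkpoint costs only the readiness call.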
> 2. Handling the failure returned from the operator. If the checkpoint is
> rejected by the operator, the operator must also return an appropriate
> failure reason. The current design lists two failure reasons, soft
> failure and hard failure. The former is ignored by Flink; the latter is
> counted as a continuous checkpoint failure by the current checkpoint
> failure manager mechanism.
>
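My understanding of point 2, as a rough sketch: soft rejections leave the continuous-failure count untouched, hard ones increment it. `RejectionReason` and `ContinuousFailureCounter` are hypothetical names, not Flink's checkpoint failure manager, and the "exceeds the tolerable number" rule is my assumption about how the threshold would be applied.

```java
// Hypothetical sketch; names are illustrative, not Flink's failure manager API.
enum RejectionReason { SOFT, HARD }

final class ContinuousFailureCounter {
    private final int tolerableFailures;
    private int continuousFailures = 0;

    ContinuousFailureCounter(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Returns true if the job should fail over after this rejection. */
    boolean onRejected(RejectionReason reason) {
        if (reason == RejectionReason.SOFT) {
            return false; // soft failures are ignored entirely
        }
        continuousFailures++;
        // Assumption: fail over once the count exceeds the tolerable number.
        return continuousFailures > tolerableFailures;
    }

    /** A completed checkpoint ends the run of continuous failures. */
    void onCompleted() {
        continuousFailures = 0;
    }

    int continuousFailures() {
        return continuousFailures;
    }
}
```

With this shape, an operator that only ever reports soft failures never triggers a failover through the counter.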
> 3. To prevent an operator from reporting soft failures indefinitely, so
> that no checkpoint completes for a long time, we introduce a new
> configuration option, the tolerable checkpoint failure timeout. It is a
> timer that starts together with the checkpoint scheduler and is reset only
> when a checkpoint completes; otherwise it keeps running until the
> tolerable timeout is hit. When the timer fires, it triggers the current
> checkpoint failover.
>
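The timer in point 3 could look roughly like this. `TolerableFailureTimer` is a hypothetical name, and passing the current time in explicitly is my own choice so the behaviour is easy to test; the FLIP does not specify either.

```java
// Hypothetical sketch of the tolerable checkpoint failure timeout.
final class TolerableFailureTimer {
    private final long timeoutMillis;
    private long startedAtMillis;

    /** Starts the timer together with the checkpoint scheduler. */
    TolerableFailureTimer(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.startedAtMillis = nowMillis;
    }

    /** Only a completed checkpoint resets the timer. */
    void onCheckpointCompleted(long nowMillis) {
        startedAtMillis = nowMillis;
    }

    /** Rejections and failures deliberately leave the timer running. */
    void onCheckpointRejected(long nowMillis) {
        // intentionally empty
    }

    /** True once no checkpoint has completed within the timeout. */
    boolean isExpired(long nowMillis) {
        return nowMillis - startedAtMillis >= timeoutMillis;
    }
}
```

The scheduler would check `isExpired` periodically and trigger the failover when it returns true.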
> Questions:
> a. With the current design, checkpoints might keep failing for a long
> time, for example when the checkpoint interval is large. Is there a better
> way to make checkpoints more likely to succeed? One option is to trigger a
> checkpoint immediately after the previous one is rejected, but that seems
> inappropriate because it would increase the overhead.
> b. Is there a better way to handle the soft failure?