[jira] [Created] (FLINK-14979) TwoPhaseCommitSink fails when checkpoint overtakes a savepoint


Shang Yuanchun (Jira)
Piotr Nowojski created FLINK-14979:
--------------------------------------

             Summary: TwoPhaseCommitSink fails when checkpoint overtakes a savepoint
                 Key: FLINK-14979
                 URL: https://issues.apache.org/jira/browse/FLINK-14979
             Project: Flink
          Issue Type: Bug
          Components: API / DataStream
    Affects Versions: 1.9.1, 1.8.2, 1.7.2, 1.6.4
            Reporter: Piotr Nowojski


As [reported by a user on the user mailing list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/UNCHECKED-Error-while-confirming-Checkpoint-td23213.html], {{TwoPhaseCommitSinkFunction#notifyCheckpointComplete}} can fail with the following exception:

{noformat}
java.lang.RuntimeException: Error while confirming checkpoint
    at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1205)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: checkpoint completed, but no transaction pending
    at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
    at org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:267)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyCheckpointComplete(AbstractUdfStreamOperator.java:130)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:822)
    at org.apache.flink.runtime.taskmanager.Task$2.run(Task.java:1200)
    ... 5 more
{noformat}
This can happen in the following scenario:
# A savepoint is triggered.
# A checkpoint is triggered.
# The checkpoint completes (but it does not subsume the savepoint, because checkpoints subsume only other checkpoints).
# The savepoint completes.

In this case, {{TwoPhaseCommitSinkFunction}} first receives the notification that the later checkpoint completed, and it commits the pending transactions of both the savepoint and the checkpoint. When the savepoint's {{notifyCheckpointComplete}} arrives later, no transaction is pending any more and the above error occurs.

A possible trivial fix is to remove the failing {{checkState}}.
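
To make the ordering problem concrete, here is a minimal, self-contained sketch of the scenario. It is a simplified model and not Flink's actual code: the class {{PendingTransactions}}, its {{snapshot}} method, the {{strict}} flag and the {{OvertakeDemo}} driver are hypothetical names introduced only for illustration. The model assumes that a completion notification commits every pending transaction with an id up to and including the notified one, which is the behaviour described above.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical, simplified stand-in for the sink's bookkeeping; not Flink's code. */
class PendingTransactions {
    // Pre-committed transactions keyed by the checkpoint/savepoint id that created them.
    private final TreeMap<Long, String> pending = new TreeMap<>();
    // strict = true keeps the failing state check, strict = false models removing it.
    private final boolean strict;

    PendingTransactions(boolean strict) {
        this.strict = strict;
    }

    /** Records a pre-committed transaction for the given checkpoint or savepoint id. */
    void snapshot(long checkpointId) {
        pending.put(checkpointId, "txn-" + checkpointId);
    }

    /** Commits all pending transactions with id <= checkpointId; returns how many. */
    int notifyCheckpointComplete(long checkpointId) {
        if (strict && pending.isEmpty()) {
            throw new IllegalStateException("checkpoint completed, but no transaction pending");
        }
        NavigableMap<Long, String> done = pending.headMap(checkpointId, true);
        int committed = done.size();
        done.clear(); // clearing the view removes the entries from 'pending' as well
        return committed;
    }
}

class OvertakeDemo {
    public static void main(String[] args) {
        PendingTransactions sink = new PendingTransactions(true);
        sink.snapshot(1L); // savepoint is triggered
        sink.snapshot(2L); // checkpoint is triggered
        // The checkpoint completes first and commits both transactions.
        System.out.println("committed: " + sink.notifyCheckpointComplete(2L));
        try {
            // The savepoint notification arrives late and finds nothing pending.
            sink.notifyCheckpointComplete(1L);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

Constructing the model with {{strict}} set to {{false}} corresponds to the trivial fix: the late savepoint notification then commits nothing and returns normally.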


