[jira] [Created] (FLINK-22229) Noise stacktraces if checkpoints fails because sources are not ready

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-22229) Noise stacktraces if checkpoints fails because sources are not ready

Shang Yuanchun (Jira)
Chesnay Schepler created FLINK-22229:
----------------------------------------

             Summary: Noise stacktraces if checkpoints fails because sources are not ready
                 Key: FLINK-22229
                 URL: https://issues.apache.org/jira/browse/FLINK-22229
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Checkpointing, Runtime / Task
            Reporter: Chesnay Schepler


Reported by Stephan Ewen:

When checkpoints are being skipped because not all sources are ready, we log a big stack trace.
This can happen multiple times during job startup (while TMs are starting, pulling artifacts, tasks are being scheduled), and it creates the impression that something is wrong, when this is really a totally normal and expected situation.

We can probably reduce this to an INFO level statement, because this isn't something bad or unexpected, and the stack trace doesn't add any information (there are really only RPC worker methods in the trace).

{code}
7862 [Checkpoint Timer] WARN  org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Failed to trigger checkpoint for job b7476b12c6b7da7b72fa4dfb4fc0741a.)
org.apache.flink.runtime.checkpoint.CheckpointException: Checkpoint triggering task Source: numbers -> Map -> Sink: Data stream collect sink (1/1) of job b7476b12c6b7da7b72fa4dfb4fc0741a has not being executed at the moment. Aborting checkpoint. Failure reason: Not all required tasks are currently running.
        at org.apache.flink.runtime.checkpoint.DefaultCheckpointPlanCalculator.checkTasksStarted(DefaultCheckpointPlanCalculator.java:152) ~[classes/:?]
        at org.apache.flink.runtime.checkpoint.DefaultCheckpointPlanCalculator.lambda$calculateCheckpointPlan$1(DefaultCheckpointPlanCalculator.java:114) ~[classes/:?]
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) ~[?:?]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:440) ~[classes/:?]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:208) ~[classes/:?]
        at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:77) ~[classes/:?]
        at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:158) ~[classes/:?]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) ~[scala-library-2.11.12.jar:?]
        at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) ~[scala-library-2.11.12.jar:?]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[scala-library-2.11.12.jar:?]
        at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) ~[scala-library-2.11.12.jar:?]
        at akka.actor.Actor$class.aroundReceive(Actor.scala:517) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.actor.ActorCell.invoke(ActorCell.scala:561) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.Mailbox.run(Mailbox.scala:225) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.Mailbox.exec(Mailbox.scala:235) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
        at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) ~[akka-actor_2.11-2.5.21.jar:2.5.21]
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)