[jira] [Created] (FLINK-18962) Improve error message if checkpoint directory is not writable


Shang Yuanchun (Jira)
Nico Kruber created FLINK-18962:
-----------------------------------

             Summary: Improve error message if checkpoint directory is not writable
                 Key: FLINK-18962
                 URL: https://issues.apache.org/jira/browse/FLINK-18962
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
    Affects Versions: 1.11.1
            Reporter: Nico Kruber


If the checkpoint directory configured in {{state.checkpoints.dir}} is not writable by the user Flink runs as, checkpoints are declined, but the real cause is not reported anywhere:

* the Web UI says: "Cause: The job has failed" (the Flink job is running though)
* the JM log says:
{code}
2020-08-14 12:13:18,820 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Triggering checkpoint 2 (type=CHECKPOINT) @ 1597399998819 for job 2c567b14e8d0833404931ef47dfec266.
2020-08-14 12:13:18,921 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator    [] - Decline checkpoint 2 by task 0d4fd75374ad16c8d963679e3c2171ec of job 2c567b14e8d0833404931ef47dfec266 at a184deea621e3923fbfcb1d899348448 @ Nico-PC.lan (dataPort=35531).
{code}
* the TM log says:
{code}
2020-08-14 12:13:14,102 INFO  org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Checkpoint 1 has been notified as aborted, would not trigger any checkpoint.
{code}

And that's it. Flink should report a real error message indicating that the checkpoint (sub)directory could not be created.
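
To illustrate the kind of message I have in mind, here is a minimal sketch using plain {{java.nio.file}}. It is not Flink's actual code path: the class {{CheckpointDirCheck}}, the method {{createCheckpointSubDir}}, and the message wording are hypothetical, and the {{chk-<id>}} sub-directory name is only used for illustration. The point is that the failure names the directory, the user, and the underlying cause.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Flink: shows an error message that names the real cause.
final class CheckpointDirCheck {

    // Creates the sub-directory for one checkpoint below baseDir, or fails with a descriptive error.
    static Path createCheckpointSubDir(Path baseDir, long checkpointId) throws IOException {
        // Illustrative naming only; the real layout is decided by the checkpoint storage.
        Path target = baseDir.resolve("chk-" + checkpointId);
        try {
            return Files.createDirectories(target);
        } catch (IOException e) {
            String user = System.getProperty("user.name");
            // Keep the original exception as the cause so the low-level reason is not lost.
            throw new IOException(
                    "Could not create checkpoint directory '" + target
                            + "' (base directory '" + baseDir
                            + "' writable by user '" + user + "': "
                            + Files.isWritable(baseDir) + ")",
                    e);
        }
    }
}
```

If a message like this were attached to the declined checkpoint, it could show up in the JM log and the Web UI instead of "The job has failed".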



--
This message was sent by Atlassian Jira
(v8.3.4#803005)