Nico Kruber created FLINK-18962:
-----------------------------------
Summary: Improve error message if checkpoint directory is not writable
Key: FLINK-18962
URL: https://issues.apache.org/jira/browse/FLINK-18962
Project: Flink
Issue Type: Improvement
Components: Runtime / Checkpointing
Affects Versions: 1.11.1
Reporter: Nico Kruber
If the checkpoint directory from {{state.checkpoints.dir}} is not writable by the user that Flink runs as, checkpoints are declined, but the real cause is not mentioned anywhere:
* the Web UI says: "Cause: The job has failed" (the Flink job is running though)
* the JM log says:
{code}
2020-08-14 12:13:18,820 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 2 (type=CHECKPOINT) @ 1597399998819 for job 2c567b14e8d0833404931ef47dfec266.
2020-08-14 12:13:18,921 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Decline checkpoint 2 by task 0d4fd75374ad16c8d963679e3c2171ec of job 2c567b14e8d0833404931ef47dfec266 at a184deea621e3923fbfcb1d899348448 @ Nico-PC.lan (dataPort=35531).
{code}
* the TM log says:
{code}
2020-08-14 12:13:14,102 INFO org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl [] - Checkpoint 1 has been notified as aborted, would not trigger any checkpoint.
{code}
And that's it. There should be a real error message indicating that the checkpoint (sub-)directory could not be created.
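To illustrate the kind of message that would help, here is a minimal sketch, not Flink's actual code. The class {{CheckpointDirCheck}} and the method {{createCheckpointSubDir}} are hypothetical names, and the sketch assumes a local file system accessed through {{java.nio.file}}, whereas Flink goes through its own file system abstraction for checkpoint storage.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration only; not part of Flink. */
public final class CheckpointDirCheck {

    private CheckpointDirCheck() {}

    /**
     * Creates the checkpoint sub-directory below baseDir. On failure, rethrows
     * an IOException whose message names the target directory and, if the base
     * directory exists but is not writable, also names the current user.
     */
    public static Path createCheckpointSubDir(Path baseDir, String subDirName)
            throws IOException {
        Path target = baseDir.resolve(subDirName);
        try {
            return Files.createDirectories(target);
        } catch (IOException e) {
            String user = System.getProperty("user.name");
            // Only add the permission hint when it is actually the likely cause.
            String hint = Files.isDirectory(baseDir) && !Files.isWritable(baseDir)
                    ? " The base directory is not writable by user '" + user + "'."
                    : "";
            // Keep the original exception as the cause so no detail is lost.
            throw new IOException(
                    "Could not create checkpoint directory '" + target + "'." + hint, e);
        }
    }
}
```

If a message like this were attached to the decline reason, it could show up in the JM log and the Web UI instead of "The job has failed".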