Hi
The image is not very clear.
For RocksDBStateBackend, have you enabled incremental checkpointing?
Currently, a checkpoint on the TM (TaskManager) side consists of these steps:
1. barrier alignment
2. sync snapshot
3. async snapshot
For the expired checkpoints, could you please check the tasks of the first operator in the DAG to find out why they timed out?
- Is there any backpressure? (This affects barrier alignment.)
- Is the disk or network utilization high? (This affects steps 2 and 3.)
- Is the task thread too busy? (This can cause the barrier to be processed late.)
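
To make the mapping between the three steps and the checks above concrete, here is a rough sketch in Java. It is not part of Flink: CheckpointPhases, slowest and whatToCheck are hypothetical names, and the three durations are assumed to be numbers you read off by hand for the slow subtask. It only says which step took longest and which check that points to. The busy-task-thread case is not covered, since a barrier that is processed late does not show up in these three durations.

```java
// Hypothetical helper, not a Flink API. The inputs are assumed to be the
// per-subtask durations (in milliseconds) of the three checkpoint steps.
final class CheckpointPhases {

    enum Phase { BARRIER_ALIGN, SYNC_SNAPSHOT, ASYNC_SNAPSHOT }

    /** Returns the step with the largest duration; ties go to the earlier step. */
    static Phase slowest(long alignMs, long syncMs, long asyncMs) {
        Phase slowest = Phase.BARRIER_ALIGN;
        long max = alignMs;
        if (syncMs > max) {
            slowest = Phase.SYNC_SNAPSHOT;
            max = syncMs;
        }
        if (asyncMs > max) {
            slowest = Phase.ASYNC_SNAPSHOT;
        }
        return slowest;
    }

    /** Maps the slowest step to the matching check from the list above. */
    static String whatToCheck(Phase phase) {
        switch (phase) {
            case BARRIER_ALIGN:
                return "backpressure";
            default:
                // Steps 2 and 3 are the ones affected by disk and network load.
                return "disk/network utilization";
        }
    }

    public static void main(String[] args) {
        // Made-up example numbers: alignment dominates the checkpoint time.
        Phase p = slowest(540_000, 1_200, 35_000);
        System.out.println(p + " -> check " + whatToCheck(p));
    }
}
```

If alignment dominates, look at backpressure first; if the sync or async snapshot dominates, look at disk and network utilization on that TM.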
You can also enable debug logging to find out more.
Hi all,
Why do my Flink checkpoints always expire? I use RocksDB for checkpoints,
and I can’t find any useful messages about this. Could you help me? Thanks very much.