[jira] [Created] (FLINK-14035) Introduce/Change some log for snapshot to better analysis checkpoint problem


Shang Yuanchun (Jira)
Congxian Qiu(klion26) created FLINK-14035:
---------------------------------------------

             Summary: Introduce/Change some log for snapshot to better analysis checkpoint problem
                 Key: FLINK-14035
                 URL: https://issues.apache.org/jira/browse/FLINK-14035
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Checkpointing
    Affects Versions: 1.10.0
            Reporter: Congxian Qiu(klion26)


Currently, most checkpoint-related messages are logged at debug level (especially on the TM side). When a checkpoint fails or takes too long and we want to track which step it reached and how much time each step consumed, we have to restart the job with debug logging enabled. This issue aims to improve the situation by changing some existing log statements from debug to info and adding a few more debug logs. We have already changed these log levels in our production environment at Alibaba, and it has caused no problems so far.

 

Detail
Change the logs below from debug level to info
 * log about {{Starting checkpoint xxx}} on the TM side
 * log about sync complete on the TM side
 * log about async complete on the TM side

Add debug log
 * log about receiving the barrier in exactly-once mode, to align with at-least-once mode
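
To illustrate the kind of output the changes above are aiming for, here is a minimal sketch that logs the start, sync part, and async part of a checkpoint at info level, with a duration for each phase. It is only an assumption-laden illustration: the class name {{CheckpointLogSketch}}, the methods {{phaseMessage}} and {{runPhase}}, the task name, and the message wording are all hypothetical and are not Flink's actual code or log text. It also uses java.util.logging to stay self-contained, whereas Flink itself logs through SLF4J.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch only; these are not Flink's classes or log messages. */
public class CheckpointLogSketch {
    private static final Logger LOG = Logger.getLogger(CheckpointLogSketch.class.getName());

    /** Builds the message for one finished snapshot phase (wording is an assumption). */
    public static String phaseMessage(String taskName, long checkpointId, String phase, long durationMs) {
        return String.format("%s - finished %s part of checkpoint %d in %d ms",
                taskName, phase, checkpointId, durationMs);
    }

    /**
     * Runs one snapshot phase, measures its wall-clock time, and logs the result at INFO,
     * so the timing is visible without restarting the job with debug logging enabled.
     */
    public static long runPhase(String taskName, long checkpointId, String phase, Runnable work) {
        long start = System.nanoTime();
        work.run();
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        LOG.log(Level.INFO, phaseMessage(taskName, checkpointId, phase, durationMs));
        return durationMs;
    }

    public static void main(String[] args) {
        long checkpointId = 7L;
        // Placeholder task name and empty phases, for illustration only.
        LOG.info("Starting checkpoint " + checkpointId + " on task map-1");
        runPhase("map-1", checkpointId, "synchronous", () -> { });
        runPhase("map-1", checkpointId, "asynchronous", () -> { });
    }
}
```

With messages like these at info level, a slow or failed checkpoint could be traced per task and per phase directly from the default TM logs.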

 

If this issue is valid, then I'm happy to contribute it.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)