[jira] [Created] (FLINK-13633) Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of high-availability storage


Shang Yuanchun (Jira)
Yang Wang created FLINK-13633:
---------------------------------

             Summary: Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of high-availability storage
                 Key: FLINK-13633
                 URL: https://issues.apache.org/jira/browse/FLINK-13633
             Project: Flink
          Issue Type: New Feature
            Reporter: Yang Wang


Currently, when high availability is enabled, the HA storage directory is laid out as shown below. The submittedJobGraph and completedCheckpoint files are stored directly under the HA storage path. This is reasonable when the Flink cluster finishes normally. However, when the Yarn application fails or is killed, the submittedJobGraph and completedCheckpoint files remain there forever, and we cannot even tell which Flink cluster (Yarn application) they belong to. So I suggest moving them into the application subdirectory. External tools could then be used to clean up these residual files.

Also, we need to do a best-effort clean-up before the Flink cluster finishes.

 

Current ha storage directory structure
{code:java}
└── /tmp/flink/ha
    ├── submittedJobGraphxxxx
    ├── completedCheckpointxxxx
    ├── application_xxxx_xxxx
    │   ├── blob
{code}
 

The new ha storage directory structure
{code:java}
└── /tmp/flink/ha
    ├── application_xxxx_xxxx
    │   ├── blob
    │   ├── submittedJobGraphxxxx
    │   ├── completedCheckpointxxxx
{code}
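
With the proposed layout, an external tool can tell residual files apart by their cluster-id subdirectory. The following is only a minimal sketch of such a tool, not part of Flink. It assumes the HA storage path is reachable through java.nio (a real deployment on HDFS would need the Hadoop FileSystem API instead), that subdirectories are named after the Yarn application id, and that the caller already has the set of live application ids from somewhere such as the Yarn ResourceManager. The class and method names (HaStorageCleanup, findStaleClusterDirs, deleteBestEffort) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical external cleanup helper for the proposed layout; not a Flink API. */
final class HaStorageCleanup {

    // Assumption: cluster-id subdirectories are named after the Yarn application id.
    private static final String CLUSTER_DIR_GLOB = "application_*";

    private HaStorageCleanup() {}

    /**
     * Lists cluster-id subdirectories directly under haRoot whose name is not
     * in liveClusterIds. How the live ids are obtained is left to the caller.
     */
    static List<Path> findStaleClusterDirs(Path haRoot, Set<String> liveClusterIds)
            throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(haRoot, CLUSTER_DIR_GLOB)) {
            for (Path dir : dirs) {
                String clusterId = dir.getFileName().toString();
                if (Files.isDirectory(dir) && !liveClusterIds.contains(clusterId)) {
                    stale.add(dir);
                }
            }
        }
        return stale;
    }

    /**
     * Deletes dir and everything below it (blob, submittedJobGraph*,
     * completedCheckpoint*). Best effort: returns false instead of throwing.
     */
    static boolean deleteBestEffort(Path dir) {
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            return true;
        } catch (IOException | UncheckedIOException e) {
            return false;
        }
    }
}
```

A tool built on this would call findStaleClusterDirs with the HA storage path and the live application ids, then pass each result to deleteBestEffort. The same deleteBestEffort step could also be run on the cluster's own subdirectory before the cluster finishes.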
 

 


