Till Rohrmann created FLINK-6526:
------------------------------------
Summary: BlobStore files might become orphans in case of recovery
Key: FLINK-6526
URL: https://issues.apache.org/jira/browse/FLINK-6526
Project: Flink
Issue Type: Bug
Components: Distributed Coordination
Affects Versions: 1.3.0, 1.4.0
Reporter: Till Rohrmann
The {{BlobStore}} persists {{BlobServer}} files when HA is enabled. The {{BlobLibraryCacheManager}} keeps a reference count for each file. Once the count reaches {{0}}, the {{BlobLibraryCacheManager}} will eventually delete the file from both the {{BlobServer}} and the {{BlobStore}}. On recovery, however, the {{BlobLibraryCacheManager}} only recovers files that are actively requested (e.g. the jar files of a newly submitted or recovered job). All other files, which may have had a reference count of {{0}} and been awaiting deletion, are not re-registered with the {{BlobLibraryCacheManager}}. Consequently, these files are never deleted and remain in the {{BlobStore}} indefinitely.
I think that upon recovery, all files currently held in the {{BlobStore}} should be re-registered with the {{BlobLibraryCacheManager}} so that they are eventually deleted once they time out with a reference count of {{0}}.
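To illustrate the proposal, here is a minimal, self-contained sketch of the idea. It is not Flink code: {{BlobRefCounter}}, its methods, and the {{Set<String>}} standing in for the persistent store are all hypothetical names made up for this example, and the cleanup timeout is left out (cleanup is triggered by hand). It only shows why an unreferenced file is orphaned when recovery tracks just the requested files, and how registering every stored file with a count of 0 makes it eligible for deletion again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a reference-counting cache manager; NOT Flink's API. */
class BlobRefCounter {
    /** Blob key -> reference count. A count of 0 means "eligible for cleanup". */
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** Hypothetical stand-in for the contents of the persistent store. */
    private final Set<String> blobStore;

    BlobRefCounter(Set<String> blobStore) {
        this.blobStore = blobStore;
    }

    /** Stores the blob (if new) and increments its reference count. */
    void register(String key) {
        blobStore.add(key);
        refCounts.merge(key, 1, Integer::sum);
    }

    /** Decrements the reference count of a tracked blob, never below 0. */
    void release(String key) {
        refCounts.computeIfPresent(key, (k, count) -> Math.max(0, count - 1));
    }

    /** Proposed recovery step: start tracking every stored blob with count 0. */
    void recoverFromStore() {
        for (String key : blobStore) {
            refCounts.putIfAbsent(key, 0);
        }
    }

    /** Deletes every tracked blob whose count is 0; returns the deleted keys. */
    Set<String> cleanup() {
        Set<String> removed = new HashSet<>();
        Iterator<Map.Entry<String, Integer>> it = refCounts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            if (entry.getValue() == 0) {
                blobStore.remove(entry.getKey());
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        // State before the failure: a.jar was released but not yet cleaned up.
        Set<String> store = new HashSet<>();
        BlobRefCounter beforeFailure = new BlobRefCounter(store);
        beforeFailure.register("a.jar");
        beforeFailure.register("b.jar");
        beforeFailure.release("a.jar");

        // Recovery that only tracks requested files: a.jar is never seen again.
        BlobRefCounter requestedOnly = new BlobRefCounter(store);
        requestedOnly.register("b.jar");
        System.out.println(requestedOnly.cleanup()); // prints []

        // Recovery that re-registers everything in the store first.
        BlobRefCounter reRegistered = new BlobRefCounter(store);
        reRegistered.recoverFromStore();
        reRegistered.register("b.jar");
        System.out.println(reRegistered.cleanup()); // prints [a.jar]
    }
}
```

In the first recovery, {{a.jar}} stays in the store because nothing tracks it any more; in the second it is picked up with a count of 0 and removed by the next cleanup, while {{b.jar}} survives because it is still referenced.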
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)