[jira] [Created] (FLINK-13245) Network stack is leaking files

Shang Yuanchun (Jira)
Chesnay Schepler created FLINK-13245:
----------------------------------------

             Summary: Network stack is leaking files
                 Key: FLINK-13245
                 URL: https://issues.apache.org/jira/browse/FLINK-13245
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
    Affects Versions: 1.9.0
            Reporter: Chesnay Schepler
            Assignee: Chesnay Schepler
             Fix For: 1.9.0


There is a file leak in the network stack / shuffle service.

When running the {{SlotCountExceedingParallelismTest}} on Windows, a large number of {{.channel}} files remain in a {{flink-netty-shuffle-XXX}} directory after the test has finished.
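
A quick way to check for leftovers after a test run could look like the sketch below. It is not part of Flink; the class name {{LeftoverChannelFiles}} is hypothetical, and it assumes the {{flink-netty-shuffle-XXX}} directories are created directly under the JVM temp directory (pass a different root as the first argument if your setup uses another location).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flink: lists leftover .channel files.
public class LeftoverChannelFiles {

    /** Returns all *.channel files below any flink-netty-shuffle-* directory in root. */
    static List<Path> find(Path root) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        // Only direct children of root whose name matches the glob are inspected.
        try (DirectoryStream<Path> dirs =
                Files.newDirectoryStream(root, "flink-netty-shuffle-*")) {
            for (Path dir : dirs) {
                if (!Files.isDirectory(dir)) {
                    continue;
                }
                // Walk each shuffle directory recursively and collect channel files.
                try (Stream<Path> files = Files.walk(dir)) {
                    files.filter(Files::isRegularFile)
                            .filter(p -> p.getFileName().toString().endsWith(".channel"))
                            .forEach(leftovers::add);
                }
            }
        }
        return leftovers;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the shuffle directories live under the JVM temp directory.
        Path root = Paths.get(
                args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        List<Path> leftovers = find(root);
        System.out.println(leftovers.size() + " leftover .channel file(s) under " + root);
        leftovers.forEach(System.out::println);
    }
}
```

Running this before and after the test should show whether the number of {{.channel}} files grows.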

From what I've gathered so far these files are still being used by a {{BoundedBlockingSubpartition}}. The cleanup logic in this class uses ref-counting to ensure we don't release data while a reader is still present. However, at the end of the job this count has not reached 0, and thus nothing is being released.
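
To illustrate the failure mode, here is a minimal sketch of ref-counted cleanup. It is not the actual {{BoundedBlockingSubpartition}} code; the class {{RefCountedChannelFile}} and its methods are hypothetical names chosen for this example. The file is only deleted once release has been requested and the reader count is back at 0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of ref-counted release; not Flink's implementation.
public class RefCountedChannelFile {
    private final Path file;
    private int readers;              // number of readers currently registered
    private boolean releaseRequested; // set once the owner no longer needs the data
    private boolean deleted;

    RefCountedChannelFile(Path file) {
        this.file = file;
    }

    synchronized void registerReader() {
        if (releaseRequested) {
            throw new IllegalStateException("already released");
        }
        readers++;
    }

    /** Must be called once per registered reader, otherwise the file is never deleted. */
    synchronized void readerFinished() throws IOException {
        readers--;
        deleteIfUnused();
    }

    synchronized void release() throws IOException {
        releaseRequested = true;
        deleteIfUnused();
    }

    // Only called from synchronized methods.
    private void deleteIfUnused() throws IOException {
        if (releaseRequested && readers == 0 && !deleted) {
            Files.deleteIfExists(file);
            deleted = true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".channel");
        RefCountedChannelFile data = new RefCountedChannelFile(tmp);
        data.registerReader();
        data.release(); // a reader is still registered, so nothing is deleted yet
        System.out.println("exists after release: " + Files.exists(tmp));
        data.readerFinished(); // count drops to 0, the file is deleted now
        System.out.println("exists after reader finished: " + Files.exists(tmp));
    }
}
```

In this sketch, if {{readerFinished()}} is never called for some reader (for example because a consumption notification is lost), the count stays above 0 and the file is never deleted, which matches the symptom described above.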

The same issue is also present on the {{ResultPartition}} level; {{ReleaseOnConsumptionResultPartition}} instances are also being released while their ref-count is still greater than 0.

Overall, it appears that there is an issue with the notifications for partitions being consumed.

It is plausible that this issue is behind the recent failures on Travis, where builds were failing due to a lack of disk space.




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)