Gary Yao created FLINK-17794:
--------------------------------
Summary: Tear down installed software in reverse order in Jepsen Tests
Key: FLINK-17794
URL: https://issues.apache.org/jira/browse/FLINK-17794
Project: Flink
Issue Type: Bug
Components: Tests
Affects Versions: 1.10.1, 1.11.0
Reporter: Gary Yao
Assignee: Gary Yao
Fix For: 1.11.0
Tear down installed software in reverse order in the Jepsen tests. This mitigates an issue where Hadoop's NodeManager directories sometimes cannot be removed with {{rm -rf}} because Flink processes keep running and generating files after the YARN NodeManager is shut down. {{rm -rf}} removes files recursively, but if files are created concurrently in the background, the command can still fail with a non-zero exit code.
{noformat}
sh -c \"cd /; rm -rf /opt/hadoop\"", :exit 1, :out "", :err "rm: cannot remove '/opt/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1587567275082_0001/flink-io-3587fdbb-15be-4482-94f2-338bfe6b1acc/job_77be6dd9f1b2aa218348e8b8a2512660_op_StreamMap_5271c210329e73bd743f3227edfb3b71__27_30__uuid_02dbbf1e-d2d5-43e8-ab34-040345f96476/db': Directory not empty\nrm: cannot remove '/opt/hadoop/tmp/nm-local-dir/usercache/root/appcache/application_1587567275082_0001/flink-io-d14f2078-74ee-4b8b-aafe-4299577f214f/job_77be6dd9f1b2aa218348e8b8a2512660_op_StreamMap_7d23c6ceabda05a587f0217e44f21301__17_30__uuid_2de2b67d-0767-4e32-99f0-ddd291460947/db': Directory not empty
{noformat}
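The ordering idea can be sketched as follows. This is only an illustration in Python, not the actual change: the real tests are written in Clojure on top of Jepsen, and the {{Component}} type, the helper names, and the install order (Hadoop first, then Flink) are assumptions made for the example. The components record their calls in a list instead of touching the system.

```python
from typing import Callable, List, NamedTuple


class Component(NamedTuple):
    """Hypothetical stand-in for one piece of installed software."""
    name: str
    install: Callable[[], None]
    teardown: Callable[[], None]


def install_all(components: List[Component]) -> None:
    # Install in dependency order: later entries may run on top of earlier ones.
    for component in components:
        component.install()


def teardown_all(components: List[Component]) -> None:
    # Tear down in reverse order so that dependents (e.g. Flink) are stopped
    # before the software they write into (e.g. Hadoop) is removed.
    for component in reversed(components):
        component.teardown()


def make_component(name: str, events: List[str]) -> Component:
    # Records calls instead of touching the system; purely illustrative.
    return Component(
        name=name,
        install=lambda: events.append(f"install {name}"),
        teardown=lambda: events.append(f"teardown {name}"),
    )


events: List[str] = []
stack = [make_component("hadoop", events), make_component("flink", events)]
install_all(stack)
teardown_all(stack)
print(events)
```

With this ordering, the teardown of the component installed last runs first, so in the scenario above nothing should still be writing under {{/opt/hadoop}} by the time the directory is removed.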