(DEPRECATED) Apache Flink Mailing List archive.

Containers are not released after job failed

刘建刚

I run Flink 1.6.2 on YARN. At some point, the job failed because of: org.apache.flink.util.FlinkException: The assigned slot container_e708_1555051789618_2644286_01_000061_0 was removed

The job then restarted, but after some time the container container_e708_1555051789618_2644286_01_000061 had still not been released.
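
For reference, the slot ID in the exception looks like the YARN container ID with a slot index appended, so the container ID and application ID can be recovered from it when looking up logs. Below is a minimal sketch in Python, assuming that layout holds; the helper name `split_slot_id` is hypothetical and not part of Flink or YARN.

```python
import re

# Assumed layout: <YARN container ID>_<slot index>, where the container ID is
# container_e<epoch>_<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>.
SLOT_ID_PATTERN = re.compile(r"^(container_e\d+_(\d+)_(\d+)_\d+_\d+)_(\d+)$")


def split_slot_id(slot_id):
    """Hypothetical helper: return (container ID, application ID, slot index)."""
    match = SLOT_ID_PATTERN.match(slot_id)
    if match is None:
        raise ValueError(f"unexpected slot ID format: {slot_id}")
    container_id, cluster_ts, app_seq, slot_index = match.groups()
    # The application ID reuses the cluster timestamp and application sequence number.
    application_id = f"application_{cluster_ts}_{app_seq}"
    return container_id, application_id, int(slot_index)


container_id, application_id, slot_index = split_slot_id(
    "container_e708_1555051789618_2644286_01_000061_0"
)
print(container_id, application_id, slot_index)
```

The resulting application ID is the value that `yarn logs -applicationId` expects when fetching the aggregated logs for the job.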

The log of container_e708_1555051789618_2644286_01_000061 is as follows:

The log shows that two tasks were canceled before the container successfully registered at the resource manager, and one was canceled after registration. After five minutes, the container registered again. In the end, the container is alive but not used.

Does anyone have any idea about this problem? Thank you.