Hi there,

We are running Flink 1.11.2 in production with a large setting. The job runs fine for a couple of days and then ends up in a restart loop caused by YARN container memory kills. This was not observed when running against 1.9.1 with the same settings. Here is the JVM environment passed to both the 1.11 and the 1.9.1 job:

env.java.opts.taskmanager: '-XX:+UseG1GC -XX:MaxGCPauseMillis=500
  -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
  -XX:InitiatingHeapOccupancyPercent=45 -XX:NewRatio=1
  -XX:+PrintClassHistogram -XX:+PrintGCDateStamps -XX:+PrintGCDetails
  -XX:+PrintGCApplicationStoppedTime -Xloggc:<LOG_DIR>/gc.log'
env.java.opts.jobmanager: (same options as above)

After an initial investigation, we found this does not seem to be related to JVM heap usage or a GC issue. Meanwhile, we observed that JVM non-heap usage on some containers keeps rising while the job falls into the restart loop, as shown below.

From a configuration perspective, we would like to learn how the task manager handles class loading (and unloading?) when we set include-user-jar to first. Are there suggestions on how we can better understand how the new memory model introduced in 1.10 affects this issue?

cluster.evenly-spread-out-slots: true
zookeeper.sasl.disable: true
yarn.per-job-cluster.include-user-jar: first
yarn.properties-file.location: /usr/local/hadoop/etc/hadoop/

Thanks,
Chen
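One way to narrow down where the non-heap memory goes is the JVM's Native Memory Tracking, which can be switched on through the same env.java.opts settings shown above. This is a sketch under the assumption that the existing GC flags are kept; the tracking flag itself adds a small runtime overhead:

```yaml
# flink-conf.yaml (sketch): enable JVM Native Memory Tracking on the task managers
# so that native allocations (metaspace, threads, direct buffers, GC structures, ...)
# can be broken down by category. Prepend the flag to the existing options.
env.java.opts.taskmanager: '-XX:NativeMemoryTracking=summary -XX:+UseG1GC -Xloggc:<LOG_DIR>/gc.log'
```

With this in place, running `jcmd <taskmanager-pid> VM.native_memory summary` on the container host prints a per-category breakdown of the JVM's native memory, which helps tell a metaspace/classloader leak apart from direct-buffer or native-library growth.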
Hi Chen,
Hi Chen,

You are right that Flink changed its memory model with Flink 1.10. The memory model is now better defined and stricter; you can find information about it here [1]. For some pointers towards potential problems, please take a look at [2].

What you need to do is figure out where the non-heap memory is allocated. Maybe you are using a library which leaks memory, or maybe your code requires more non-heap memory than you have configured the system with.

If you are using the per-job mode on YARN without yarn.per-job-cluster.include-user-jar: disabled, then you should not have any classloader leak problems, because the user code is part of the system classpath. If you set yarn.per-job-cluster.include-user-jar: disabled, then the TaskExecutor will create a user code classloader and keep it as long as the TaskExecutor still has slots allocated for the job.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_setup_tm.html
[2] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/memory/mem_trouble.html

Cheers,
Till

On Sun, Mar 14, 2021 at 12:22 AM Chen Qin <[hidden email]> wrote:

> Hi there,
>
> We were using flink 1.11.2 in production with a large setting. The job
> runs fine for a couple of days and ends up with a restart loop caused by
> YARN container memory kill. This is not observed while running against
> 1.9.1 with the same setting.
> Here is JVM environment passed to 1.11 as well as 1.9.1 job:
>
>> env.java.opts.taskmanager: '-XX:+UseG1GC -XX:MaxGCPauseMillis=500
>> -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
>> -XX:InitiatingHeapOccupancyPercent=45 -XX:NewRatio=1
>> -XX:+PrintClassHistogram -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:<LOG_DIR>/gc.log'
>> env.java.opts.jobmanager: '-XX:+UseG1GC -XX:MaxGCPauseMillis=500
>> -XX:ParallelGCThreads=20 -XX:ConcGCThreads=5
>> -XX:InitiatingHeapOccupancyPercent=45 -XX:NewRatio=1
>> -XX:+PrintClassHistogram -XX:+PrintGCDateStamps -XX:+PrintGCDetails
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:<LOG_DIR>/gc.log'
>
> After primitive investigation, we found this might not be related to jvm
> heap space usage nor gc issue. Meanwhile, we observed jvm non heap usage on
> some containers keep rising while job fails into restart loop as stated
> below.
> [image: image.png]
>
> From a configuration perspective, we would like to learn how the task
> manager handles classloading and (unloading?) when we set include-user-jar
> to first. Is there suggestions how we can have a better understanding of
> how the new memory model introduced in 1.10 affects this issue?
>
> cluster.evenly-spread-out-slots: true
> zookeeper.sasl.disable: true
> yarn.per-job-cluster.include-user-jar: first
> yarn.properties-file.location: /usr/local/hadoop/etc/hadoop/
>
> Thanks,
> Chen
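Under the 1.10+ model that Till references, the container size is no longer just the heap: taskmanager.memory.process.size is what YARN enforces, and it is split into heap, managed memory, off-heap, network and JVM overhead. A sketch of the relevant flink-conf.yaml knobs, with illustrative values rather than recommendations:

```yaml
# flink-conf.yaml (sketch): the 1.10+ TaskManager memory model.
# process.size is the total the YARN container is sized to; every component
# below must fit inside it, or the container gets killed.
taskmanager.memory.process.size: 8g            # total memory requested from YARN
taskmanager.memory.managed.fraction: 0.4       # managed memory (used e.g. by RocksDB), fraction of Flink memory
taskmanager.memory.task.off-heap.size: 512m    # budget for direct/native allocations by user code
taskmanager.memory.jvm-overhead.fraction: 0.1  # headroom for metaspace-adjacent native memory
```

The practical consequence for a 1.9 → 1.11 migration is that native allocations which previously grew unnoticed inside a loosely-sized container now have to fit an explicit budget.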
Hi Till,
We did some investigation and found that this memory usage points to the RocksDB state backend running on managed memory. So far we have only seen this behavior with the RocksDB state backend on managed memory; we followed the suggestion in [1] and disabled managed memory, and so far we are not seeing the issue.

I feel this might be a major bug, since we run Flink 1.11.2 with the managed-memory RocksDB state backend in multiple large production jobs and can consistently reproduce the YARN kill after a job runs for a period of time.

[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-quot-Container-is-running-beyond-physical-memory-limits-quot-on-YARN-for-a-long-running-stb-td38227.html
Hi Chen,
with version 1.10, Flink introduced the use of Flink's managed memory for RocksDB [1]. This should prevent RocksDB from exceeding the memory limits of a process/container. Unfortunately, this is not yet perfect due to a problem in RocksDB [2], so RocksDB can still exceed the managed memory budget. What you could do is configure a higher off-heap size for your tasks via taskmanager.memory.task.off-heap.size to compensate for this.

I am also pulling in Yu Li, who can tell you more about the current limitations of the memory limiting for RocksDB.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/state/state_backends.html#memory-management
[2] https://issues.apache.org/jira/browse/FLINK-15532

Cheers,
Till

On Tue, Mar 30, 2021 at 7:36 PM chenqin <[hidden email]> wrote:

> Hi Till,
>
> We did some investigation and found this memory usage point to
> rocksdbstatebackend running on managed memory. So far we have seen this bug
> in rocksdbstatebackend on managed memory. we followed suggestion [1] and
> disabled managed memory management so far not seeing issue.
>
> I felt this might be a major bug since we run flink 1.11.2 with managed
> RocksDBstatebackend in mulitple large production jobs and consistency repo
> yarn kill after job runs a period of time.
>
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Debugging-quot-Container-is-running-beyond-physical-memory-limits-quot-on-YARN-for-a-long-running-stb-td38227.html
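Till's suggestion, together with the workaround Chen described, would look roughly like this in flink-conf.yaml (the size is illustrative; either knob can be used on its own):

```yaml
# flink-conf.yaml (sketch): two ways to keep RocksDB's native memory inside the container.

# Option 1: keep RocksDB on managed memory, but reserve extra off-heap headroom
# so the container can absorb RocksDB's overshoot (FLINK-15532).
taskmanager.memory.task.off-heap.size: 1g

# Option 2 (what the thread ends up doing): take RocksDB off managed memory entirely,
# accepting that its memory usage is then not bounded by Flink at all.
state.backend.rocksdb.memory.managed: false
```

Option 1 keeps Flink's memory accounting intact at the cost of a larger container; option 2 trades that accounting away for stability until the underlying RocksDB issue is fixed.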