ChangZhuo Chen (陳昌倬) created FLINK-16267:
--------------------------------------------

Summary: Flink uses more memory than taskmanager.memory.process.size in Kubernetes
Key: FLINK-16267
URL: https://issues.apache.org/jira/browse/FLINK-16267
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.10.0
Environment: * Dockerized Flink 1.10.0, with the following Dockerfile.

{{FROM flink:1.10-scala_2.11}}
{{RUN mkdir -p /opt/flink/plugins/s3 && \}}
{{    ln -s /opt/flink/opt/flink-s3-fs-presto-1.10.0.jar /opt/flink/plugins/s3/}}
{{RUN ln -s /opt/flink/opt/flink-metrics-prometheus-1.10.0.jar /opt/flink/lib/}}

Reporter: ChangZhuo Chen (陳昌倬)

This issue is from [https://stackoverflow.com/questions/60336764/flink-uses-more-memory-than-taskmanager-memory-process-size-in-kubernetes]

In Flink 1.10.0, we try to use `taskmanager.memory.process.size` to limit the resources used by the taskmanagers so that they are not killed by Kubernetes. However, we still get lots of taskmanager `OOMKilled` events with the following setup.

* The Kubernetes setup is the same as described in https://ci.apache.org/projects/flink/flink-docs-release-1.10/ops/deployment/kubernetes.html.
* The following is the resource configuration for the taskmanager deployment in Kubernetes:

{{resources:}}
{{  requests:}}
{{    cpu: 1000m}}
{{    memory: 4096Mi}}
{{  limits:}}
{{    cpu: 1000m}}
{{    memory: 4096Mi}}

* The following are all the memory-related configurations in `flink-conf.yaml` in 1.10.0:

{{jobmanager.heap.size: 820m}}
{{taskmanager.memory.jvm-metaspace.size: 128m}}
{{taskmanager.memory.process.size: 4096m}}

* We use RocksDB and we don't set `state.backend.rocksdb.memory.managed` in `flink-conf.yaml`.
** Use S3 as checkpoint storage.
* The code uses the DataStream API.
** Input and output are both Kafka.
* The following are our dependencies, FYI.

{{val flinkVersion = "1.10.0"}}
{{libraryDependencies += "com.squareup.okhttp3" % "okhttp" % "4.2.2"}}
{{libraryDependencies += "com.typesafe" % "config" % "1.4.0"}}
{{libraryDependencies += "joda-time" % "joda-time" % "2.10.5"}}
{{libraryDependencies += "org.apache.flink" %% "flink-connector-kafka" % flinkVersion}}
{{libraryDependencies += "org.apache.flink" % "flink-metrics-dropwizard" % flinkVersion}}
{{libraryDependencies += "org.apache.flink" %% "flink-scala" % flinkVersion % "provided"}}
{{libraryDependencies += "org.apache.flink" %% "flink-statebackend-rocksdb" % flinkVersion % "provided"}}
{{libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % flinkVersion % "provided"}}
{{libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.7"}}
{{libraryDependencies += "org.log4s" %% "log4s" % "1.8.2"}}
{{libraryDependencies += "org.rogach" %% "scallop" % "3.3.1"}}

* The configuration we used with Flink 1.9.1 is the following. It does not result in `OOMKilled`.
* Kubernetes

{{resources:}}
{{  requests:}}
{{    cpu: 1200m}}
{{    memory: 2G}}
{{  limits:}}
{{    cpu: 1500m}}
{{    memory: 2G}}

* Flink 1.9.1

{{jobmanager.heap.size: 820m}}
{{taskmanager.heap.size: 1024m}}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
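For reference, the memory settings quoted in the report can be checked with a little arithmetic. The Python sketch below is not part of the original issue. It assumes the Flink 1.10 default fractions and bounds (JVM overhead 0.1 within 192m..1g, managed memory 0.4, network 0.1 within 64m..1g, framework heap and off-heap 128m each), and the helper names `k8s_quantity_to_mib` and `taskmanager_budget` are hypothetical. It only shows how the 4096m process size would be split under those assumptions, and how much room each setup leaves between the configured Flink size and the container limit. It does not identify the cause of the `OOMKilled` events.

```python
# Hypothetical helpers; the default fractions and bounds are assumptions
# about Flink 1.10 defaults, not values taken from the report.

MIB = 1024 * 1024

# Kubernetes quantity suffixes: Mi/Gi are binary, M/G are decimal.
_K8S_SUFFIXES = {"Mi": 2**20, "Gi": 2**30, "M": 10**6, "G": 10**9}


def k8s_quantity_to_mib(quantity):
    """Convert a Kubernetes memory quantity such as '4096Mi' or '2G' to MiB."""
    for suffix, factor in _K8S_SUFFIXES.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor / MIB
    return float(quantity) / MIB  # plain bytes


def clamp(value, low, high):
    return max(low, min(value, high))


def taskmanager_budget(process_mib, metaspace_mib,
                       overhead_fraction=0.1, overhead_min=192, overhead_max=1024,
                       managed_fraction=0.4,
                       network_fraction=0.1, network_min=64, network_max=1024,
                       framework_heap=128, framework_off_heap=128,
                       task_off_heap=0):
    """Split a total process size (MiB) into assumed Flink 1.10 components."""
    overhead = clamp(process_mib * overhead_fraction, overhead_min, overhead_max)
    total_flink = process_mib - metaspace_mib - overhead
    managed = total_flink * managed_fraction
    network = clamp(total_flink * network_fraction, network_min, network_max)
    task_heap = (total_flink - framework_heap - framework_off_heap
                 - task_off_heap - managed - network)
    return {
        "jvm_metaspace": metaspace_mib,
        "jvm_overhead": overhead,
        "framework_heap": framework_heap,
        "framework_off_heap": framework_off_heap,
        "task_off_heap": task_off_heap,
        "managed": managed,
        "network": network,
        "task_heap": task_heap,
    }


# Flink 1.10.0 setup from the report: process size 4096m, metaspace 128m.
budget = taskmanager_budget(process_mib=4096, metaspace_mib=128)
for name, mib in budget.items():
    print(f"{name:>20}: {mib:8.1f} MiB")

# Room between the container limit and the configured Flink size.
headroom_1_10 = k8s_quantity_to_mib("4096Mi") - 4096  # process.size: 4096m
headroom_1_9 = k8s_quantity_to_mib("2G") - 1024       # heap.size: 1024m
print(f"1.10.0 headroom: {headroom_1_10:.1f} MiB")
print(f"1.9.1 headroom:  {headroom_1_9:.1f} MiB")
```

Under these assumptions the 1.10.0 setup sets the process size exactly equal to the container limit, whereas the 1.9.1 setup left a gap between `taskmanager.heap.size` and the 2G limit. Note that the two settings do not measure the same thing, so the headroom figures are only a rough comparison.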