Jark Wu created FLINK-20860:
-------------------------------
Summary: Allow streaming operators to use managed memory
Key: FLINK-20860
URL: https://issues.apache.org/jira/browse/FLINK-20860
Project: Flink
Issue Type: Sub-task
Components: Runtime / Configuration, Runtime / Task
Reporter: Jark Wu
Fix For: 1.13.0
We are planning to use some batch algorithms
(sorting and the bytes hash table) to improve the performance of streaming SQL
operators, especially the mini-batch operators introduced by FLIP-145.
Currently, we have to buffer input records and
accumulators on the heap (e.g. in a Java HashMap), which is inefficient and
carries a risk of full GCs and OOM errors.
With managed memory, we can make full use of the memory to buffer more data
without worrying about OOM, which should improve performance considerably. However, streaming operators are currently not allowed to use managed memory.
As discussed on the mailing list [1], we have reached a consensus to extend the configuration option {{taskmanager.memory.managed.consumer-weights}} with two more consumer types, {{OPERATOR}} and {{STATE_BACKEND}}. The available consumer types will be:
* `OPERATOR` for both streaming and batch operators
* `STATE_BACKEND` for state backends
* `PYTHON` for Python processes
* `DATAPROC` as a legacy key for state backends or batch operators if
`STATE_BACKEND` or `OPERATOR` is not specified
The previous default value is {{DATAPROC:70,PYTHON:30}}; the new default value will be {{OPERATOR:70,STATE_BACKEND:70,PYTHON:30}}.
The weights for OPERATOR and STATE_BACKEND will have the same value to align with the previous behavior.
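To make the proposed semantics concrete, here is a minimal Python sketch of how such a weights string could be interpreted. It is not Flink's implementation: the function names are hypothetical, and it assumes that managed memory is split proportionally among the consumer types actually present in a job, and that the legacy `DATAPROC` weight is used for `OPERATOR` and `STATE_BACKEND` only when those keys are not given.

```python
def parse_consumer_weights(spec):
    """Parse 'NAME:weight,NAME:weight' into a dict of integer weights."""
    weights = {}
    for entry in spec.split(","):
        name, _, value = entry.strip().partition(":")
        if not name or not value:
            raise ValueError(f"malformed entry: {entry!r}")
        weights[name] = int(value)
    return weights


def resolve_legacy_key(weights):
    """Map the legacy DATAPROC weight onto OPERATOR / STATE_BACKEND.

    Assumption: DATAPROC only fills in keys that were not specified.
    """
    resolved = dict(weights)
    legacy = resolved.pop("DATAPROC", None)
    if legacy is not None:
        resolved.setdefault("OPERATOR", legacy)
        resolved.setdefault("STATE_BACKEND", legacy)
    return resolved


def managed_memory_fractions(weights, active_consumers):
    """Split managed memory among the active consumers by weight.

    Assumption: only consumer types present in the job take a share.
    """
    active = {
        name: weights[name]
        for name in active_consumers
        if weights.get(name, 0) > 0
    }
    total = sum(active.values())
    if total == 0:
        return {}
    return {name: weight / total for name, weight in active.items()}
```

Under these assumptions, a job that uses a state backend and Python would still see a 70/30 split with the new default, which matches what `DATAPROC:70,PYTHON:30` gave before. A job that uses both managed-memory operators and a state backend would split the memory evenly between them.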
[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Allow-streaming-operators-to-use-managed-memory-td47327.html