[jira] [Created] (FLINK-20860) Allow streaming operators to use managed memory

Shang Yuanchun (Jira)
Jark Wu created FLINK-20860:
-------------------------------

             Summary: Allow streaming operators to use managed memory
                 Key: FLINK-20860
                 URL: https://issues.apache.org/jira/browse/FLINK-20860
             Project: Flink
          Issue Type: Sub-task
          Components: Runtime / Configuration, Runtime / Task
            Reporter: Jark Wu
             Fix For: 1.13.0


We plan to use some batch algorithms
(sorting and a bytes hash table) to improve the performance of streaming SQL
operators, especially the mini-batch operators introduced by FLIP-145.

Currently, we have to buffer input records and
accumulators on the heap (e.g., in a Java HashMap), which is inefficient and
risks full GCs and OOM errors.
With managed memory, we could buffer more data without worrying about OOM
and improve performance significantly. However, streaming operators are currently not allowed to use managed memory.

As discussed on the mailing list [1], we have reached a consensus to extend the configuration option {{taskmanager.memory.managed.consumer-weights}} with two more consumer types, {{OPERATOR}} and {{STATE_BACKEND}}. The available consumer types will be:

* `OPERATOR` for both streaming and batch operators
* `STATE_BACKEND` for state backends
* `PYTHON` for python processes
* `DATAPROC` as a legacy key for state backend or batch operators if
`STATE_BACKEND` or `OPERATOR` are not specified.
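
The fallback rule for the legacy key can be illustrated with a minimal sketch. The consumer names come from the list above; the function names, the validation, and the exact fallback logic are assumptions made for illustration and are not taken from Flink's implementation.

```python
from typing import Dict

# Consumer names come from the proposal; everything else here is hypothetical.
LEGACY_KEY = "DATAPROC"
LEGACY_TARGETS = ("OPERATOR", "STATE_BACKEND")
KNOWN_CONSUMERS = {"OPERATOR", "STATE_BACKEND", "PYTHON", LEGACY_KEY}


def parse_consumer_weights(value: str) -> Dict[str, int]:
    """Parse a 'NAME:weight,NAME:weight' string into a name-to-weight dict."""
    weights: Dict[str, int] = {}
    for entry in value.split(","):
        name, sep, raw = entry.strip().partition(":")
        name = name.strip()
        if not sep or name not in KNOWN_CONSUMERS:
            raise ValueError(f"invalid consumer weight entry: {entry!r}")
        weight = int(raw)
        if weight < 0:
            raise ValueError(f"negative weight for {name}: {weight}")
        weights[name] = weight
    return weights


def resolve_legacy_key(weights: Dict[str, int]) -> Dict[str, int]:
    """Apply the DATAPROC weight to OPERATOR / STATE_BACKEND where those are unset."""
    resolved = {k: v for k, v in weights.items() if k != LEGACY_KEY}
    if LEGACY_KEY in weights:
        for target in LEGACY_TARGETS:
            # An explicitly configured key always wins over the legacy key.
            resolved.setdefault(target, weights[LEGACY_KEY])
    return resolved
```

Under this sketch, an old-style setting such as `DATAPROC:70,PYTHON:30` resolves to a weight of 70 for both `OPERATOR` and `STATE_BACKEND`, while an explicit `OPERATOR` weight takes precedence over `DATAPROC`.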

The previous default value is {{DATAPROC:70,PYTHON:30}}; the new default value will be {{OPERATOR:70,STATE_BACKEND:70,PYTHON:30}}.

{{OPERATOR}} and {{STATE_BACKEND}} will have the same weight, so that jobs using only one of them get the same share of managed memory as before.
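
A short sketch of why equal weights keep the previous split. It assumes that managed memory is divided among the consumer types actually present in a slot, in proportion to their weights; the issue text does not spell out this normalization, and the helper name is hypothetical.

```python
from typing import Dict, Iterable


def managed_memory_fractions(weights: Dict[str, int], active: Iterable[str]) -> Dict[str, float]:
    """Split managed memory among the active consumers in proportion to their weights.

    Assumption: only consumers actually present in a slot take part in the split.
    """
    active = list(active)
    total = sum(weights[name] for name in active)
    if total <= 0:
        raise ValueError("active consumers must have a positive total weight")
    return {name: weights[name] / total for name in active}


OLD_DEFAULT = {"DATAPROC": 70, "PYTHON": 30}
NEW_DEFAULT = {"OPERATOR": 70, "STATE_BACKEND": 70, "PYTHON": 30}

# A streaming job with a managed-memory state backend plus Python processes.
old_split = managed_memory_fractions(OLD_DEFAULT, ["DATAPROC", "PYTHON"])
new_split = managed_memory_fractions(NEW_DEFAULT, ["STATE_BACKEND", "PYTHON"])

# Hypothetical new case: streaming operators and the state backend both use managed memory.
shared_split = managed_memory_fractions(NEW_DEFAULT, ["OPERATOR", "STATE_BACKEND"])
```

Under these assumptions, `old_split` and `new_split` both give 70% to the state backend and 30% to Python, while `shared_split` divides the memory evenly between operators and the state backend.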


[1]: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Allow-streaming-operators-to-use-managed-memory-td47327.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)