[jira] [Created] (FLINK-19011) Parallelize the restoreOperation in OperatorStateBackend


Shang Yuanchun (Jira)
Jiayi Liao created FLINK-19011:
----------------------------------

             Summary: Parallelize the restoreOperation in OperatorStateBackend
                 Key: FLINK-19011
                 URL: https://issues.apache.org/jira/browse/FLINK-19011
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.11.1
            Reporter: Jiayi Liao


To restore union state, each operator instance needs to read the state handles produced by all parallel instances of that operator. Currently, during the restore operation, Flink iterates over the state handles one by one, which can take tens of minutes when the number of state handles exceeds ten thousand.

To accelerate the process, I propose to parallelize both the random reads on HDFS and the deserialization. We can create a runnable for each state handle and let it return the metadata and the deserialized data, which can then be aggregated in the main thread.
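
A minimal sketch of the idea, not Flink's actual restore code. It assumes each state handle can be read independently. The names ParallelRestoreSketch, RestoredPartition, readHandle and restoreAll are hypothetical, and a byte array with comma-separated values stands in for a real state handle on HDFS. One task is submitted per handle, and the main thread merges the results in handle order.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRestoreSketch {

    /** Hypothetical holder for the metadata and deserialized data of one state handle. */
    public static final class RestoredPartition {
        final int handleIndex;        // stands in for the handle's metadata
        final List<String> elements;  // stands in for the deserialized state

        RestoredPartition(int handleIndex, List<String> elements) {
            this.handleIndex = handleIndex;
            this.elements = elements;
        }
    }

    /** Hypothetical stand-in for the remote read plus deserialization of one handle. */
    static RestoredPartition readHandle(int index, byte[] rawHandle) {
        String decoded = new String(rawHandle, StandardCharsets.UTF_8);
        List<String> elements = new ArrayList<>();
        for (String element : decoded.split(",")) {
            if (!element.isEmpty()) {
                elements.add(element);
            }
        }
        return new RestoredPartition(index, elements);
    }

    /** Reads all handles on a fixed-size pool and aggregates them in the calling thread. */
    public static List<String> restoreAll(List<byte[]> handles, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<Callable<RestoredPartition>> tasks = new ArrayList<>();
            for (int i = 0; i < handles.size(); i++) {
                final int index = i;
                final byte[] raw = handles.get(i);
                tasks.add(() -> readHandle(index, raw));
            }
            // invokeAll blocks until every task is done and returns the futures
            // in the same order as the task list, so the merge is deterministic.
            List<String> merged = new ArrayList<>();
            for (Future<RestoredPartition> future : pool.invokeAll(tasks)) {
                merged.addAll(future.get().elements);
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Keeping the aggregation in the calling thread means the merged state needs no synchronization. The pool size is left as a parameter because the right degree of parallelism depends on how many concurrent reads the file system tolerates.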

--
This message was sent by Atlassian Jira
(v8.3.4#803005)