[jira] [Created] (FLINK-18235) Improve the checkpoint strategy for Python UDF execution

Shang Yuanchun (Jira)
Dian Fu created FLINK-18235:
-------------------------------

             Summary: Improve the checkpoint strategy for Python UDF execution
                 Key: FLINK-18235
                 URL: https://issues.apache.org/jira/browse/FLINK-18235
             Project: Flink
          Issue Type: Improvement
          Components: API / Python
            Reporter: Dian Fu
             Fix For: 1.12.0


Currently, when a checkpoint is triggered for the Python operator, all buffered data is flushed to the Python worker to be processed. This increases the overall checkpoint time when many elements are buffered and the Python UDF is slow. We should improve the checkpoint strategy to avoid this, e.g. by storing the buffered data in state instead of flushing it. We could also let users configure the checkpoint strategy if needed.
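
To make the two strategies concrete, here is a minimal sketch in plain Python. It is not Flink or PyFlink code: `ToyPythonOperator`, `CheckpointStrategy`, and all method names are hypothetical and only model the idea described above, with a list standing in for operator state and a plain function standing in for the Python UDF.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class CheckpointStrategy(Enum):
    # Hypothetical names, not Flink configuration values.
    FLUSH = "flush"                      # current behavior described in the issue
    BUFFER_IN_STATE = "buffer_in_state"  # proposed alternative


@dataclass
class ToyPythonOperator:
    """Toy model of an operator that buffers input before calling a UDF."""

    udf: Callable[[int], int]
    strategy: CheckpointStrategy
    buffer: List[int] = field(default_factory=list)
    emitted: List[int] = field(default_factory=list)

    def process_element(self, value: int) -> None:
        # Elements are buffered rather than sent to the UDF one by one.
        self.buffer.append(value)

    def flush(self) -> None:
        # Run the (possibly slow) UDF over everything buffered so far.
        for value in self.buffer:
            self.emitted.append(self.udf(value))
        self.buffer.clear()

    def snapshot_state(self) -> List[int]:
        if self.strategy is CheckpointStrategy.FLUSH:
            # Checkpoint time includes processing the whole buffer.
            self.flush()
            return []
        # Checkpoint only copies the unprocessed elements into the snapshot;
        # the UDF is not invoked here.
        return list(self.buffer)

    def restore_state(self, snapshot: List[int]) -> None:
        # After recovery, unprocessed elements are put back into the buffer.
        self.buffer = list(snapshot)


def double(x: int) -> int:
    return 2 * x


flushing = ToyPythonOperator(double, CheckpointStrategy.FLUSH)
buffering = ToyPythonOperator(double, CheckpointStrategy.BUFFER_IN_STATE)
for op in (flushing, buffering):
    for v in (1, 2, 3):
        op.process_element(v)

flush_snapshot = flushing.snapshot_state()    # UDF runs during the checkpoint
buffer_snapshot = buffering.snapshot_state()  # UDF does not run

# Simulate recovery from the second snapshot.
restored = ToyPythonOperator(double, CheckpointStrategy.BUFFER_IN_STATE)
restored.restore_state(buffer_snapshot)
restored.flush()
```

In this sketch the trade-off is visible in `snapshot_state`: the flushing variant pays the UDF cost during the checkpoint but stores nothing, while the buffering variant checkpoints quickly but stores the pending elements and must process them again after recovery.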



--
This message was sent by Atlassian Jira
(v8.3.4#803005)