[jira] [Created] (FLINK-14709) Allow outputting elements in close method of chained drivers.

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-14709) Allow outputting elements in close method of chained drivers.

Shang Yuanchun (Jira)
David Moravek created FLINK-14709:
-------------------------------------

             Summary: Allow outputting elements in close method of chained drivers.
                 Key: FLINK-14709
                 URL: https://issues.apache.org/jira/browse/FLINK-14709
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Task
    Affects Versions: 1.9.1, 1.8.1, 1.7.2
            Reporter: David Moravek


Currently, BatchTask and DataSourceTask only allow outputting elements in close method of "rich" operators, that they directly execute.

Task workflow is as follows:
1) open "head" driver  (calls "open" method on udf)
2) open chained drivers
3) run "head" driver
4) close "head" driver (calls "close" method on udf)
5) close output collector (no elements can be collected after this point)
6) close chained drivers

In order to properly support outputs from close method, we want to switch 6) and 5). We also need to tweak implementation of Reduce / Combine chained drivers, because they dispose sorters in closeTask method (this should be done in the close method).

This would bring huge performance improvement for Beam users, because we could properly implement bundling on batch (whole partition = single bundle).




--
This message was sent by Atlassian Jira
(v8.3.4#803005)