[jira] [Created] (FLINK-1915) Faulty plan selection by optimizer

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-1915) Faulty plan selection by optimizer

Shang Yuanchun (Jira)
Till Rohrmann created FLINK-1915:
------------------------------------

             Summary: Faulty plan selection by optimizer
                 Key: FLINK-1915
                 URL: https://issues.apache.org/jira/browse/FLINK-1915
             Project: Flink
          Issue Type: Bug
            Reporter: Till Rohrmann
            Priority: Minor


The optimizer selects for certain jobs a sub-optimal execution plan.

For example, the {{WebLogAnalysis}} example job contains a coGroup input which consists of a {{Filter}} and a subsequent {{Projection}}. The optimizer inserts a hash partitioning between the filter and the mapper (projection) and a sorting after the projection. It would be more efficient if the hash partitioning would have been done after the projection, because the data is smaller at this stage.

I could observe a similar behaviour for a larger job, where the hash partitioning was executed before a filter operation which was then used as input for a join operator. I suspect that the optimizer considers the two plans (hash partitioning before the filter and after the filter) as equivalent in the absence of proper size estimates. However, executing the hash partitioning after the filter should always be more efficient.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)