(DEPRECATED) Apache Flink Mailing List archive.

[jira] [Created] (FLINK-1915) Faulty plan selection by optimizer

Shang Yuanchun (Jira)

Till Rohrmann created FLINK-1915:
------------------------------------

Summary: Faulty plan selection by optimizer
Key: FLINK-1915
URL: https://issues.apache.org/jira/browse/FLINK-1915
Project: Flink
Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor

For certain jobs, the optimizer selects a sub-optimal execution plan.

For example, the {{WebLogAnalysis}} example job contains a coGroup input which consists of a {{Filter}} and a subsequent {{Projection}}. The optimizer inserts a hash partitioning between the filter and the mapper (projection) and a sorting after the projection. It would be more efficient to perform the hash partitioning after the projection, because the data is smaller at that stage.
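
To make the difference concrete, here is a minimal sketch of the two operator orderings as a toy model in plain Python. It is not Flink code and does not use the Flink API. The record layout, the filter predicate, the function names and the byte-counting rule are all assumptions chosen for illustration only.

```python
# Toy model (NOT Flink code): compare the data volume shipped by a hash
# partitioning placed before vs. after a projection. All names and the
# record layout below are hypothetical.

def make_visits(n):
    """Generate hypothetical wide records that will be filtered and projected."""
    return [
        {
            "url": f"url_{i % 50}",
            "ip": f"10.0.0.{i % 256}",
            "date": f"{2005 + i % 10}-01-01",
            "agent": "Mozilla/5.0 (hypothetical agent string)",
        }
        for i in range(n)
    ]

def keep(rec):
    """Assumed filter predicate: keep records from one year."""
    return rec["date"].startswith("2010")

def project(rec):
    """Assumed projection: keep only the partitioning key."""
    return {"url": rec["url"]}

def hash_partition(records, key, parallelism):
    """Assign records to partitions by key hash and count the bytes shipped."""
    partitions = [[] for _ in range(parallelism)]
    shipped = 0
    for rec in records:
        partitions[hash(rec[key]) % parallelism].append(rec)
        # Crude size estimate: total length of all field values.
        shipped += sum(len(str(v)) for v in rec.values())
    return partitions, shipped

def plan_partition_before_projection(visits, parallelism=4):
    """Filter -> hash partition -> project -> sort (the ordering described above)."""
    filtered = [r for r in visits if keep(r)]
    partitions, shipped = hash_partition(filtered, "url", parallelism)
    return [sorted(project(r)["url"] for r in p) for p in partitions], shipped

def plan_partition_after_projection(visits, parallelism=4):
    """Filter -> project -> hash partition -> sort (the cheaper ordering)."""
    projected = [project(r) for r in visits if keep(r)]
    partitions, shipped = hash_partition(projected, "url", parallelism)
    return [sorted(r["url"] for r in p) for p in partitions], shipped

if __name__ == "__main__":
    visits = make_visits(1000)
    _, before = plan_partition_before_projection(visits)
    _, after = plan_partition_after_projection(visits)
    print("bytes shipped, partition before projection:", before)
    print("bytes shipped, partition after projection: ", after)
```

Both orderings produce the same sorted partitions, because the projection keeps the partitioning key. Only the number of bytes crossing the partitioning step differs.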

I observed similar behaviour in a larger job, where the hash partitioning was executed before a filter operation whose output was then used as input for a join operator. I suspect that the optimizer considers the two plans (hash partitioning before the filter and after the filter) as equivalent in the absence of proper size estimates. However, executing the hash partitioning after the filter should always be more efficient.
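
The suspected behaviour can be sketched with a toy plan comparison. This is only an illustration under stated assumptions, not the actual Flink optimizer logic: the {{Plan}} class, the use of {{None}} for a missing size estimate, and the tie-breaking rule are all hypothetical. The tie-break relies on the fact that a filter never emits more records than it receives, so pushing the partitioning behind it cannot increase the shipped volume.

```python
# Toy plan comparison (NOT the Flink optimizer): when size estimates are
# missing, fall back to a structural tie-break instead of treating the
# candidate plans as equivalent. All names here are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Plan:
    name: str
    shipped_bytes: Optional[int]   # None means no size estimate is available
    filters_before_shuffle: int    # filters applied before the hash partitioning

def cheaper(a: Plan, b: Plan) -> Plan:
    """Return the plan to prefer."""
    both_known = a.shipped_bytes is not None and b.shipped_bytes is not None
    if both_known and a.shipped_bytes != b.shipped_bytes:
        # Proper estimates exist and differ: pick the smaller shipped volume.
        return a if a.shipped_bytes < b.shipped_bytes else b
    # Estimates missing or equal: a filter cannot grow its input, so prefer
    # the plan that applies more filters before the partitioning.
    return a if a.filters_before_shuffle >= b.filters_before_shuffle else b

if __name__ == "__main__":
    early = Plan("partition_before_filter", None, 0)
    late = Plan("partition_after_filter", None, 1)
    print(cheaper(early, late).name)
```

Without the tie-break, both candidates would compare as equal and the choice would depend on enumeration order, which matches the behaviour suspected above.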

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)