(DEPRECATED) Apache Flink Mailing List archive.

Flink optimizer optimizations

CPC

Flink optimizer optimizations

Hi,

When I looked for what kinds of optimizations Flink does, I found
https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals. Is
it up to date? Also, I couldn't understand this passage:

"Reusing of partitionings and sort orders across operators. If one operator
leaves the data in partitioned fashion (and or sorted order), the next
operator will automatically try and reuse these characteristics. The
planning for this is done holistically and can cause earlier operators to
pick more expensive algorithms, if they allow for better reusing of
sort-order and partitioning."

Can you give an example of "earlier operators to pick more expensive
algorithms"?

Regards

Matthias J. Sax-2

Re: Flink optimizer optimizations

Assume you have a groupBy followed by a join.

DataSet1 (not sorted) -> groupBy(A) --> join(1.A == 2.A)
                                        ^
DataSet2 (sorted on A) -----------------+

For groupBy(A) on DataSet1, the optimizer can pick hash-based grouping or the
more expensive sort-based grouping. If the optimizer picks
sort-based grouping, the join becomes super cheap because it can just
perform a merge-join (with the need to sort the data, because both
datasets will be sorted on A already). Thus, the overhead of sorting in
the grouping step might pay off in the join.

-Matthias
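
A toy sketch of the trade-off described above, in Python. It is not Flink's optimizer or cost model: the cost formulas, the 2x hash-join factor, the input sizes, and all names (sort_cost, group_cost, join_cost, plan_cost, best_plan) are assumptions made up for illustration. It enumerates every combination of grouping and join strategy and compares the total plan cost, which is roughly what "holistic" planning means here.

```python
import math
from itertools import product


def sort_cost(n):
    # Assumed cost of sorting n records: n * log2(n).
    return n * math.log2(n) if n > 1 else 0.0


def group_cost(strategy, n):
    # Assumed: hash grouping touches each record once; sort grouping sorts.
    return float(n) if strategy == "hash" else sort_cost(n)


def join_cost(strategy, left_n, left_sorted, right_n, right_sorted):
    if strategy == "merge":
        # A merge-join is one linear pass, but only over sorted inputs;
        # any input that is not sorted yet has to be sorted first.
        cost = float(left_n + right_n)
        if not left_sorted:
            cost += sort_cost(left_n)
        if not right_sorted:
            cost += sort_cost(right_n)
        return cost
    # Assumed: a hash join costs 2 units per record (build plus probe).
    return 2.0 * (left_n + right_n)


def plan_cost(group_strategy, join_strategy, n1, n1_grouped, n2):
    # DataSet2 is already sorted on A. The grouped output of DataSet1 is
    # sorted on A only if the grouping itself was sort-based.
    left_sorted = group_strategy == "sort"
    return (group_cost(group_strategy, n1)
            + join_cost(join_strategy, n1_grouped, left_sorted, n2, True))


def best_plan(n1, n1_grouped, n2):
    # Compare whole plans, not individual operators.
    plans = product(("hash", "sort"), ("merge", "hash"))
    return min(plans, key=lambda p: plan_cost(p[0], p[1], n1, n1_grouped, n2))


if __name__ == "__main__":
    n1, n1_grouped, n2 = 1_000, 1_000, 1_000_000
    for plan in product(("hash", "sort"), ("merge", "hash")):
        print(plan, round(plan_cost(plan[0], plan[1], n1, n1_grouped, n2)))
    print("cheapest:", best_plan(n1, n1_grouped, n2))
```

With these made-up numbers, hash grouping is locally cheaper than sort grouping, yet the cheapest overall plan is sort grouping followed by a merge-join. The result flips if grouping shrinks the data a lot (say n1_grouped = 100), because re-sorting a small grouped output is then cheap.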

On 04/15/2016 10:50 PM, CPC wrote:

> Can you give example for "earlier operators to pick more expensive
> algorithms" ?

Ufuk Celebi-2

Re: Flink optimizer optimizations

On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <[hidden email]> wrote:
> (with the need to sort the data, because both
> datasets will be sorted on A already). Thus, the overhead of sorting in
> the group might pay off in the join.

I think you meant to write withOUT the need to sort the data, right?

Matthias J. Sax-2

Re: Flink optimizer optimizations

Sure. WITHOUT.

Thanks. Good catch :)
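
To make the corrected statement concrete, here is a minimal Python sketch (plain lists, not Flink code; the function names and sample records are hypothetical). Sort-based grouping leaves its output sorted on the key, so a merge-join against another input that is already sorted on the same key needs no further sort.

```python
from itertools import groupby
from operator import itemgetter


def sort_based_group_sum(records):
    """Sum values per key by sorting first; the output is sorted on key."""
    ordered = sorted(records, key=itemgetter(0))
    return [(key, sum(value for _, value in group))
            for key, group in groupby(ordered, key=itemgetter(0))]


def merge_join(left, right):
    """Join two lists of (key, value) pairs that are both sorted on key.

    Assumes keys in `left` are unique, which holds for grouped output.
    No sorting happens here: each input is scanned once.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            out.append((left_key, left[i][1], right[j][1]))
            j += 1  # left keys are unique, so only the right side advances
    return out


if __name__ == "__main__":
    dataset1 = [("b", 2), ("a", 1), ("b", 3), ("c", 4)]          # not sorted
    dataset2 = [("a", "x"), ("b", "y"), ("b", "z"), ("d", "w")]  # sorted on key
    grouped = sort_based_group_sum(dataset1)
    print(grouped)
    print(merge_join(grouped, dataset2))
```

Had the grouping been hash-based, its output order would be arbitrary and merge_join could not be applied without sorting that output first.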

On 04/16/2016 01:18 PM, Ufuk Celebi wrote:
> I think you meant to write withOUT the need to the sort the data, right?

CPC

Re: Flink optimizer optimizations

Hmm, I understand now. Thank you guys :)

On Apr 16, 2016 2:21 PM, "Matthias J. Sax" <[hidden email]> wrote:

> Sure. WITHOUT.
>
> Thanks. Good catch :)