Flink optimizer optimizations

CPC
Hi,

When I looked into what kind of optimizations Flink does, I found
https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals
Is it up to date? Also, I couldn't understand this passage:

"Reusing of partitionings and sort orders across operators. If one operator
leaves the data in partitioned fashion (and or sorted order), the next
operator will automatically try and reuse these characteristics. The
planning for this is done holistically and can cause earlier operators to
pick more expensive algorithms, if they allow for better reusing of
sort-order and partitioning."

Can you give an example of "earlier operators to pick more expensive
algorithms"?

Regards

Re: Flink optimizer optimizations

Matthias J. Sax-2
Assume you have a groupBy followed by a join.

DataSet1 (not sorted) -> groupBy(A) --> join(1.A == 2.A)
                                        ^
DataSet2 (sorted on A) -----------------+

For groupBy(A) on DataSet1, the optimizer can pick hash-grouping or the
more expensive sort-based-grouping. If the optimizer picks
sort-based-grouping, the join becomes super cheap because it can just
perform a merge-join (with the need to sort the data, because both
datasets will be sorted on A already). Thus, the overhead of sorting in
the group might pay off in the join.
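
To make the trade-off concrete, here is a toy sketch that compares a
greedy choice (cheapest grouping first, then cheapest join) with a holistic
choice (cheapest total plan). All cost numbers and names below are made-up
assumptions for illustration; this is not Flink's cost model or API.

```python
from itertools import product

# All cost numbers are made-up units for illustration only; they are not
# taken from Flink's optimizer.
SORT_COST = 25  # hypothetical cost of sorting DataSet1's records on A

# grouping strategy -> (cost, does it leave its output sorted on A?)
GROUPINGS = {
    "hash-grouping": (10, False),
    "sort-based-grouping": (SORT_COST, True),
}

JOINS = ("hash-join", "merge-join")


def join_cost(strategy, left_sorted):
    """Cost of joining grouped DataSet1 with DataSet2 (already sorted on A)."""
    if strategy == "hash-join":
        return 40  # does not benefit from an existing sort order
    if strategy == "merge-join":
        # A merge join needs both inputs sorted; sort the left input only
        # if the grouping did not already leave it sorted.
        return 10 + (0 if left_sorted else SORT_COST)
    raise ValueError(strategy)


def plan_cost(grouping, join):
    group_cost, sorted_after = GROUPINGS[grouping]
    return group_cost + join_cost(join, sorted_after)


def greedy_plan():
    """Pick the cheapest grouping first, then the cheapest join for it."""
    grouping = min(GROUPINGS, key=lambda g: GROUPINGS[g][0])
    join = min(JOINS, key=lambda j: join_cost(j, GROUPINGS[grouping][1]))
    return grouping, join, plan_cost(grouping, join)


def holistic_plan():
    """Cost every grouping/join combination and keep the cheapest overall."""
    grouping, join = min(product(GROUPINGS, JOINS), key=lambda p: plan_cost(*p))
    return grouping, join, plan_cost(grouping, join)


if __name__ == "__main__":
    print("greedy:  ", greedy_plan())    # hash-grouping + merge-join, cost 45
    print("holistic:", holistic_plan())  # sort-based-grouping + merge-join, cost 35
```

With these assumed numbers, the greedy planner saves 15 units on the
grouping but then has to pay for a sort before the merge-join, so the plan
with the more expensive grouping is cheaper overall.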

-Matthias



Re: Flink optimizer optimizations

Ufuk Celebi-2
On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <[hidden email]> wrote:
> (with the need to sort the data, because both
> datasets will be sorted on A already). Thus, the overhead of sorting in
> the group might pay off in the join.

I think you meant to write withOUT the need to sort the data, right?

Re: Flink optimizer optimizations

Matthias J. Sax-2
Sure. WITHOUT.

Thanks. Good catch :)
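
For illustration, here is a minimal sketch of why no sort is needed: a
merge join over two inputs that are already sorted on the key only walks
both inputs once. The function name and record layout are hypothetical and
not part of Flink's API.

```python
def merge_join(left, right, key=lambda rec: rec[0]):
    """Join two lists that are ALREADY sorted on the key, in one linear pass.

    No sorting happens here; the function relies on the sort order that
    earlier operators left behind.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of records with this key on the right side.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and key(left[i]) == kl:
                for r in right[j:j_end]:
                    out.append((left[i], r))
                i += 1
            j = j_end
    return out


if __name__ == "__main__":
    grouped = [(1, "a"), (2, "b"), (4, "d")]    # sorted on A by the grouping
    dataset2 = [(2, "x"), (3, "z"), (4, "w")]   # was already sorted on A
    print(merge_join(grouped, dataset2))
```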




Re: Flink optimizer optimizations

CPC
Hmm, I understand now. Thank you guys :)