Hi
When i look for what kind of optimizations flink does, i found https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals is it up to date? Also i couldnt understand: "Reusing of partitionings and sort orders across operators. If one operator leaves the data in partitioned fashion (and or sorted order), the next operator will automatically try and reuse these characteristics. The planning for this is done holistically and can cause earlier operators to pick more expensive algorithms, if they allow for better reusing of sort-order and partitioning." Can you give example for "earlier operators to pick more expensive algorithms" ? Regards |
Assume you have a groupBy followed by a join.
DataSet1 (nor sorted) -> groupBy(A) --> join(1.A == 2.A) ^ DataSet2 (sorted on A) -----------------+ For groupBy(A) of DataSet1 the optimizer can pick hash-grouping or the more expensive sort-based-grouping. If the optimizer pick sort-based-grouping, the join becomes super cheap because if can just perform a merge-join (with the need to sort the data, because both datasets will be sorted on A already). Thus, the overhead of sorting in the group might pay of in the join. -Matthias On 04/15/2016 10:50 PM, CPC wrote: > Hi > > When i look for what kind of optimizations flink does, i found > https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals is > it up to date? Also i couldnt understand: > > "Reusing of partitionings and sort orders across operators. If one operator > leaves the data in partitioned fashion (and or sorted order), the next > operator will automatically try and reuse these characteristics. The > planning for this is done holistically and can cause earlier operators to > pick more expensive algorithms, if they allow for better reusing of > sort-order and partitioning." > > Can you give example for "earlier operators to pick more expensive > algorithms" ? > > Regards > |
On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <[hidden email]> wrote:
> (with the need to sort the data, because both > datasets will be sorted on A already). Thus, the overhead of sorting in > the group might pay of in the join. I think you meant to write withOUT the need to the sort the data, right? |
Sure. WITHOUT.
Thanks. Good catch :) On 04/16/2016 01:18 PM, Ufuk Celebi wrote: > On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <[hidden email]> wrote: >> (with the need to sort the data, because both >> datasets will be sorted on A already). Thus, the overhead of sorting in >> the group might pay of in the join. > > I think you meant to write withOUT the need to the sort the data, right? > |
Himmm i understand now. Thank you guys:)
On Apr 16, 2016 2:21 PM, "Matthias J. Sax" <[hidden email]> wrote: > Sure. WITHOUT. > > Thanks. Good catch :) > > On 04/16/2016 01:18 PM, Ufuk Celebi wrote: > > On Sat, Apr 16, 2016 at 1:05 PM, Matthias J. Sax <[hidden email]> > wrote: > >> (with the need to sort the data, because both > >> datasets will be sorted on A already). Thus, the overhead of sorting in > >> the group might pay of in the join. > > > > I think you meant to write withOUT the need to the sort the data, right? > > > > |
Free forum by Nabble | Edit this page |