difference between reducefunction and GroupReduceFunction

santosh_rajaguru
I am new to Flink and MapReduce. My question is:
Apart from incrementally combining two elements, what are the merits of using ReduceFunction over GroupReduceFunction? Which use cases suit which function best?


 
Re: difference between reducefunction and GroupReduceFunction

mxm
Like you said, it depends on the use case. The GroupReduceFunction is a
generalization of the traditional reduce. Thus, it is more powerful.
However, it is also executed differently; a GroupReduceFunction requires
the whole group to be materialized and passed at once. If your program
doesn't require that, use the normal reduce function.
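
To make the "generalization" point concrete, here is a minimal plain-Java sketch of the two function shapes. It is only a model: `ReduceShapes`, `GroupFunction`, `pairwiseReduce`, `groupReduce` and `SUMMARIZE` are hypothetical names, not Flink API. In Flink's Java API the corresponding signatures are `T reduce(T value1, T value2)` on `ReduceFunction<T>` and `void reduce(Iterable<IN> values, Collector<OUT> out)` on `GroupReduceFunction<IN, OUT>`; the sketch uses a `java.util.function.Consumer` in place of the `Collector`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Consumer;

// Plain-Java model of the two function shapes. None of these names are Flink API.
class ReduceShapes {

    // Hypothetical stand-in for a group-wise function: it sees the group as an
    // Iterable and may emit any number of results, possibly of another type.
    interface GroupFunction<IN, OUT> {
        void reduce(Iterable<IN> values, Consumer<OUT> out);
    }

    // Pairwise reduce: folds a non-empty group two elements at a time into
    // exactly one value of the same type as the input.
    static <T> T pairwiseReduce(List<T> group, BinaryOperator<T> f) {
        T acc = group.get(0);
        for (int i = 1; i < group.size(); i++) {
            acc = f.apply(acc, group.get(i));
        }
        return acc;
    }

    // Group-wise reduce: hands the group over as an Iterable and collects
    // whatever the function chooses to emit.
    static <IN, OUT> List<OUT> groupReduce(List<IN> group, GroupFunction<IN, OUT> f) {
        List<OUT> out = new ArrayList<>();
        f.reduce(group, out::add);
        return out;
    }

    // Example group function: Integer in, String out. The change of type is
    // something a pairwise reduce cannot express.
    static final GroupFunction<Integer, String> SUMMARIZE = (values, out) -> {
        int count = 0;
        int sum = 0;
        for (int v : values) {
            count++;
            sum += v;
        }
        out.accept("count=" + count + " sum=" + sum);
    };

    public static void main(String[] args) {
        List<Integer> group = Arrays.asList(3, 1, 4, 1, 5);
        System.out.println(pairwiseReduce(group, Integer::sum));
        System.out.println(groupReduce(group, SUMMARIZE));
    }
}
```

The pairwise form always yields exactly one value of the input type per group, while the group-wise form decides for itself how many results to emit and of what type.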

On Thu, May 21, 2015 at 4:42 PM, santosh_rajaguru <[hidden email]> wrote:

> I am new to Flink and MapReduce. My question is:
> Apart from incrementally combining two elements, what are the merits of
> using ReduceFunction over GroupReduceFunction? Which use cases suit which
> function best?
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/difference-between-reducefunction-and-GroupReduceFunction-tp5768.html
> Sent from the Apache Flink Mailing List archive at Nabble.com.

Re: difference between reducefunction and GroupReduceFunction

mxm
Pardon, what I said is not completely right. Both functions are
incrementally constructed. This seems obvious for the reduce function but
is also true for the GroupReduce because it receives the values as an
Iterable which, under the hood, can be constructed incrementally as well.

One other difference is that the traditional reduce always applies a
combiner before shuffling the results. The GroupReduceFunction, on the
other hand, does not do that unless you explicitly specify a combiner using
the RichGroupReduceFunction or perform a GroupCombine operation before the
GroupReduce.
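
As a rough illustration of what a combiner buys, the following plain-Java sketch counts words across two simulated partitions and tracks how many records would cross the shuffle with and without local pre-aggregation. It is a model under stated assumptions, not Flink code: `CombinerSketch`, `Result` and `run` are hypothetical names, the "shuffle" is just a counter, and the aggregation (a sum) is assumed to be associative and commutative so that partial results can be merged safely.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a word count with and without a combiner. Not Flink API.
class CombinerSketch {

    // Final counts plus the number of records that crossed the simulated shuffle.
    static final class Result {
        final Map<String, Integer> counts;
        final int shuffled;

        Result(Map<String, Integer> counts, int shuffled) {
            this.counts = counts;
            this.shuffled = shuffled;
        }
    }

    static Result run(List<List<String>> partitions, boolean useCombiner) {
        Map<String, Integer> counts = new TreeMap<>();
        int shuffled = 0;
        for (List<String> partition : partitions) {
            if (useCombiner) {
                // Combine step: one partial count per key, computed locally.
                Map<String, Integer> partial = new TreeMap<>();
                for (String word : partition) {
                    partial.merge(word, 1, Integer::sum);
                }
                // Only the partial counts are sent on to the final reduce.
                shuffled += partial.size();
                for (Map.Entry<String, Integer> e : partial.entrySet()) {
                    counts.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            } else {
                // No combiner: every single record is sent to the final reduce.
                for (String word : partition) {
                    shuffled++;
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return new Result(counts, shuffled);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "a", "b"),
                Arrays.asList("a", "b", "b", "b"));
        Result plain = run(partitions, false);
        Result combined = run(partitions, true);
        System.out.println(plain.counts + " shuffled=" + plain.shuffled);
        System.out.println(combined.counts + " shuffled=" + combined.shuffled);
    }
}
```

Both runs produce the same counts; only the number of shuffled records differs, and the gap grows with the number of repeated keys per partition.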

Best regards,
Max


On Fri, May 22, 2015 at 10:03 AM, Maximilian Michels <[hidden email]> wrote:

> Like you said, it depends on the use case. The GroupReduceFunction is a
> generalization of the traditional reduce. Thus, it is more powerful.
> However, it is also executed differently; a GroupReduceFunction requires
> the whole group to be materialized and passed at once. If your program
> doesn't require that, use the normal reduce function.

Re: difference between reducefunction and GroupReduceFunction

santosh_rajaguru
Thanks Maximilian.

My use case is similar to the example given in the graph analysis.
In graph analysis, the reduce function used is a normal reduce function.
I ran it both ways, and your explanation is right: the normal reduce function applies a combiner before sorting, unlike the GroupReduce function.
My question is: how does this affect performance, given that the result is the same in both cases?


Thanks and Regards,
Santosh

Re: difference between reducefunction and GroupReduceFunction

Stephan Ewen
Performance-wise, a "GroupReduceFunction" with Combiner should right now be
slightly faster than the ReduceFunction, but not much.

Long term, the ReduceFunction may become faster, because it will use hash
aggregation under the hood.
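
For readers unfamiliar with the term, here is a small plain-Java sketch contrasting sort-based grouping with hash aggregation for a pairwise reduce. It is an illustration only and says nothing about Flink's actual runtime; `AggregationStrategies`, `rec`, `sortBased` and `hashBased` are hypothetical names. The idea is that a pairwise reduce needs just one accumulator per key, of the same type as the input, so it can be updated in a hash table without sorting the records first.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Plain-Java model of two ways to evaluate a keyed pairwise reduce. Not Flink API.
class AggregationStrategies {

    // Convenience helper to build a (key, value) record.
    static Map.Entry<String, Integer> rec(String key, int value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Sort-based: sort by key, then reduce each run of equal keys.
    static Map<String, Integer> sortBased(List<Map.Entry<String, Integer>> records,
                                          BinaryOperator<Integer> f) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(records);
        sorted.sort(Map.Entry.<String, Integer>comparingByKey());
        Map<String, Integer> result = new TreeMap<>();
        String currentKey = null;
        Integer acc = null;
        for (Map.Entry<String, Integer> r : sorted) {
            if (!r.getKey().equals(currentKey)) {
                // A new run starts: flush the previous group, if any.
                if (currentKey != null) {
                    result.put(currentKey, acc);
                }
                currentKey = r.getKey();
                acc = r.getValue();
            } else {
                acc = f.apply(acc, r.getValue());
            }
        }
        if (currentKey != null) {
            result.put(currentKey, acc);
        }
        return result;
    }

    // Hash-based: keep one accumulator per key and update it in place,
    // in arrival order, without sorting.
    static Map<String, Integer> hashBased(List<Map.Entry<String, Integer>> records,
                                          BinaryOperator<Integer> f) {
        Map<String, Integer> table = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            table.merge(r.getKey(), r.getValue(), f);
        }
        return table;
    }
}
```

For an associative and commutative function such as a sum, both strategies return the same per-key result; the hash-based variant simply avoids the sort.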


On Fri, May 22, 2015 at 11:58 AM, santosh_rajaguru <[hidden email]>
wrote:

> Thanks Maximilian.
>
> My use case is similar to the example given in the graph analysis.
> In graph analysis, the reduce function used is a normal reduce function.
> I ran it both ways, and your explanation is right: the normal reduce
> function applies a combiner before sorting, unlike the GroupReduce
> function.
> My question is: how does this affect performance, given that the result is
> the same in both cases?
>
>
> Thanks and Regards,
> Santosh
>
>
> --
> View this message in context:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/difference-between-reducefunction-and-GroupReduceFunction-tp5768p5785.html
> Sent from the Apache Flink Mailing List archive at Nabble.com.