Streaming groupby and aggregation by field expressions

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Streaming groupby and aggregation by field expressions

Gyula Fóra-2
Hey guys,

Just a quick note on some upcoming API updates for the Streaming api.

Now it will be possible to use field expressions for both grouping and
aggregations in the streaming api. You can check it out here
<https://github.com/mbalassi/incubator-flink/blob/daba36e142537ca0bd7e4d0f1209ce8b0ebecda5/flink-addons/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/wordcount/PojoWordCount.java#L102>
.

Or in a concise form:
DataStream<Word> counts = text.flatMap(new Tokenizer()).groupBy("word")
.sum("frequency");

I will still do some more testing before it will be available in the master
branch.

I am also planning to extend aggregations to more fields at the same time
like
sum(1,2,2) or max("a","c").

Regards,
Gyula
Reply | Threaded
Open this post in threaded view
|

Re: Streaming groupby and aggregation by field expressions

Fabian Hueske
Hi Gyula,

great to see so much progress with Flink Streaming!

Regarding the improved aggregations, there is another effort to improve the
aggregations of the batch API, which was recently discussed on the mailing
list [1].
I think it would make sense to try to keep the streaming and batch APIs
close together. Would the proposed batch approach also work for the
streaming case and if not what is missing?

Cheers, Fabian

[1]
http://mail-archives.apache.org/mod_mbox/flink-dev/201410.mbox/%3C44B1AB07-F993-436F-AE23-8CC4CCC08A54%40tu-berlin.de%3E

2014-11-05 18:37 GMT+01:00 Gyula Fóra <[hidden email]>:

> Hey guys,
>
> Just a quick note on some upcoming API updates for the Streaming api.
>
> Now it will be possible to use field expressions for both grouping and
> aggregations in the streaming api. You can check it out here
> <
> https://github.com/mbalassi/incubator-flink/blob/daba36e142537ca0bd7e4d0f1209ce8b0ebecda5/flink-addons/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/wordcount/PojoWordCount.java#L102
> >
> .
>
> Or in a concise form:
> DataStream<Word> counts = text.flatMap(new Tokenizer()).groupBy("word")
> .sum("frequency");
>
> I will still do some more testing before it will be available in the master
> branch.
>
> I am also planning to extend aggregations to more fields at the same time
> like
> sum(1,2,2) or max("a","c").
>
> Regards,
> Gyula
>
Reply | Threaded
Open this post in threaded view
|

Re: Streaming groupby and aggregation by field expressions

Gyula Fóra-2
Hi Fabian,

The link you sent is broken but I think you are referring the conversation with Viktor.

I agree that we should provide the same api for aggregations and I don’t think there is any reason why the proposed batch approach wouldn’t work on streaming :)

When there is an agreement on the new aggregation api we will implement it for streaming.

Cheers,
Gyula

> On 05 Nov 2014, at 19:26, Fabian Hueske <[hidden email]> wrote:
>
> Hi Gyula,
>
> great to see so much progress with Flink Streaming!
>
> Regarding the improved aggregations, there is another effort to improve the
> aggregations of the batch API, which was recently discussed on the mailing
> list [1].
> I think it would make sense to try to keep the streaming and batch APIs
> close together. Would the proposed batch approach also work for the
> streaming case and if not what is missing?
>
> Cheers, Fabian
>
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201410.mbox/%3C44B1AB07-F993-436F-AE23-8CC4CCC08A54%40tu-berlin.de%3E
>
> 2014-11-05 18:37 GMT+01:00 Gyula Fóra <[hidden email]>:
>
>> Hey guys,
>>
>> Just a quick note on some upcoming API updates for the Streaming api.
>>
>> Now it will be possible to use field expressions for both grouping and
>> aggregations in the streaming api. You can check it out here
>> <
>> https://github.com/mbalassi/incubator-flink/blob/daba36e142537ca0bd7e4d0f1209ce8b0ebecda5/flink-addons/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/wordcount/PojoWordCount.java#L102
>>>
>> .
>>
>> Or in a concise form:
>> DataStream<Word> counts = text.flatMap(new Tokenizer()).groupBy("word")
>> .sum("frequency");
>>
>> I will still do some more testing before it will be available in the master
>> branch.
>>
>> I am also planning to extend aggregations to more fields at the same time
>> like
>> sum(1,2,2) or max("a","c").
>>
>> Regards,
>> Gyula
>>

Reply | Threaded
Open this post in threaded view
|

Re: Streaming groupby and aggregation by field expressions

Fabian Hueske
Yes, that's the thread I wanted to point to ;-)

2014-11-05 19:40 GMT+01:00 Gyula Fora <[hidden email]>:

> Hi Fabian,
>
> The link you sent is broken but I think you are referring the conversation
> with Viktor.
>
> I agree that we should provide the same api for aggregations and I don’t
> think there is any reason why the proposed batch approach wouldn’t work on
> streaming :)
>
> When there is an agreement on the new aggregation api we will implement it
> for streaming.
>
> Cheers,
> Gyula
>
> > On 05 Nov 2014, at 19:26, Fabian Hueske <[hidden email]> wrote:
> >
> > Hi Gyula,
> >
> > great to see so much progress with Flink Streaming!
> >
> > Regarding the improved aggregations, there is another effort to improve
> the
> > aggregations of the batch API, which was recently discussed on the
> mailing
> > list [1].
> > I think it would make sense to try to keep the streaming and batch APIs
> > close together. Would the proposed batch approach also work for the
> > streaming case and if not what is missing?
> >
> > Cheers, Fabian
> >
> > [1]
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201410.mbox/%3C44B1AB07-F993-436F-AE23-8CC4CCC08A54%40tu-berlin.de%3E
> >
> > 2014-11-05 18:37 GMT+01:00 Gyula Fóra <[hidden email]>:
> >
> >> Hey guys,
> >>
> >> Just a quick note on some upcoming API updates for the Streaming api.
> >>
> >> Now it will be possible to use field expressions for both grouping and
> >> aggregations in the streaming api. You can check it out here
> >> <
> >>
> https://github.com/mbalassi/incubator-flink/blob/daba36e142537ca0bd7e4d0f1209ce8b0ebecda5/flink-addons/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/wordcount/PojoWordCount.java#L102
> >>>
> >> .
> >>
> >> Or in a concise form:
> >> DataStream<Word> counts = text.flatMap(new Tokenizer()).groupBy("word")
> >> .sum("frequency");
> >>
> >> I will still do some more testing before it will be available in the
> master
> >> branch.
> >>
> >> I am also planning to extend aggregations to more fields at the same
> time
> >> like
> >> sum(1,2,2) or max("a","c").
> >>
> >> Regards,
> >> Gyula
> >>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Streaming groupby and aggregation by field expressions

Viktor Rosenfeld
In reply to this post by Gyula Fóra-2
Hi Gyula,

I will take a look at your code once I have figured out how to implement the new batch API.

Best,
Viktor

Gyula Fóra wrote
Hi Fabian,

The link you sent is broken but I think you are referring the conversation with Viktor.

I agree that we should provide the same api for aggregations and I don’t think there is any reason why the proposed batch approach wouldn’t work on streaming :)

When there is an agreement on the new aggregation api we will implement it for streaming.

Cheers,
Gyula

> On 05 Nov 2014, at 19:26, Fabian Hueske <[hidden email]> wrote:
>
> Hi Gyula,
>
> great to see so much progress with Flink Streaming!
>
> Regarding the improved aggregations, there is another effort to improve the
> aggregations of the batch API, which was recently discussed on the mailing
> list [1].
> I think it would make sense to try to keep the streaming and batch APIs
> close together. Would the proposed batch approach also work for the
> streaming case and if not what is missing?
>
> Cheers, Fabian
>
> [1]
> http://mail-archives.apache.org/mod_mbox/flink-dev/201410.mbox/%3C44B1AB07-F993-436F-AE23-8CC4CCC08A54%40tu-berlin.de%3E
>
> 2014-11-05 18:37 GMT+01:00 Gyula Fóra <[hidden email]>:
>
>> Hey guys,
>>
>> Just a quick note on some upcoming API updates for the Streaming api.
>>
>> Now it will be possible to use field expressions for both grouping and
>> aggregations in the streaming api. You can check it out here
>> <
>> https://github.com/mbalassi/incubator-flink/blob/daba36e142537ca0bd7e4d0f1209ce8b0ebecda5/flink-addons/flink-streaming/flink-streaming-examples/src/main/java/org/apache/flink/streaming/examples/wordcount/PojoWordCount.java#L102
>>>
>> .
>>
>> Or in a concise form:
>> DataStream<Word> counts = text.flatMap(new Tokenizer()).groupBy("word")
>> .sum("frequency");
>>
>> I will still do some more testing before it will be available in the master
>> branch.
>>
>> I am also planning to extend aggregations to more fields at the same time
>> like
>> sum(1,2,2) or max("a","c").
>>
>> Regards,
>> Gyula
>>