(DEPRECATED) Apache Flink Mailing List archive.

Re: Streaming

zhangminglei

Re: Streaming

To Aitozi.

Cheers
Minglei

> On 27 June 2018, at 5:46 PM, shimin yang <[hidden email]> wrote:
>
> Aitozi
>
> We are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried COUNT DISTINCT with the Flink Table API without a window, but that requires setting a state retention time.
>
> However, the time resolution of this operator is 1 millisecond, so it ends up with too many timers on the Java heap, which might lead to an OOM.
>
> Cheers
> Shimin
>
> 2018-06-27 17:34 GMT+08:00 zhangminglei <[hidden email]>:
> Aitozi
>
> From my side, I do not think distinct counting is easy to deal with, even when combined with Kafka's exactly-once support.
>
> For UV, we can use a Bloom filter to filter the PV events and get the UV in the end.
>
> Windows are usually used for aggregate operations, so I think all of these should be realized with windows.
>
> I am not familiar with this field, so I would still like to hear how others respond to this question.
>
> Cheers
> Minglei
>
>
>
> > On 27 June 2018, at 5:12 PM, aitozi <[hidden email]> wrote:
> >
> > Hi, community
> >
> > I am using Flink to handle the following situations:
> >
> > 1. A "distinct count" to calculate UV/PV.
> > 2. Calculating the top N over the past 1 hour or 1 day.
> >
> > Are these all realized with windows? Or is there a best practice for doing this?
> >
> > 3. When dealing with distinct counts, if there is no need to do a keyBy
> > beforehand, how does the window handle this?
> >
> > Thanks
> > Aitozi.
> >
> >
> >
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>
>
>
>
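Minglei's Bloom filter suggestion (check each PV's user ID against a filter and only count the user as a new UV if the filter has not seen them) can be sketched in plain Java, independent of Flink. This is a minimal sketch under stated assumptions: the class `BloomUvCounter`, the bit-array size and the double-hashing scheme are hypothetical illustration choices, not a Flink API. A Bloom filter never forgets a user it has seen, but it can report false positives, so some genuinely new users may be skipped and the resulting UV can undercount slightly.

```java
import java.util.BitSet;

// Hypothetical helper (not a Flink API): counts unique visitors (UV) from a
// stream of page views (PV) by remembering seen user IDs in a Bloom filter.
class BloomUvCounter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private long uv = 0;

    BloomUvCounter(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // MurmurHash3 32-bit finalizer: spreads the bits of String.hashCode().
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Handles one PV event; returns true if this user is counted as a new UV. */
    boolean onPageView(String userId) {
        int h1 = mix(userId.hashCode());
        int h2 = mix(h1 ^ 0x9e3779b9) | 1; // second hash for double hashing, forced odd
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                isNew = true; // at least one bit unset: definitely not seen before
                bits.set(idx);
            }
        }
        if (isNew) {
            uv++;
        }
        return isNew;
    }

    long uv() {
        return uv;
    }

    /** Clears the filter, e.g. at the start of a new day. */
    void reset() {
        bits.clear();
        uv = 0;
    }
}
```

For example, `new BloomUvCounter(1 << 20, 3)` uses about 128 KiB of bits; calling `reset()` at each day boundary turns it into a daily UV counter. Sizing the filter for the expected daily UV keeps the false-positive rate, and therefore the undercount, low.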

zhangminglei

Re: Streaming

Forwarding Shimin's mail to Aitozi.

Aitozi

We are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried COUNT DISTINCT with the Flink Table API without a window, but that requires setting a state retention time.

However, the time resolution of this operator is 1 millisecond, so it ends up with too many timers on the Java heap, which might lead to an OOM.

Cheers
Shimin
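Shimin's HyperLogLog approach trades exactness for a small, fixed amount of memory per counter. The following is a simplified HyperLogLog in plain Java, written for illustration: the class `HyperLogLogSketch`, its hash function and the precision parameter `p` are assumptions, not Flink's or any particular library's implementation. It also includes a register-wise `merge`, which suggests one possible pattern for Aitozi's third question (an assumption, not something confirmed in this thread): each parallel instance keeps a partial sketch without a prior `keyBy`, and the partial sketches are merged afterwards.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified HyperLogLog (not Flink's or any library's class):
// estimates the number of distinct items using 2^p small registers.
class HyperLogLogSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, m = 2^p
    private final byte[] registers; // max observed rank per register

    HyperLogLogSketch(int p) {
        if (p < 7 || p > 16) {
            throw new IllegalArgumentException("p must be in [7, 16]");
        }
        this.p = p;
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // 64-bit FNV-1a hash followed by the SplitMix64 finalizer to spread the bits.
    private static long hash64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        h = (h ^ (h >>> 30)) * 0xbf58476d1ce4e5b9L;
        h = (h ^ (h >>> 27)) * 0x94d049bb133111ebL;
        return h ^ (h >>> 31);
    }

    void add(String item) {
        long h = hash64(item);
        int idx = (int) (h >>> (64 - p)); // top p bits select the register
        long rest = h << p;               // the remaining 64 - p bits
        // Rank = position of the first 1-bit in the remaining bits (1-based).
        int rank = Math.min(Long.numberOfLeadingZeros(rest), 64 - p) + 1;
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    /** Merges another sketch into this one by taking the register-wise maximum. */
    void merge(HyperLogLogSketch other) {
        if (other.p != p) {
            throw new IllegalArgumentException("precision mismatch");
        }
        for (int i = 0; i < m; i++) {
            if (other.registers[i] > registers[i]) {
                registers[i] = other.registers[i];
            }
        }
    }

    long estimate() {
        double alpha = 0.7213 / (1 + 1.079 / m); // bias constant, valid for m >= 128
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double e = alpha * m * m / sum;
        if (e <= 2.5 * m && zeros > 0) {
            e = m * Math.log((double) m / zeros); // linear counting for small cardinalities
        }
        return Math.round(e);
    }
}
```

With `p = 14` the sketch holds 16,384 one-byte registers (16 KiB), and HyperLogLog's standard error is roughly 1.04/√m, about 0.8% here. Because merging is a register-wise maximum, merging sketches built from disjoint parts of the stream gives exactly the sketch of the whole stream.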

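Shimin's remark about COUNT DISTINCT without a window (it needs a retention time, and per-record timers at 1 ms resolution can pile up on the heap) points at the cost of fine-grained cleanup. One way to keep cleanup coarse is to group state by day and expire a whole day at once. The plain-Java sketch below gives an exact daily UV this way; the class `DailyDistinctCounter`, its methods and the use of UTC day boundaries are hypothetical choices, and how this would map onto Flink keyed state and timers is an assumption not covered in this thread.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper (not a Flink API): exact daily UV with state grouped by
// day, so expired state is dropped one whole day at a time rather than per record.
class DailyDistinctCounter {
    private static final long DAY_MS = 24L * 60 * 60 * 1000;
    // Day index (days since the epoch, UTC) -> user IDs seen on that day
    private final TreeMap<Long, Set<String>> usersByDay = new TreeMap<>();

    /** Records one PV at an epoch-millisecond timestamp; returns that day's UV so far. */
    int add(String userId, long timestampMs) {
        long day = Math.floorDiv(timestampMs, DAY_MS);
        Set<String> users = usersByDay.computeIfAbsent(day, d -> new HashSet<>());
        users.add(userId);
        return users.size();
    }

    /** Exact UV for the given day index, or 0 if that day is not (or no longer) retained. */
    int uv(long day) {
        Set<String> users = usersByDay.get(day);
        return users == null ? 0 : users.size();
    }

    /** Drops all days before {@code day}; one call per day boundary is enough. */
    void expireBefore(long day) {
        usersByDay.headMap(day, false).clear();
    }

    int retainedDays() {
        return usersByDay.size();
    }
}
```

Unlike HyperLogLog, the memory here grows with the number of distinct users per retained day. Day boundaries are UTC in this sketch; for timestamps interpreted in GMT+08:00, the boundary would need to be shifted by the zone offset.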
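For Aitozi's second question (the top N over the past hour or day), a common pattern is to aggregate a count per item inside the window and then select the N largest counts. The sketch below shows only that selection step, using a bounded min-heap in plain Java; it assumes the per-item counts for one window firing (for example, one hour of data) are already available as a `Map`, and the class `TopN` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: picks the N items with the highest counts from the
// per-item totals that one window firing (e.g. the past hour) has produced.
class TopN {
    static List<Map.Entry<String, Long>> topN(Map<String, Long> counts, int n) {
        Comparator<Map.Entry<String, Long>> byCount = Map.Entry.comparingByValue();
        // Min-heap by count: the weakest of the current top N is always at the head.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so at most n entries are kept
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(byCount.reversed()); // highest count first
        return result;
    }
}
```

Keeping the heap bounded at N makes the selection O(k log N) for k distinct items, instead of sorting all k counts.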