To aitozi.
Cheers
Minglei

> On June 27, 2018, at 5:46 PM, shimin yang <[hidden email]> wrote:
>
> Aitozi
>
> We are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried count distinct in Flink Table without a window, but that requires setting the retention time.
>
> However, the time resolution of this operator is 1 millisecond, so it ends up with too many timers on the Java heap, which might lead to OOM.
>
> Cheers
> Shimin
>
> 2018-06-27 17:34 GMT+08:00 zhangminglei <[hidden email]>:
> Aitozi
>
> From my side, I do not think distinct is very easy to deal with, even when working together with Kafka's exactly-once support.
>
> For UV, we can use a Bloom filter to filter PV and get UV in the end.
>
> A window is usually used in an aggregate operation, so I think all of these should be realized by windows.
>
> I am not familiar with this field, so I still want to know how others would respond to this question.
>
> Cheers
> Minglei
>
> > On June 27, 2018, at 5:12 PM, aitozi <[hidden email]> wrote:
> >
> > Hi, community
> >
> > I am using Flink to deal with the following situations.
> >
> > 1. "distinct count" to calculate the UV/PV.
> > 2. Calculate the topN of the past 1 hour or 1 day.
> >
> > Are these all realized by windows? Or is there a best practice for doing this?
> >
> > 3. And when dealing with distinct, if there is no need to do a keyBy
> > beforehand, how does the window deal with this?
> >
> > Thanks
> > Aitozi.
> >
> > --
> > Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Forwarding Shimin's mail to Aitozi.

Aitozi

We are using HyperLogLog to count daily UV, but it only provides an approximate value. I also tried count distinct in Flink Table without a window, but that requires setting the retention time.

However, the time resolution of this operator is 1 millisecond, so it ends up with too many timers on the Java heap, which might lead to OOM.

Cheers
Shimin
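For readers following the thread, the Bloom filter idea mentioned above (filter PV to get UV) could look roughly like the sketch below. This is plain Java with no Flink dependency; the class name `BloomUvCounter`, its parameters, and the FNV-1a based double hashing are assumptions made for illustration, not anything taken from the thread or from Flink. Like HyperLogLog, the result is approximate: a Bloom filter can report a new user as already seen, so the UV count can come out slightly low, but it never counts the same user twice.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical helper (not a Flink class): approximate UV counting with a Bloom filter. */
class BloomUvCounter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private long pv = 0; // every page view seen
    private long uv = 0; // page views whose user id looked new (approximate)

    BloomUvCounter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Records one page view; returns true if the user id was not seen before. */
    boolean add(String userId) {
        pv++;
        byte[] data = userId.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // odd step for double hashing
        boolean isNew = false;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                isNew = true; // at least one unset bit means a definitely new id
                bits.set(idx);
            }
        }
        if (isNew) {
            uv++;
        }
        return isNew;
    }

    long pv() { return pv; }

    long uv() { return uv; }

    /** 32-bit FNV-1a style hash with a caller-supplied starting value. */
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

In a Flink job, one such counter would presumably live in per-key or per-window state and be reset each day; sizing `numBits` and `numHashes` for the expected number of users is left open here.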