The Author Online Book Forums are Moving

The Author Online Book Forums will soon redirect to Manning's liveBook and liveVideo. All book forum content will migrate to liveBook's discussion forum and all video forum content will migrate to liveVideo. Log in to liveBook or liveVideo with your Manning credentials to join the discussion!

Thank you for your engagement in the AoF over the years! We look forward to offering you a more enhanced forum experience.

rachelb (6) [Avatar] Offline
#1
I was already familiar with "count min sketch" probabilistic algorithm and was eager to understand the hyperloglog algorithm. I appreciate the section is only a high level overview, but when I finished it I was confused about how the algorithm works.

My understanding is that a hash value is taken for each element in the stream, then the number of leading zeros for each hash value is taken and then the maximum of the number of leading zeros is determined. This maximum is an estimate for log base 2 of the number of distinct elements. Finally multiple estimates are taken and then combined to give the final estimate.

However, the description misses out the need to take a maximum and the fact that it is an estimate log base 2 of the final number - so the algorithm is not clear at all.

Two other areas could also be clarified. The count of leading zeros being an estimate is not immediate obvious. A few sentences to help build intuition in this area would help.

Also the use of the harmonic mean is also not obvious. Again a few sentences to indicate that each distinct count is an estimate and they are combined to produce a better estimate and the harmonic mean is used as it produces a more accurate estimate.

Finally, more guidance on the relationship between number of registers (hence memory) and accuracy would be useful - I am sure like count min sketch there is a formula for this.



andrew.psaltis (33) [Avatar] Offline
#2
Thank you for your detailed analysis and pointing out things that were flawed or where the text was lacking. I have taken all of this into consideration and updated the text accordingly.

Again, sorry for the delay in responding to you regarding this.