[jira] [Created] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Created] (FLINK-11964) Avoid hash collision of partition and bucket in HybridHashTable

Shang Yuanchun (Jira)
Jingsong Lee created FLINK-11964:
------------------------------------

             Summary: Avoid hash collision of partition and bucket in HybridHashTable
                 Key: FLINK-11964
                 URL: https://issues.apache.org/jira/browse/FLINK-11964
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Operators
            Reporter: Jingsong Lee


In HybridHashTable, first select the corresponding partition according to hashCode, and then select the bucket in the partition according to hashCode, so using the same hashCode can easily cause hash collision.

Consider doing some mix to hashCode when choosing bucket.

We can just XOR some shifted bits in the cheapest possible way to reduce systematic lossage, as well as to incorporate impact of the highest bits that would otherwise never be used in index calculations because of table bounds. (bucket use power-of-two masking).  Just like:  (hash ^ (hash >>> 16))



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)