1 files changed, 27 insertions, 31 deletions
diff --git a/absl/container/internal/raw_hash_set.h b/absl/container/internal/raw_hash_set.h
index 046a6939..b69a77f0 100644
--- a/absl/container/internal/raw_hash_set.h
+++ b/absl/container/internal/raw_hash_set.h
@@ -95,14 +95,21 @@
 // Storing control bytes in a separate array also has beneficial cache effects,
 // since more logical slots will fit into a cache line.
 //
+// # Hashing
+//
+// We compute two separate hashes, `H1` and `H2`, from the hash of an object.
+// `H1(hash(x))` is an index into `slots`, and essentially the starting point
+// for the probe sequence. `H2(hash(x))` is a 7-bit value used to filter out
+// objects that cannot possibly be the one we are looking for.
+//
 // # Table operations.
 //
-// The key operations are `insert`, `find`, and `erase_at`; the operations below
+// The key operations are `insert`, `find`, and `erase`.
 //
-// `insert` and `erase` are implemented in terms of find, so we describe that
-// one first. To `find` a value `x`, we compute `hash(x)`. From `H1(hash(x))`
-// and the capacity, we construct a `probe_seq` that visits every group of
-// slots in some interesting order.
+// Since `insert` and `erase` are implemented in terms of `find`, we describe
+// `find` first. To `find` a value `x`, we compute `hash(x)`. From
+// `H1(hash(x))` and the capacity, we construct a `probe_seq` that visits every
+// group of slots in some interesting order.
 //
 // We now walk through these indices. At each index, we select the entire group
 // starting with that index and extract potential candidates: occupied slots
@@ -112,32 +119,21 @@
 // next probe index. Tombstones effectively behave like full slots that never
 // match the value we're looking for.
 //
-// The `H2` bits ensure that if we perform a ==, a false positive is very very
-// rare (assuming the hash function looks enough like a random oracle). To see
-// this, note that in a group, there will be at most 8 or 16 `H2` values, but
-// an `H2` can be any one of 128 values. Viewed as a birthday attack, we can use
-// the rule of thumb that the probability of a collision among n choices of m
-// symbols is `p(n, m) ~ n^2/2m. In table form:
-//
-//  n |  p(n) |  n |  p(n)
-//  0 | 0.000 |  8 | 0.250
-//  1 | 0.004 |  9 | 0.316
-//  2 | 0.016 | 10 | 0.391
-//  3 | 0.035 | 11 | 0.473
-//  4 | 0.062 | 12 | 0.562
-//  5 | 0.098 | 13 | 0.660
-//  6 | 0.141 | 14 | 0.766
-//  7 | 0.191 | 15 | 0.879
-//
-// The rule of thumb breaks down at around `n = 12`, but such groups would only
-// occur for tables close to their max load factor. This is far better than an
-// ordinary open-addressing table, which needs to perform an == at every step of
-// the probe sequence. These probabilities don't tell the full story (for
-// example, because elements are inserted into a group from the front, and
-// candidates are =='ed from the front, the collision is only effective in
-// rare cases e.g. another probe sequence inserted into a deleted slot in front
-// of us).
-//
+// The `H2` bits ensure when we compare a slot to an object with `==`, we are
+// likely to have actually found the object.  That is, the chance is low that
+// `==` is called and returns `false`.  Thus, when we search for an object, we
+// are unlikely to call `==` many times.  This likelyhood can be analyzed as
+// follows (assuming that H2 is a random enough hash function).
+//
+// Let's assume that there are `k` "wrong" objects that must be examined in a
+// probe sequence.  For example, when doing a `find` on an object that is in the
+// table, `k` is the number of objects between the start of the probe sequence
+// and the final found object (not including the final found object).  The
+// expected number of objects with an H2 match is then `k/128`.  Measurements
+// and analysis indicate that even at high load factors, `k` is less than 32,
+// meaning that the number of "false positive" comparisons we must perform is
+// less than 1/8 per `find`.
+
 // `insert` is implemented in terms of `unchecked_insert`, which inserts a
 // value presumed to not be in the table (violating this requirement will cause
 // the table to behave erratically). Given `x` and its hash `hash(x)`, to insert