Message352206
Something that may be slightly worth pursuing: in
j = (5*j + 1) % 2**power
to get a full-period sequence hitting every integer in range(2**power) exactly once, the multiplier (5) is a bit delicate (it has to be congruent to 1 mod 4), but the addend (1) only needs to be odd. When the hash code has a long string of high-order 1 bits, the lower bits of `perturb` are all ones repeatedly across iterations, and a string of `power` one bits is congruent to -1 mod 2**power, cancelling out the +1.
Which people stumbled into here: all negative ints with small absolute value _do_ have a long string of high-order 1 bits (as did also my contrived 2**60 - 2).
If we changed the addend to, say, +3 instead, it would still be possible to contrive cases as bad, but you'd be unlikely to stumble into them by accident, and they would be sensitive to the value of `power` (the table's current size). For example, for a table size of 8 (the smallest we use), only ints ending with bits 101 are congruent to -3 mod 8. So maximally "bad" hash codes would be of the form (bits, where "x" is "doesn't matter", and grouped into chunks of PERTURB_SHIFT (5) bits from the right):
... xx101 xx101 ... xx101 xx101 xxxxx
Provided changing +1 to +3 didn't slow anything down (you never know until you try!), that would be fine by me. But I'm not bothered by the current behavior, so I won't be pursuing it. |
|
Date |
User |
Action |
Args |
2019-09-12 15:49:14 | tim.peters | set | recipients:
+ tim.peters, rhettinger, steven.daprano, serhiy.storchaka, hongweipeng, crusaderky |
2019-09-12 15:49:14 | tim.peters | set | messageid: <1568303354.27.0.614001580805.issue38105@roundup.psfhosted.org> |
2019-09-12 15:49:14 | tim.peters | link | issue38105 messages |
2019-09-12 15:49:14 | tim.peters | create | |
|