Author Dmitry Rubanovich
Recipients Dmitry Rubanovich, methane, rhettinger, serhiy.storchaka, xiang.zhang
Date 2017-06-15.23:28:39
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <1497569319.98.0.458656958782.issue30671@psf.upfronthosting.co.za>
In-reply-to
Content
I am looking at the 3.6.1 code.  I am not trying to simulate the index lookup as the lookup function would do it.  

There is no point in that.  The point of the script is to detect collisions among perturbed secondary indexes rather than between primary hashes and secondary.  To that end, it applies the perturbed secondary hash function to **all** primary indexes and stores the results.  This allows to judge the quality of the secondary hash function.  The "old" one was bad because it automatically created collisions by multiplying by zero-divisors of the 2^k ring.  The one used in 3.6.1 is better because it  doesn't always do that, but it still multiplies by zero-divisors of the ring when h mod 2^k is equal to (h >> 5) mod 2^k.  This multiplication by zero-divisors of the 2^k ring is what produces collisions.  The script simply counts them.  

The later iterations of the loop are not very relevant.  They will almost certainly produce further collisions, but that's unavoidable.  

By the way, just as a test, I just changed the value of PERTURB_SHIFT to 1 and, in my proposed solution, it produced lower collision counts, after 3 runs of the loop, in virtually all rings (ie, mod all values of 2^k for k from 8 to 20).  Intuitively it makes sense because there is less information loss.  But I don't have a formal proof as to why that should be.
History
Date User Action Args
2017-06-15 23:28:40Dmitry Rubanovichsetrecipients: + Dmitry Rubanovich, rhettinger, methane, serhiy.storchaka, xiang.zhang
2017-06-15 23:28:39Dmitry Rubanovichsetmessageid: <1497569319.98.0.458656958782.issue30671@psf.upfronthosting.co.za>
2017-06-15 23:28:39Dmitry Rubanovichlinkissue30671 messages
2017-06-15 23:28:39Dmitry Rubanovichcreate