Message279698
I also looked at hashes of the strings themselves, rather than frozensets, to check string hashing directly.
For example, n=3:
['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
rather than:
[frozenset(), frozenset({'a'}), frozenset({'b'}), frozenset({'c'}), frozenset({'b', 'a'}), frozenset({'c', 'a'}), frozenset({'b', 'c'}), frozenset({'b', 'a', 'c'})]
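For reference, the two lists above can be generated along these lines. This is only a sketch of one way to do it, not necessarily the script I used; the helper names `letter_subsets`, `as_strings` and `as_frozensets` are made up for illustration.

```python
from itertools import combinations


def letter_subsets(n):
    """Return every subset of the first n lowercase letters, smallest first."""
    letters = [chr(ord('a') + i) for i in range(n)]
    return [combo for k in range(n + 1) for combo in combinations(letters, k)]


def as_strings(n):
    # Each subset becomes a short string such as 'ab' or 'abc'.
    return [''.join(combo) for combo in letter_subsets(n)]


def as_frozensets(n):
    # Same subsets, but wrapped as frozensets of single letters.
    return [frozenset(combo) for combo in letter_subsets(n)]


print(as_strings(3))
print(len(as_strings(7)))  # 2**7 strings for n=7
```

With n=7 this gives the 128 strings (or frozensets) used for the distributions.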
I built a distribution as in my last comment, but this time counting the number of unique values of the last 7 bits of the hash across a set of 128 such strings (n=7), and compared it to pseudorandom integers, just as was done before with frozensets of the letter combinations. The result is shown in the file "str_string_n7_10k.png".
The last 7 bits of the small string hashes produce a distribution much like that of regular pseudorandom integers.
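A rough sketch of the measurement, under some assumptions: the function names here are hypothetical, the pseudorandom baseline uses `random.getrandbits(64)` as a stand-in for whatever integers were actually drawn, and only a single sample is computed for the strings. String hashes are randomized per process unless `PYTHONHASHSEED` is fixed, so a full distribution would need repeated runs or different letter sets, which is not shown here.

```python
import random
from itertools import combinations

MASK = 0x7F  # keep only the last 7 bits, so 128 possible values


def unique_low_bits(values, mask=MASK):
    """Count distinct values of the masked hash over a collection."""
    return len({hash(v) & mask for v in values})


def subset_strings(letters):
    """All subsets of the given letters, each joined into a string."""
    return [''.join(c)
            for k in range(len(letters) + 1)
            for c in combinations(letters, k)]


def random_baseline(trials, count=128, mask=MASK, seed=0):
    """For each trial, count distinct masked values among `count` random ints."""
    rng = random.Random(seed)
    return [len({rng.getrandbits(64) & mask for _ in range(count)})
            for _ in range(trials)]


strings = subset_strings('abcdefg')   # the 128 strings for n=7
print(unique_low_bits(strings))       # one sample; varies with the hash seed

baseline = random_baseline(10_000)
print(sum(baseline) / len(baseline))  # mean number of unique values per trial
```

For uniformly random values, the expected number of distinct 7-bit patterns among 128 draws is 128 * (1 - (127/128)**128), roughly 81, so a well-behaved hash should give counts clustered around that value.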
So if there is a problem with the hash algorithm, it appears to be related to frozenset hashing and not to string hashing.
Date | User | Action | Args
2016-10-29 20:37:24 | Eric Appelt | set | recipients: + Eric Appelt, tim.peters, rhettinger, vstinner, berker.peksag, martin.panter, serhiy.storchaka
2016-10-29 20:37:24 | Eric Appelt | set | messageid: <1477773444.32.0.12718368839.issue26163@psf.upfronthosting.co.za>
2016-10-29 20:37:24 | Eric Appelt | link | issue26163 messages
2016-10-29 20:37:24 | Eric Appelt | create |