Author Eric Appelt
Recipients Eric Appelt, berker.peksag, martin.panter, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date 2016-10-29.20:37:24
Message-id <1477773444.32.0.12718368839.issue26163@psf.upfronthosting.co.za>
In-reply-to
Content
I also looked at the hashes of the strings themselves, rather than of frozensets, to check string hashing directly.

For example, n=3:

['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']

rather than:

[frozenset(), frozenset({'a'}), frozenset({'b'}), frozenset({'c'}), frozenset({'b', 'a'}), frozenset({'c', 'a'}), frozenset({'b', 'c'}), frozenset({'b', 'a', 'c'})]
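
For illustration, both lists could be generated along these lines. This is a minimal sketch, not necessarily the script used for the plots; the helper name `letter_subsets` is hypothetical.

```python
from itertools import combinations

def letter_subsets(n):
    """Return every subset of the first n lowercase letters as a tuple,
    ordered by size (hypothetical helper, for illustration only)."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:n]
    return [combo
            for size in range(n + 1)
            for combo in combinations(letters, size)]

# The same subsets, once as plain strings and once as frozensets.
strings = ["".join(combo) for combo in letter_subsets(3)]
frozensets = [frozenset(combo) for combo in letter_subsets(3)]

print(strings)
print(frozensets)
```

The string list prints in the order shown above; the order of elements inside each frozenset repr may differ from run to run.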

I made a distribution as in my last comment, but this time counting the number of unique values of the last 7 bits of the hashes in a set of 128 such strings (n=7), and compared it to the same count for pseudorandom integers, just as was done before with frozensets of the letter combinations. This is shown in the file "str_string_n7_10k.png".
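
A rough sketch of a single sample of this measurement is below. The names `low_bits_unique` and `string_combos` are hypothetical, and the use of `random.Random` with 64-bit values as the pseudorandom baseline is an assumption. Note that str hashes are salted per interpreter process unless PYTHONHASHSEED is fixed, so building a full distribution this way would presumably need many separate runs or varied inputs.

```python
import random
from itertools import combinations

def low_bits_unique(values, bits=7):
    """Count the distinct values of the lowest `bits` bits among `values`."""
    mask = (1 << bits) - 1
    # `&` on a negative Python int behaves like two's complement,
    # so negative hashes still map into the range 0..mask.
    return len({v & mask for v in values})

def string_combos(n=7):
    """All 2**n strings formed from subsets of the first n letters."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:n]
    return ["".join(combo)
            for size in range(n + 1)
            for combo in combinations(letters, size)]

strings = string_combos(7)  # 128 strings for n=7
str_count = low_bits_unique(hash(s) for s in strings)

# Baseline: the same count for 128 pseudorandom 64-bit integers (assumed setup).
rng = random.Random(0)
rand_count = low_bits_unique(rng.getrandbits(64) for _ in range(len(strings)))

print("unique low-7-bit values, strings:", str_count)
print("unique low-7-bit values, random: ", rand_count)
```

For 128 uniformly random values falling into 128 buckets, the expected number of distinct buckets is 128 * (1 - (1 - 1/128)**128), roughly 81, so a well-mixed hash should give counts scattered around that value.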

The last 7 bits of the small string hashes produce a distribution much like that of regular pseudorandom integers.

So if there is a problem with the hash algorithm, it appears to lie in the frozenset hashing and not in the string hashing.