Message 279698 - Python tracker



Author	Eric Appelt
Recipients	Eric Appelt, berker.peksag, martin.panter, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date	2016-10-29.20:37:24
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1477773444.32.0.12718368839.issue26163@psf.upfronthosting.co.za>
In-reply-to

Content

I also looked at the hashes of the strings themselves, rather than frozensets of their characters, to check string hashing directly.

For example, n=3:

['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']

rather than:

[frozenset(), frozenset({'a'}), frozenset({'b'}), frozenset({'c'}), frozenset({'b', 'a'}), frozenset({'c', 'a'}), frozenset({'b', 'c'}), frozenset({'b', 'a', 'c'})]
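The two lists above are the power set of a small alphabet, once as strings and once as frozensets. A minimal sketch of how such lists could be generated with `itertools.combinations` follows; the helper names are hypothetical and this is not the script used for the measurements.

```python
from itertools import combinations

def letter_combinations(letters):
    """Every subset of `letters` as a string, ordered by size, then by position."""
    return [''.join(c) for k in range(len(letters) + 1)
            for c in combinations(letters, k)]

def letter_frozensets(letters):
    """The same subsets, as frozensets of single characters."""
    return [frozenset(c) for k in range(len(letters) + 1)
            for c in combinations(letters, k)]

print(letter_combinations('abc'))
# ['', 'a', 'b', 'c', 'ab', 'ac', 'bc', 'abc']
```

With seven letters (n=7) each list has 2**7 = 128 entries.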

As in my previous comment, I built a distribution, this time of the number of unique low-order 7-bit sequences among the hashes of a set of 128 such strings (n=7, so 2**7 = 128 strings), and compared it to the same count for pseudorandom integers, just as I did before with frozensets of the letter combinations. The result is shown in the file "str_string_n7_10k.png".
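One way to reproduce this kind of measurement is sketched below, under stated assumptions. The message does not say how the repeated samples were drawn; here each trial picks 7 random distinct letters (a hypothetical choice), so every trial hashes a fresh set of 128 strings within one process. The 10_000 default is a guess based on the figure's file name. Note that `str` hashes are salted per interpreter process (see `PYTHONHASHSEED`), so exact counts differ between runs.

```python
import random
import string
from collections import Counter
from itertools import combinations

MASK = (1 << 7) - 1  # keep only the low-order 7 bits

def unique_low_bits(values, mask=MASK):
    """Number of distinct (hash(v) & mask) values among `values`."""
    return len({hash(v) & mask for v in values})

def string_trial(rng, n=7):
    # Hypothetical sampling scheme: n distinct random letters per trial,
    # then all 2**n combination strings built from them.
    letters = rng.sample(string.ascii_letters, n)
    strings = [''.join(c) for k in range(n + 1) for c in combinations(letters, k)]
    return unique_low_bits(strings)

def random_trial(rng, n=7):
    # Baseline: low 7 bits of 2**n pseudorandom 64-bit integers.
    return len({rng.getrandbits(64) & MASK for _ in range(2 ** n)})

def distribution(trial, trials=10_000, seed=0):
    """Histogram: number of unique low-bit values -> how many trials gave it."""
    rng = random.Random(seed)
    return Counter(trial(rng) for _ in range(trials))

if __name__ == "__main__":
    s = distribution(string_trial)
    r = distribution(random_trial)
    print("unique  strings  random")
    for k in sorted(set(s) | set(r)):
        print(f"{k:6d}  {s[k]:7d}  {r[k]:6d}")
```

If string hashing behaves well, the two histogram columns should look alike.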

The last 7 bits of the small-string hashes produce a distribution much like that of pseudorandom integers.
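For reference, the centre of the pseudorandom distribution can be computed directly. This uses the standard balls-in-bins occupancy result (not something stated in the original message): dropping k values uniformly into m bins leaves on average m(1 - (1 - 1/m)^k) distinct bins occupied. With k = m = 128 that is roughly 128(1 - 1/e).

```python
def expected_unique(draws, bins):
    """Expected number of distinct values when `draws` items land uniformly in `bins` bins."""
    return bins * (1 - (1 - 1 / bins) ** draws)

print(round(expected_unique(128, 128), 1))  # about 81.1
```

A well-behaved 7-bit hash suffix over 128 inputs should therefore cluster around 81 unique values, not 128.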

So if there is a problem with the hash algorithm, it appears to be related to the frozenset hashing and not strings.
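To check that conclusion side by side, the same low-bit count can be taken for strings and for frozensets built from identical letter combinations. The sketch below (hypothetical helper names) gives a single sample per interpreter run; because the underlying string hashes are salted, both numbers move from run to run, so many runs are needed before comparing against the expected value of about 81.

```python
from itertools import combinations

MASK = 0x7F  # low-order 7 bits

def combos(letters):
    """All subsets of `letters` as tuples, including the empty one."""
    return [c for k in range(len(letters) + 1) for c in combinations(letters, k)]

def compare(letters='abcdefg'):
    """Unique low-7-bit hash values for strings vs. frozensets of the same subsets."""
    subsets = combos(letters)
    n_str = len({hash(''.join(c)) & MASK for c in subsets})
    n_fs = len({hash(frozenset(c)) & MASK for c in subsets})
    return n_str, n_fs

print(compare())
```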

History
Date	User	Action	Args
2016-10-29 20:37:24	Eric Appelt	set	recipients: + Eric Appelt, tim.peters, rhettinger, vstinner, berker.peksag, martin.panter, serhiy.storchaka
2016-10-29 20:37:24	Eric Appelt	set	messageid: <1477773444.32.0.12718368839.issue26163@psf.upfronthosting.co.za>
2016-10-29 20:37:24	Eric Appelt	link	issue26163 messages
2016-10-29 20:37:24	Eric Appelt	create