Message 279792 - Python tracker

Author	Eric Appelt
Recipients	Eric Appelt, berker.peksag, christian.heimes, martin.panter, rhettinger, serhiy.storchaka, tim.peters, vstinner
Date	2016-10-31.14:43:04
Message-id	<1477924985.21.0.760437198069.issue26163@psf.upfronthosting.co.za>

Content

If I understand the reporting properly, all tests so far have used SipHash24:

Python 3.7.0a0 (default:5b33829badcc+, Oct 30 2016, 17:29:47) 
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sysconfig
>>> sysconfig.get_config_var("Py_HASH_ALGORITHM")
0
>>> import sys
>>> sys.hash_info.algorithm
'siphash24'

It sounds like it would be worthwhile for me to be more rigorous and run a battery of tests, first with FNV and then with SipHash24, to compare:

- Performing no dispersion after the frozenset hash is initially computed from XORing entry hashes (control)
- Performing dispersion using an LCG after the frozenset hash is initially computed from XORing entry hashes (current approach)
- Performing dispersion using the selected hash algorithm (FNV/SipHash24) after the frozenset hash is initially computed from XORing entry hashes (proposed approach)
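To make the three variants concrete, here is a minimal Python sketch of what is being compared. It is not the C code under test: the function names, the 64-bit mask, and the LCG multiplier and increment are assumptions chosen for illustration, and the built-in hash of the combined value's bytes stands in for "the selected hash algorithm". The real frozenset hash does more work than a plain XOR of entry hashes.

```python
MASK = (1 << 64) - 1  # assumption: 64-bit hash width


def xor_entry_hashes(entries):
    """Combine entry hashes by XOR, as in the initial frozenset hash step."""
    h = 0
    for entry in entries:
        h ^= hash(entry) & MASK
    return h


def no_dispersion(h):
    """Control: return the XOR-combined value unchanged."""
    return h


def lcg_dispersion(h, multiplier=69069, increment=907133923):
    """Current approach: one linear congruential step (constants assumed)."""
    return (h * multiplier + increment) & MASK


def algorithm_dispersion(h):
    """Proposed approach: feed the combined value through the interpreter's
    bytes hash (FNV or SipHash24, depending on how Python was built)."""
    return hash(h.to_bytes(8, "little")) & MASK


if __name__ == "__main__":
    combined = xor_entry_hashes("abcdefg")
    for disperse in (no_dispersion, lcg_dispersion, algorithm_dispersion):
        print(disperse.__name__, hex(disperse(combined)))
```

The printed values change with PYTHONHASHSEED because string hashes are randomized; only the structure of the three steps is meant to carry over.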

I'll take the six plots (three approaches for each of the two hash algorithms) and merge them into a single PNG, and also post my (short) testing and plotting scripts so the results can be reproduced and checked.

I can also write a regression test if you think that would be good to have in the test suite (perhaps skipped by default because of its running time). Instead of using the same seven letters a-g as test strings and varying PYTHONHASHSEED, it would perform the letter test for n=7 with 10000 different sets of short random strings and check whether any fall below the threshold.
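A rough sketch of how such a regression test might look, written in pure Python. The helper names, the maximum string length, and the threshold of half the possible low-bit values are all assumptions for illustration, not values taken from the existing test suite.

```python
import itertools
import random
import string


def powerset(items):
    """Yield every subset of items as a tuple."""
    items = list(items)
    return itertools.chain.from_iterable(
        itertools.combinations(items, r) for r in range(len(items) + 1))


def distinct_low_bits(strings):
    """Count distinct values of the low n bits of the frozenset hash
    over all 2**n subsets of the n given strings."""
    n = len(strings)
    mask = (1 << n) - 1
    return len({hash(frozenset(s)) & mask for s in powerset(strings)})


def random_strings(rng, n, max_len=4):
    """Return n distinct short lowercase strings (max_len is an assumption)."""
    out = set()
    while len(out) < n:
        length = rng.randint(1, max_len)
        out.add("".join(rng.choice(string.ascii_lowercase)
                        for _ in range(length)))
    return sorted(out)


def failing_sets(trials=10000, n=7, threshold=0.5, seed=0):
    """Return (strings, count) pairs whose distinct count is below
    threshold * 2**n. The threshold value is hypothetical."""
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        strings = random_strings(rng, n)
        count = distinct_low_bits(strings)
        if count < threshold * 2 ** n:
            failures.append((strings, count))
    return failures


if __name__ == "__main__":
    bad = failing_sets(trials=1000)
    print(len(bad), "of 1000 string sets fell below the threshold")
```

Seeding the generator fixes which strings are drawn, but the outcome still depends on PYTHONHASHSEED unless that is pinned as well.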
