Message 175093 - Python tracker

Author	serhiy.storchaka
Recipients	serhiy.storchaka
Date	2012-11-07.12:38:46
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1352291929.32.0.895096534658.issue16427@psf.upfronthosting.co.za>
In-reply-to

Content
In the discussion of issue14621 it was noted that much more complex hash algorithms can outperform the current one because they process more data per iteration. Here is a patch that applies this idea to the current algorithm. It also removes some code duplication.
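As an illustration only, the following Python model contrasts a loop that performs one multiply-xor step per byte with one that folds a whole 4-byte word into the state per step. The multiplier 1000003, the 32-bit mask, and the helper names are assumptions chosen for the sketch. It omits any seeding or finalization the real C function performs and is not the patch itself. The two variants generally produce different hash values.

```python
MASK = (1 << 32) - 1  # assumption: model a 32-bit hash value (32-bit build)
MULT = 1000003        # assumption: multiplier of the existing multiply-xor loop


def hash_per_unit(data: bytes) -> int:
    """Hypothetical model: one multiply-xor step per byte."""
    x = 0
    for b in data:
        x = ((MULT * x) ^ b) & MASK
    return x


def hash_per_word(data: bytes, word: int = 4) -> int:
    """Hypothetical model: one multiply-xor step per `word` bytes.

    Whole words are read as little-endian integers; any trailing bytes
    that do not fill a word are folded in one at a time.
    """
    x = 0
    n = len(data) - len(data) % word  # length covered by whole words
    for i in range(0, n, word):
        w = int.from_bytes(data[i:i + word], "little")
        x = ((MULT * x) ^ w) & MASK
    for b in data[n:]:
        x = ((MULT * x) ^ b) & MASK
    return x


if __name__ == "__main__":
    sample = b"a" * 10
    print(hash_per_unit(sample), hash_per_word(sample))
```

For an input of n bytes the word-at-a-time loop performs n // 4 + n % 4 multiply steps instead of n, which is where the saving would come from.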

Microbenchmarks:

$ ./python -m timeit -n 1 -s "t = b'a' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = 'a' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = '\u0100' * 10**8"  "hash(t)"
$ ./python -m timeit -n 1 -s "t = '\U00010000' * 10**8"  "hash(t)"
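The same four measurements can be driven from a script with the standard timeit module. This is a sketch: the reduced default size N and the function name run are assumptions, and the globals argument requires Python 3.5 or later. Using number=1 matters because CPython caches a str object's hash after the first call. timeit re-executes the setup code for every repeat, so each timing hashes a freshly built object exactly once.

```python
import timeit

N = 10**6  # assumption: smaller than the 10**8 used on the command line

# Setup code for each case; chr() avoids escape-sequence quoting issues.
CASES = {
    "bytes": "t = b'a' * N",
    "UCS1": "t = 'a' * N",
    "UCS2": "t = chr(0x100) * N",
    "UCS4": "t = chr(0x10000) * N",
}


def run(n=N, repeat=3):
    """Return the best single-call time (seconds) of hash(t) for each case."""
    results = {}
    for name, setup in CASES.items():
        # number=1: one hash per freshly built object, so no cached hash is reused.
        times = timeit.repeat("hash(t)", setup=setup, number=1,
                              repeat=repeat, globals={"N": n})
        results[name] = min(times)
    return results


if __name__ == "__main__":
    for name, secs in run().items():
        print(f"{name:6} {secs * 1000:.1f} msec")
```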

Results on 32-bit Linux on AMD Athlon 64 X2 4600+:

kind   original  patched    speedup
bytes  181 msec  45.7 msec  4x
UCS1   429 msec  45.7 msec  9.4x
UCS2   179 msec  92 msec    1.9x
UCS4   183 msec  183 msec   1x
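The speedup column is the original time divided by the patched time. The sketch below recomputes it and places it beside the number of code units that fit in one word. The 4-byte word size is an assumption that matches the 32-bit build used here. The resulting 4, 2 and 1 units per word line up with the 4x, 1.9x and 1x results for bytes, UCS2 and UCS4. The larger UCS1 gain also reflects that the original UCS1 timing (429 msec) was already much slower than the bytes timing, although the table does not say why.

```python
# Timings from the table above, in msec: (original, patched).
TIMINGS = {
    "bytes": (181, 45.7),
    "UCS1": (429, 45.7),
    "UCS2": (179, 92),
    "UCS4": (183, 183),
}

# Bytes per code unit; bytes objects use 1 byte per item.
UNIT_SIZE = {"bytes": 1, "UCS1": 1, "UCS2": 2, "UCS4": 4}
WORD_SIZE = 4  # assumption: the patched loop reads one 32-bit word per step


def speedups():
    """Original time divided by patched time, rounded to one decimal."""
    return {k: round(orig / patched, 1) for k, (orig, patched) in TIMINGS.items()}


def units_per_word():
    """How many code units of each kind fit in one word."""
    return {k: WORD_SIZE // size for k, size in UNIT_SIZE.items()}


if __name__ == "__main__":
    s, u = speedups(), units_per_word()
    for k in TIMINGS:
        print(f"{k:6} speedup {s[k]:>4}x  units/word {u[k]}")
```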

If the idea is acceptable, I will create benchmarks for short strings.
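The following is only a sketch of one possible short-string benchmark. Hashing a single short string once is too fast to time reliably, so the sketch hashes many distinct, freshly built strings in a single pass. The function name bench_short and its parameters are hypothetical. The result includes Python loop overhead, which may hide small differences in the hash loop itself.

```python
import timeit


def bench_short(length=10, count=10**5, repeat=3):
    """Hypothetical helper: average seconds per hash of a short ASCII string.

    Builds `count` distinct, never-hashed strings of `length` digits
    (exactly `length` chars as long as count <= 10**length).
    """
    setup = "ts = [format(i, '0%dd') for i in range(%d)]" % (length, count)
    # number=1: setup is rebuilt per repeat, so str's cached hash is never reused.
    times = timeit.repeat("for t in ts: hash(t)", setup=setup,
                          number=1, repeat=repeat)
    return min(times) / count  # includes per-iteration loop overhead


if __name__ == "__main__":
    for length in (1, 5, 10, 20):
        print(length, f"{bench_short(length) * 1e9:.1f} ns per hash")
```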

History
Date	User	Action	Args
2012-11-07 12:38:49	serhiy.storchaka	set	recipients: + serhiy.storchaka
2012-11-07 12:38:49	serhiy.storchaka	set	messageid: <1352291929.32.0.895096534658.issue16427@psf.upfronthosting.co.za>
2012-11-07 12:38:48	serhiy.storchaka	link	issue16427 messages
2012-11-07 12:38:48	serhiy.storchaka	create