Message 190475 - Python tracker

Author	ebfe
Recipients	asvetlov, christian.heimes, ebfe, gregory.p.smith, isoschiz, jcea, mark.dickinson, neologix, pitrou, python-dev, rhettinger, serhiy.storchaka, skrah, tim.peters, vstinner
Date	2013-06-02.09:30:21
SpamBayes Score	-1.0
Marked as misclassified	Yes
Message-id	<1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za>
In-reply-to

Content
I was investigating a callgrind dump of my code, which showed how badly unicode_hash() was affecting my performance. Using Google's cityhash instead of the builtin algorithm to hash unicode objects improves overall performance by about 15 to 20 percent in my case, which is quite significant.
Valgrind shows that the share of instructions spent in unicode_hash() drops from ~20% to ~11%. By Amdahl's law, the two-fold speedup of the hash function shrinks to the overall 15 percent mentioned above.
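
To make the Amdahl argument concrete, here is a small sketch of the arithmetic. It assumes the ~20% instruction share and the two-fold hash speedup quoted above as inputs; the function names are made up for illustration. It models instruction counts only, so it need not reproduce a measured wall-clock gain.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the work runs `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def remaining_share(fraction, local_speedup):
    """Share of the new total that is still spent in the accelerated part."""
    new_part = fraction / local_speedup
    return new_part / ((1.0 - fraction) + new_part)


# Assumed inputs, taken from the callgrind figures quoted above.
hash_fraction = 0.20  # share of instructions in unicode_hash() before the change
hash_speedup = 2.0    # assumed speedup of the hash function itself

overall = amdahl_speedup(hash_fraction, hash_speedup)
share = remaining_share(hash_fraction, hash_speedup)

print(f"overall speedup: {overall:.3f}x")      # overall speedup: 1.111x
print(f"hash share afterwards: {share:.1%}")   # hash share afterwards: 11.1%
```

Under these assumptions the remaining share of about 11% matches the callgrind figure: halving a 20% slice leaves 10 units out of 90.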

Cityhash was chosen because of its MIT license and its advertised performance on short strings.

I've now found this issue and attached a log of haypo's benchmark, which compares the native hash with cityhash. Caching was disabled during the test. Cityhash was compiled with -O3 -msse4.2 (cityhash uses CPU-native CRC instructions). CPython's unit tests fail because of known_hash and the gdb output; apart from that, everything seems to work fine.

Cityhash is advertised for its performance with short strings, which does not seem to show in the benchmark. However, longer strings perform *much* better.

If people are interested, I can repeat the test on an armv7l machine.

History
Date	User	Action	Args
2013-06-02 09:30:24	ebfe	set	recipients: + ebfe, tim.peters, rhettinger, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, asvetlov, skrah, neologix, python-dev, serhiy.storchaka, isoschiz
2013-06-02 09:30:24	ebfe	set	messageid: <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za>
2013-06-02 09:30:23	ebfe	link	issue16427 messages
2013-06-02 09:30:23	ebfe	create