
Author ebfe
Recipients asvetlov, christian.heimes, ebfe, gregory.p.smith, isoschiz, jcea, mark.dickinson, neologix, pitrou, python-dev, rhettinger, serhiy.storchaka, skrah, tim.peters, vstinner
Date 2013-06-02.09:30:21
Message-id <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za>
In-reply-to
Content
I was investigating a callgrind dump of my code, which showed how badly unicode_hash() was affecting my performance. Using Google's CityHash instead of the builtin algorithm to hash unicode objects improves overall performance by about 15 to 20 percent in my case - that is quite a thing.
Valgrind shows that the share of instructions spent in unicode_hash() drops from ~20% to ~11%. Amdahl's law reduces the two-fold speedup of the hash function to the mentioned 15 percent overall.
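
For readers who want to redo the Amdahl arithmetic, here is a minimal sketch. It assumes the rounded figures from this message (unicode_hash() at roughly 20% of all instructions, and a replacement hash about twice as fast); the function name amdahl_speedup is made up for illustration. With these rounded inputs the estimate comes out somewhat below the measured 15 to 20 percent, which is not surprising since instruction counts are only a proxy for run time.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the work becomes `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Assumed, rounded inputs taken from the message above.
hash_share = 0.20    # share of instructions spent in unicode_hash() before the change
hash_speedup = 2.0   # "two-fold" speedup of the hash function itself

overall = amdahl_speedup(hash_share, hash_speedup)

# Share of the (now smaller) total that the faster hash still accounts for.
new_hash_cost = hash_share / hash_speedup
remaining_share = new_hash_cost / ((1.0 - hash_share) + new_hash_cost)

print(f"overall speedup: {overall:.3f}x")
print(f"hash share after the change: {remaining_share:.1%}")
```

The second number is the interesting cross-check: a two-fold faster hash that started at 20% ends up at about 11% of the new total, which matches what Valgrind reports.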

CityHash was chosen because of its MIT license and its advertised performance on short strings.

I've now found this bug and attached a log for haypo's benchmark, which compares the native hash with CityHash. Caching was disabled during the test. CityHash was compiled using -O3 -msse4.2 (CityHash uses CPU-native CRC instructions). CPython's unit tests fail due to known_hash and gdb output; besides that, everything else seems to work fine.

CityHash is advertised for its performance with short strings, which does not seem to show in the benchmark. However, longer strings perform *much* better.

If people are interested, I can repeat the test on an armv7l machine.
History
Date User Action Args
2013-06-02 09:30:24 ebfe set recipients: + ebfe, tim.peters, rhettinger, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, asvetlov, skrah, neologix, python-dev, serhiy.storchaka, isoschiz
2013-06-02 09:30:24 ebfe set messageid: <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za>
2013-06-02 09:30:23 ebfe link issue16427 messages
2013-06-02 09:30:23 ebfe create