Message 190475
I was investigating a callgrind dump of my code, which showed how badly unicode_hash() was hurting my performance. Using Google's CityHash instead of the built-in algorithm to hash unicode objects improves overall performance by about 15 to 20 percent in my case; that is quite significant.
Valgrind shows that the share of instructions spent in unicode_hash() drops from ~20% to ~11%. By Amdahl's law, this roughly two-fold speedup of unicode_hash() translates into the overall gain of about 15 percent mentioned above.
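The Amdahl's-law arithmetic can be checked with a small sketch. Here p is the fraction of instructions originally spent in unicode_hash() and s is its speedup; the helpers amdahl_speedup() and new_share() are hypothetical names for illustration, and callgrind instruction counts are assumed to stand in for run time.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of the work is made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    # The untouched part (1 - p) keeps its cost; the improved part shrinks to p / s.
    return 1.0 / ((1.0 - p) + p / s)


def new_share(p: float, s: float) -> float:
    """Share of the new total taken by the part that was made s times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if s <= 0:
        raise ValueError("s must be positive")
    improved = p / s
    return improved / ((1.0 - p) + improved)


if __name__ == "__main__":
    # Figures from the callgrind run: ~20% of instructions, roughly 2x faster.
    print(f"overall speedup: {amdahl_speedup(0.20, 2.0):.3f}")
    print(f"new share of unicode_hash(): {new_share(0.20, 2.0):.3f}")
```

new_share(0.20, 2.0) is about 0.11, which matches the drop from ~20% to ~11% and supports reading it as a two-fold speedup of unicode_hash(). Because callgrind counts instructions rather than wall-clock time, amdahl_speedup() gives an instruction-count estimate that need not equal a measured timing improvement.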
CityHash was chosen because of its MIT license and because it is advertised as fast on short strings.
I have now found this issue and attached a log of haypo's benchmark, which compares the native hash with CityHash. Hash caching was disabled during the test. CityHash was compiled with -O3 -msse4.2 (it uses the CPU's native CRC instructions). CPython's unit tests fail because of known_hash and gdb output; apart from that, everything seems to work fine.
CityHash is advertised for its performance on short strings, but that advantage does not show up in the benchmark. On longer strings, however, it performs *much* better.
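One plausible reason for the gap on long strings: a classic multiply/XOR string hash, in the style of CPython's classic string hash, does one multiplication per character, and each step depends on the previous one, while CityHash reads its input in 64-bit words. The sketch below is a simplified, hypothetical model of such a character-at-a-time loop (hash randomization and CPython's special handling of -1 are omitted); char_at_a_time_hash() is an illustrative name, not the actual unicode_hash() source.

```python
# Simplified, hypothetical model of a character-at-a-time string hash.
# Loosely modelled on CPython's classic multiply/XOR loop; the randomization
# prefix/suffix and the special handling of -1 are deliberately omitted.

MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit wrap-around
MULTIPLIER = 1000003     # multiplier from CPython's classic string hash


def char_at_a_time_hash(s: str) -> int:
    """Return a 64-bit hash of s, doing one multiply and one XOR per code point."""
    if not s:
        return 0
    x = (ord(s[0]) << 7) & MASK64
    for ch in s:
        # Each iteration needs the previous x, so the steps form one serial chain.
        x = ((MULTIPLIER * x) & MASK64) ^ ord(ch)
    # Mix in the length so strings sharing a prefix diverge further.
    x ^= len(s)
    return x


if __name__ == "__main__":
    for text in ("a", "hello", "x" * 1000):
        print(len(text), hex(char_at_a_time_hash(text)))
```

For a 1,000-character ASCII string, this loop performs 1,000 dependent multiplications, whereas reading the same 1,000 bytes as 64-bit words takes 125 loads, so a word-at-a-time design has more room to pull ahead as strings grow.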
If people are interested, I can repeat the test on an armv7l machine.

Date | User | Action | Args
2013-06-02 09:30:24 | ebfe | set | recipients: + ebfe, tim.peters, rhettinger, gregory.p.smith, jcea, mark.dickinson, pitrou, vstinner, christian.heimes, asvetlov, skrah, neologix, python-dev, serhiy.storchaka, isoschiz
2013-06-02 09:30:24 | ebfe | set | messageid: <1370165424.0.0.537763885555.issue16427@psf.upfronthosting.co.za>
2013-06-02 09:30:23 | ebfe | link | issue16427 messages
2013-06-02 09:30:23 | ebfe | create |