Author haypo
Recipients haypo, serhiy.storchaka
Date 2016-05-19.12:02:27
SpamBayes Score -1.0
Marked as misclassified Yes
Message-id <>
According to Linux perf, the unpickle_list benchmark (of the CPython benchmark suite) heavily depends on the performance of the load() and _Unpickler_Read() functions. While running benchmarks of PGO+LTO builds, I noticed a difference of around 5% because one build applies constant propagation to _Unpickler_Read() (fast), whereas another build doesn't (slow).

Depending on the PGO profile, the OS and the compiler, you may or may not get this nice optimization. I propose to implement it manually, so that we don't depend on the compiler.

The attached short patch implements the optimization manually. It has a big impact on unpickle_list, but no significant impact on fastunpickle and pickle_list:
$ taskset -c 3 python3 ../ref_default/rel/python ../default/rel/python -b unpickle_list,fastunpickle,pickle_list --rigorous -v 
Report on Linux smithers 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64
Total CPU cores: 4

### fastunpickle ###
Avg: 0.527359 +/- 0.005932 -> 0.518548 +/- 0.00953: 1.02x faster
Not significant

### pickle_list ###
Avg: 0.269307 +/- 0.017465 -> 0.266015 +/- 0.00423: 1.01x faster
Not significant

### unpickle_list ###
Avg: 0.255805 +/- 0.006942 -> 0.231444 +/- 0.00394: 1.11x faster
Significant (t=21.58)

It would be interesting to also evaluate the computed goto optimization for the load() function. (And also try computed goto for the re/_sre module, but that's a different issue.)

I tuned my system and patched the CPython benchmark suite to get stable benchmark results.
Date User Action Args
2016-05-19 12:02:29hayposetrecipients: + haypo, serhiy.storchaka
2016-05-19 12:02:29hayposetmessageid: <>
2016-05-19 12:02:29haypolinkissue27056 messages
2016-05-19 12:02:28haypocreate