Title: Link Time Optimization support for GCC and CLANG
Hi All,
This is Alecsandru from the Server Scripting Languages Optimization team at Intel Corporation. I would like to submit a patch that adds support for Link Time Optimization (LTO) when using GCC and CLANG to compile CPython2 and CPython3. LTO is an optimization technique that the compiler performs at link time, when it can see the whole program rather than one translation unit at a time.
Combined with Profile Guided Optimization (PGO), which is enabled by running "make profile-opt", LTO gave a speedup of up to 11% on the Grand Unified Python Benchmark (GUPB) compared with PGO only, with a few regressions. Compared with a default build, PGO+LTO gave a performance gain as high as 26%. In addition, we are seeing a 2% boost in throughput from our OpenStack Swift setup compared with PGO only. Our GUPB performance evaluation was conducted on Intel Skylake/Broadwell systems running CentOS/Ubuntu, with CLANG/LLVM and GCC 4.*/5.*. Our OpenStack Swift evaluation was conducted on various systems with XEON and Avoton processors.
1. Get the CPython source code
hg clone cpython
cd cpython
hg update 2.7 (for CPython2)
2. Build the binary
a) Default:
b) PGO:
make profile-opt
c) PGO+LTO:
Copy the attached patch files
hg import --no-commit lto-cpython3-v01.patch (for CPython3)
hg import --no-commit lto-cpython2-v01.patch (for CPython2)
make profile-opt
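After building, it can be useful to confirm that the resulting interpreter really was compiled with LTO. The following is a minimal sketch, assuming the patched build passes the standard GCC/CLANG `-flto` flag and that the flags are recorded in the interpreter's build configuration; the helper names `has_lto_flag` and `interpreter_lto_report` are hypothetical and not part of the patch. It targets Python 3 and would be run with the freshly built binary.

```python
import sysconfig

# Build variables that typically hold compiler and linker flags on Unix builds.
FLAG_VARS = ("CFLAGS", "PY_CFLAGS", "LDFLAGS", "PY_LDFLAGS")


def has_lto_flag(flags):
    """Return True if any whitespace-separated token contains -flto."""
    return any("-flto" in tok for tok in flags.split())


def interpreter_lto_report():
    """Map each build variable to whether it mentions an -flto flag."""
    report = {}
    for name in FLAG_VARS:
        # get_config_var returns None when a variable is not recorded.
        value = sysconfig.get_config_var(name) or ""
        report[name] = has_lto_flag(str(value))
    return report


if __name__ == "__main__":
    for name, found in interpreter_lto_report().items():
        print(name, "-flto present" if found else "no -flto")
```

If no variable reports the flag, the build most likely did not use LTO, although a build system could also pass the flag in a way that is not recorded here.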
Hardware and OS Configuration
Hardware: Intel XEON (Broadwell-DE) 8 Cores
BIOS settings: Intel Turbo Boost Technology: false
Hyper-Threading: false
OS: Ubuntu 14.04.3 LTS Server
OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce run-to-run variation (echo 0 > /proc/sys/kernel/randomize_va_space)
CPU frequency fixed at 2.6 GHz
GCC version: 4.9.2
Benchmark: Grand Unified Python Benchmark from
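Before a benchmark run, the ASLR setting listed in the configuration above can be checked programmatically. This is a small sketch, assuming a Linux system that exposes /proc/sys/kernel/randomize_va_space; the function names are hypothetical, and the mode descriptions follow the kernel's documented meaning of the values 0, 1 and 2.

```python
from pathlib import Path

ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

# Meaning of the kernel setting on Linux.
ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization",
    2: "full randomization",
}


def describe_aslr(raw):
    """Translate the raw file content (e.g. '0') into a description."""
    return ASLR_MODES.get(int(raw.strip()), "unknown")


def current_aslr():
    """Read the live setting; return None where the file does not exist."""
    if not ASLR_PATH.exists():
        return None
    return describe_aslr(ASLR_PATH.read_text())


if __name__ == "__main__":
    print("ASLR:", current_aslr())
```

For the setup described here, the expected report is "disabled".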
Measurements and Results
A. Repository:
GUPB Benchmark:
hg id : 2979f5ce6a0c tip
hg --debug id -i : 2979f5ce6a0cee994d5485401945d8457bb0afac
CPython3:
hg id : 21a28f6de358
hg id -r 'ancestors(.) and tag()': 374f501f4567 (3.5) v3.5.0
hg --debug id -i : 21a28f6de3582833652c958b8fd6ae8448b61c7c
CPython2:
hg id : a37ea1d56e98 (2.7)
hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
hg --debug id -i : a37ea1d56e98eb158750d3e495a5cf524e8c3980
B. Results:
Sample results for CPython2 and CPython3, measured on a Broadwell platform, are shown in Tables 1 and 2. The first column (Benchmark) gives the benchmark name, the second (%D) the speedup of PGO+LTO compared with the default build, and the third (%PGO) the speedup of PGO+LTO compared with PGO only; a higher value is better.
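To make the two percentage columns concrete, here is one way such values could be derived from raw timings. This is only a sketch: the post does not state the exact formula used, so the definition below (baseline time divided by new time, minus one) is an assumption, and the timings are hypothetical rather than measured data.

```python
def speedup_percent(baseline_seconds, candidate_seconds):
    """Percentage speedup of candidate over baseline (positive means faster)."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, not taken from the post.
    default_time, pgo_time, pgo_lto_time = 1.26, 1.05, 1.00
    pct_d = speedup_percent(default_time, pgo_lto_time)    # %D column
    pct_pgo = speedup_percent(pgo_time, pgo_lto_time)      # %PGO column
    print(round(pct_d), round(pct_pgo))
```

Under this definition a negative value means the PGO+LTO build was slower than the build it is compared against.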
Table 1. CPython2 results:
Benchmark %D %PGO
raytrace 18 3
chaos 16 5
django_v2 16 6
mako 16 6
pathlib 15 3
simple_logging 15 1
slowpickle 15 5
django 14 4
go 14 4
richards 13 -1
float 12 4
slowunpickle 12 4
etree_process 11 3
fastunpickle 11 6
formatted_logging 11 3
nqueens 11 1
regex_compile 11 3
etree_iterparse 10 4
mako_v2 10 3
telco 10 5
pybench 9 1
hexiom2 9 1
html5lib_warmup 9 3
meteor_contest 9 4
pickle_list 9 5
2to3 8 2
bzr_startup 8 2
chameleon 8 0
etree_generate 8 2
regex_v8 8 3
silent_logging 8 1
fannkuch 7 1
html5lib 7 3
json_load 7 -5
tornado_http 7 3
call_method_slots 6 3
json_dump_v2 6 -4
spambayes 6 2
unpickle_list 6 0
etree_parse 5 3
fastpickle 5 4
rietveld 5 1
call_method 4 -1
normal_startup 4 2
startup_nosite 4 2
slowspitfire 3 0
ssbench 4 2
call_method_unknown 1 -6
json_dump 1 -4
nbody 1 1
pidigits 1 -10
pickle_dict 0 -1
regex_effbot 0 -2
spectral_norm 0 -3
call_simple -3 -3
unpack_sequence -6 -2
Table 2. CPython3 results:
Benchmark %D %PGO
formatted_logging 26 11
raytrace 24 8
simple_logging 24 6
richards 22 3
chaos 21 7
go 21 11
hexiom2 21 8
nbody 21 9
etree_generate 19 5
etree_process 19 5
call_method_slots 18 3
fastunpickle 18 0
pathlib 18 5
regex_compile 18 8
float 17 8
nqueens 17 7
call_method 16 3
etree_iterparse 16 9
json_dump 16 -4
json_load 16 5
silent_logging 15 8
2to3 14 5
fannkuch 14 8
call_simple 12 0
meteor_contest 12 7
call_method_unknown 11 -1
spectral_norm 11 4
json_dump_v2 10 3
telco 10 5
fastpickle 9 -4
etree_parse 8 1
normal_startup 8 3
startup_nosite 7 3
unpack_sequence 7 3
regex_v8 6 4
unpickle_list 5 3
pickle_list 1 -10
pidigits 1 -11
regex_effbot -2 2
pickle_dict -3 -10
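The regressions relative to PGO only can be pulled out of the tables mechanically. The sketch below parses rows in the "name %D %PGO" layout used above and lists the benchmarks with a negative %PGO value; the sample rows are copied from Table 2, while the function names are hypothetical.

```python
# A few rows copied from Table 2 (header line excluded).
ROWS = """\
formatted_logging 26 11
json_dump 16 -4
fastpickle 9 -4
pickle_list 1 -10
pidigits 1 -11
regex_effbot -2 2
pickle_dict -3 -10
"""


def parse_rows(text):
    """Parse 'name %D %PGO' lines into (name, vs_default, vs_pgo) tuples."""
    results = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, vs_default, vs_pgo = parts
        results.append((name, int(vs_default), int(vs_pgo)))
    return results


def regressions_vs_pgo(rows):
    """Names of benchmarks that got slower relative to the PGO-only build."""
    return [name for name, _, vs_pgo in rows if vs_pgo < 0]


if __name__ == "__main__":
    print(regressions_vs_pgo(parse_rows(ROWS)))
```

The same filter applied to the %D column would list the benchmarks that regress against the default build.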
Thank you,
Alecsandru

Date | User | Action | Args
2015-11-23 08:59:40 | alecsandru.patrascu | set | recipients: + alecsandru.patrascu
2015-11-23 08:59:40 | alecsandru.patrascu | set | messageid: <>
2015-11-23 08:59:40 | alecsandru.patrascu | link | issue25702 messages
2015-11-23 08:59:39 | alecsandru.patrascu | create |