Message255140
Title: Link Time Optimizations support for GCC and CLANG
Hi All,
This is Alecsandru from the Server Scripting Languages Optimization team at Intel Corporation. I would like to submit a patch that adds support for Link Time Optimization (LTO) when compiling CPython2 and CPython3 with GCC or CLANG. LTO is an optimization technique that the compiler applies at link time, when it can see the whole program rather than one translation unit at a time.
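As a rough way to check whether a given interpreter was built with LTO, one can look for the standard GCC/CLANG switch -flto in the build flags that the interpreter records. The sketch below assumes the LTO build leaves -flto (or a variant such as -flto=thin) in one of those variables; the helper names has_flag and lto_report are hypothetical and are not part of the patch.

```python
import sysconfig


def has_flag(flags, prefix="-flto"):
    """Return True if any whitespace-separated token starts with prefix."""
    return any(tok.startswith(prefix) for tok in str(flags or "").split())


def lto_report(var_names=("CFLAGS", "PY_CFLAGS", "LDFLAGS", "PY_LDFLAGS")):
    """Map each build variable name to whether it mentions an LTO flag.

    Variables that are missing on this platform are reported as False.
    """
    return {name: has_flag(sysconfig.get_config_var(name)) for name in var_names}


if __name__ == "__main__":
    for name, found in sorted(lto_report().items()):
        print("%-12s %s" % (name, "LTO flag found" if found else "no LTO flag"))
```

A result of "no LTO flag" everywhere only means the flag was not recorded; it does not prove the binary was linked without LTO.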
Combined with Profile Guided Optimization (PGO), which is enabled by running "make profile-opt", PGO+LTO showed a speedup of up to 11% over PGO alone on the Grand Unified Python Benchmark (GUPB), with a few regressions. Compared with a default build, PGO+LTO showed a performance gain as high as 26%. In addition, we are seeing a 2% boost in throughput from our OpenStack Swift setup compared with PGO only. Our GUPB performance evaluation was conducted on Intel SkyLake/Broadwell systems running CentOS/Ubuntu, with CLANG/LLVM and GCC 4.*/5.*. Our OpenStack Swift performance evaluation was done on various systems with XEON and Avoton processors.
Steps:
======
1. Get the CPython source code
hg clone https://hg.python.org/cpython cpython
cd cpython
hg update 2.7 (for CPython2)
2. Build the binary
a) Default:
./configure
make
b) PGO:
./configure
make profile-opt
c) PGO+LTO:
Copy the attached patch files
hg import --no-commit lto-cpython3-v01.patch (for CPython3)
hg import --no-commit lto-cpython2-v01.patch (for CPython2)
./configure
make profile-opt
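The three build variants above can be written down as a small dry-run planner, which may help when scripting repeated builds. This is a minimal sketch: the function names build_commands and run are hypothetical, the commands are the ones listed in the steps, and it assumes it is run from the root of the cpython checkout with the patch files already copied there.

```python
import subprocess

VARIANTS = ("default", "pgo", "pgo+lto")


def build_commands(variant, python_major=3):
    """Return the list of commands (as argument lists) for one build variant."""
    if variant not in VARIANTS:
        raise ValueError("unknown variant: %r" % (variant,))
    commands = []
    if variant == "pgo+lto":
        # Patch file names as given in step c); pick the one for your branch.
        patch = "lto-cpython%d-v01.patch" % python_major
        commands.append(["hg", "import", "--no-commit", patch])
    commands.append(["./configure"])
    commands.append(["make"] if variant == "default" else ["make", "profile-opt"])
    return commands


def run(variant, python_major=3, dry_run=True, cwd="."):
    """Print the commands; execute them only when dry_run is False."""
    for cmd in build_commands(variant, python_major):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.check_call(cmd, cwd=cwd)


if __name__ == "__main__":
    run("pgo+lto", python_major=3)  # dry run: prints the commands only
```

Keeping dry_run=True as the default means nothing is executed until the printed commands have been checked.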
Hardware and OS Configuration
=============================
Hardware: Intel XEON (Broadwell-DE) 8 Cores
BIOS settings: Intel Turbo Boost Technology: false
Hyper-Threading: false
OS: Ubuntu 14.04.3 LTS Server
OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce
run-to-run variation, using echo 0 > /proc/sys/kernel/randomize_va_space
CPU frequency fixed at 2.6GHz
GCC version: 4.9.2
Benchmark: Grand Unified Python Benchmark from
https://hg.python.org/benchmarks/
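Before a benchmark run it can be useful to confirm that the ASLR setting described above is actually in effect. The sketch below reads the same /proc file on Linux; the helper names read_aslr_mode and describe_aslr are hypothetical, and the meaning attached to values 1 and 2 follows the Linux kernel sysctl documentation rather than anything stated in this message.

```python
ASLR_PATH = "/proc/sys/kernel/randomize_va_space"

# Value meanings per the Linux kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative randomization",
    2: "full randomization",
}


def read_aslr_mode(path=ASLR_PATH):
    """Return the integer ASLR mode, or None if it cannot be read or parsed."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def describe_aslr(mode):
    """Turn a mode returned by read_aslr_mode into a short description."""
    if mode is None:
        return "unknown (file missing or unreadable)"
    return ASLR_MODES.get(mode, "unrecognized value %d" % mode)


if __name__ == "__main__":
    print("ASLR: " + describe_aslr(read_aslr_mode()))
```

For the configuration used here, the expected description is "disabled".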
Measurements and Results
========================
A. Repository:
GUPB Benchmark:
hg id : 2979f5ce6a0c tip
hg --debug id -i : 2979f5ce6a0cee994d5485401945d8457bb0afac
CPython3:
hg id : 21a28f6de358
hg id -r 'ancestors(.) and tag()': 374f501f4567 (3.5) v3.5.0
hg --debug id -i : 21a28f6de3582833652c958b8fd6ae8448b61c7c
CPython2:
hg id : a37ea1d56e98 (2.7)
hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
hg --debug id -i : a37ea1d56e98eb158750d3e495a5cf524e8c3980
B. Results:
Sample CPython2 and CPython3 results, measured on a Broadwell platform, are shown in Tables 1 and 2. The first column (Benchmark) gives the benchmark name, the second (%D) the speedup in percent compared with the default build, and the third (%PGO) the speedup in percent compared with a PGO-only build; higher is better, and negative values are regressions.
Table 1. CPython2 results:
Benchmark %D %PGO
--------------------------------
raytrace 18 3
chaos 16 5
django_v2 16 6
mako 16 6
pathlib 15 3
simple_logging 15 1
slowpickle 15 5
django 14 4
go 14 4
richards 13 -1
float 12 4
slowunpickle 12 4
etree_process 11 3
fastunpickle 11 6
formatted_logging 11 3
nqueens 11 1
regex_compile 11 3
etree_iterparse 10 4
mako_v2 10 3
telco 10 5
pybench 9 1
hexiom2 9 1
html5lib_warmup 9 3
meteor_contest 9 4
pickle_list 9 5
2to3 8 2
bzr_startup 8 2
chameleon 8 0
etree_generate 8 2
regex_v8 8 3
silent_logging 8 1
fannkuch 7 1
html5lib 7 3
json_load 7 -5
tornado_http 7 3
call_method_slots 6 3
json_dump_v2 6 -4
spambayes 6 2
unpickle_list 6 0
etree_parse 5 3
fastpickle 5 4
rietveld 5 1
call_method 4 -1
normal_startup 4 2
startup_nosite 4 2
slowspitfire 3 0
ssbench 4 2
call_method_unknown 1 -6
json_dump 1 -4
nbody 1 1
pidigits 1 -10
pickle_dict 0 -1
regex_effbot 0 -2
spectral_norm 0 -3
call_simple -3 -3
unpack_sequence -6 -2
Table 2. CPython3 results:
Benchmark %D %PGO
--------------------------------
formatted_logging 26 11
raytrace 24 8
simple_logging 24 6
richards 22 3
chaos 21 7
go 21 11
hexiom2 21 8
nbody 21 9
etree_generate 19 5
etree_process 19 5
call_method_slots 18 3
fastunpickle 18 0
pathlib 18 5
regex_compile 18 8
float 17 8
nqueens 17 7
call_method 16 3
etree_iterparse 16 9
json_dump 16 -4
json_load 16 5
silent_logging 15 8
2to3 14 5
fannkuch 14 8
call_simple 12 0
meteor_contest 12 7
call_method_unknown 11 -1
spectral_norm 11 4
json_dump_v2 10 3
telco 10 5
fastpickle 9 -4
etree_parse 8 1
normal_startup 8 3
startup_nosite 7 3
unpack_sequence 7 3
regex_v8 6 4
unpickle_list 5 3
pickle_list 1 -10
pidigits 1 -11
regex_effbot -2 2
pickle_dict -3 -10
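To pick out the regressions and the best cases from tables in this layout, the rows can be parsed with a few lines of Python. This is only a sketch: parse_rows and summarize are hypothetical helper names, and SAMPLE holds just five rows copied from Table 2, so the summary it produces describes that sample and not the full table.

```python
SAMPLE = """\
Benchmark %D %PGO
--------------------------------
formatted_logging 26 11
go 21 11
json_dump 16 -4
pidigits 1 -11
pickle_dict -3 -10
"""


def parse_rows(text):
    """Parse 'name %D %PGO' lines into (name, d, pgo) tuples.

    Header and separator lines are skipped because they do not have
    exactly three fields with two integer values.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            rows.append((parts[0], int(parts[1]), int(parts[2])))
        except ValueError:
            continue
    return rows


def summarize(rows):
    """Return regressions against PGO and the best row for each column."""
    best_d = max(rows, key=lambda r: r[1])
    best_pgo = max(rows, key=lambda r: r[2])
    return {
        "count": len(rows),
        "regressions_vs_pgo": sorted(n for n, _, pgo in rows if pgo < 0),
        "best_vs_default": (best_d[0], best_d[1]),
        "best_vs_pgo": (best_pgo[0], best_pgo[2]),
    }


if __name__ == "__main__":
    for key, value in sorted(summarize(parse_rows(SAMPLE)).items()):
        print("%s: %s" % (key, value))
```

Pasting the full text of either table in place of SAMPLE gives the same summary for the complete result set.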
Thank you,
Alecsandru

Date                | User                | Action | Args
2015-11-23 08:59:40 | alecsandru.patrascu | set    | recipients: + alecsandru.patrascu
2015-11-23 08:59:40 | alecsandru.patrascu | set    | messageid: <1448269180.86.0.895879966461.issue25702@psf.upfronthosting.co.za>
2015-11-23 08:59:40 | alecsandru.patrascu | link   | issue25702 messages
2015-11-23 08:59:39 | alecsandru.patrascu | create |