Author alecsandru.patrascu
Recipients alecsandru.patrascu
Date 2015-11-23.08:59:39
Message-id <1448269180.86.0.895879966461.issue25702@psf.upfronthosting.co.za>
Title: Link Time Optimizations support for GCC and CLANG

Hi All,

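This is Alecsandru from the Server Scripting Languages Optimization team at Intel Corporation. I would like to submit a patch that adds support for Link Time Optimization (LTO) when compiling CPython2 and CPython3 with GCC and Clang. LTO is a compiler-assisted optimization technique performed at link time: because the compiler sees the whole program rather than one source file at a time, it can optimize across translation units, for example by inlining a function defined in one .c file into its callers in another.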

Combined with Profile Guided Optimization (PGO), which is enabled by running "make profile-opt", and measured with the Grand Unified Python Benchmark (GUPB), PGO+LTO showed a speedup of up to 11% over PGO alone, with a few regressions. Compared with a default build, PGO+LTO gave gains as high as 26%. In addition, we see a 2% throughput improvement over PGO alone in our OpenStack Swift setup. The GUPB evaluation was conducted on Intel Skylake/Broadwell systems running CentOS/Ubuntu, with Clang/LLVM and GCC 4.x/5.x. The OpenStack Swift measurements were done on various systems with Xeon and Avoton processors.

Steps
======

1. Get the CPython source code
    hg clone https://hg.python.org/cpython cpython
    cd cpython
    hg update 2.7 (for CPython2)

2. Build the binary
    a) Default:
        ./configure
        make
    
    b) PGO:
        ./configure
        make profile-opt
        
    c) PGO+LTO:
        Copy the attached patch files into the cpython directory
        hg import --no-commit lto-cpython3-v01.patch (for CPython3)
        hg import --no-commit lto-cpython2-v01.patch (for CPython2)
        ./configure
        make profile-opt
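
To build the three variants side by side, the steps above can be wrapped in a small script. This is a sketch, not part of the attached patches: `make_target`, `build_variant` and the `build-<variant>` directory names are hypothetical, and it assumes Mercurial and the usual CPython build dependencies are installed.

```shell
#!/bin/sh
# Hypothetical helper: build the three variants from step 2 (default,
# PGO, PGO+LTO) in separate clones so they can be benchmarked side by
# side. Only the patch file comes from this issue.

# Map a variant name to the make target used in step 2.
make_target() {
    case $1 in
        default)     echo all ;;          # plain "make" builds "all"
        pgo|pgo-lto) echo profile-opt ;;  # steps 2b and 2c
        *)           return 1 ;;
    esac
}

# build_variant VARIANT SRC [PATCH]
#   VARIANT: default | pgo | pgo-lto
#   SRC:     an hg checkout already updated to the wanted branch
#   PATCH:   lto-cpython3-v01.patch or lto-cpython2-v01.patch (pgo-lto only)
build_variant() {
    variant=$1 src=$2 patch=$3
    target=$(make_target "$variant") || {
        echo "unknown variant: $variant" >&2
        return 1
    }
    if [ "$variant" = pgo-lto ] && [ -z "$patch" ]; then
        echo "pgo-lto needs the LTO patch file" >&2
        return 1
    fi
    # Make the patch path absolute before changing directory.
    case $patch in
        ''|/*) ;;
        *) patch=$PWD/$patch ;;
    esac
    dest=build-$variant
    # Clone the exact revision checked out in SRC, not the branch tip.
    rev=$(hg -R "$src" log -r . --template '{node}') || return 1
    hg clone -u "$rev" "$src" "$dest" || return 1
    (
        cd "$dest" || exit 1
        if [ "$variant" = pgo-lto ]; then
            hg import --no-commit "$patch" || exit 1
        fi
        ./configure && make "$target"
    )
}
```

For example, `build_variant pgo-lto cpython lto-cpython3-v01.patch` clones the `cpython` checkout into `build-pgo-lto`, applies the patch, and runs `./configure` followed by `make profile-opt`. Cloning per variant keeps the patched tree separate from the default and PGO-only trees.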

        
Hardware and OS Configuration
=============================
Hardware:           Intel Xeon (Broadwell-DE), 8 cores

BIOS settings:      Intel Turbo Boost Technology: disabled
                    Hyper-Threading: disabled

OS:                 Ubuntu 14.04.3 LTS Server

OS configuration:   Address Space Layout Randomization (ASLR) disabled to reduce run-to-run
                    variation, via echo 0 > /proc/sys/kernel/randomize_va_space
                    CPU frequency fixed at 2.6 GHz

GCC version:        4.9.2

Benchmark:          Grand Unified Python Benchmark from 
                    https://hg.python.org/benchmarks/
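
The OS configuration above can be checked, and on a dedicated benchmark machine applied, with a script along these lines. It is a sketch under assumptions: the function names are hypothetical, `apply_bench_env` needs root and the `cpupower` utility (which this report does not mention), and only cpu0's governor is shown.

```shell
#!/bin/sh
# Sketch of the OS configuration above, for Linux. show_bench_env only
# reads settings; apply_bench_env must run as root. Turbo Boost and
# Hyper-Threading were switched off in the BIOS and are not covered.

show_bench_env() {
    # 0 means ASLR is off, as in these runs; the usual default is 2.
    printf 'randomize_va_space: %s\n' \
        "$(cat /proc/sys/kernel/randomize_va_space 2>/dev/null || echo unavailable)"
    # cpufreq files are missing on some VMs and containers.
    printf 'cpu0 scaling_governor: %s\n' \
        "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null || echo unavailable)"
}

apply_bench_env() {
    # Disable ASLR until the next reboot (the same echo as above).
    echo 0 > /proc/sys/kernel/randomize_va_space || return 1
    # Pin min and max frequency to 2.6 GHz on all cores. Assumes the
    # cpupower utility is installed and the CPU supports this frequency.
    cpupower frequency-set --min 2.6GHz --max 2.6GHz
}
```

Running `show_bench_env` before and after a benchmark session is a cheap way to confirm that the settings have not been reset, for example by a reboot.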

                    
Measurements and Results
========================
A. Repository:
    GUPB Benchmark:
        hg id : 2979f5ce6a0c tip
        hg --debug id -i : 2979f5ce6a0cee994d5485401945d8457bb0afac

    CPython3:
        hg id : 21a28f6de358
        hg id -r 'ancestors(.) and tag()': 374f501f4567 (3.5) v3.5.0
        hg --debug id -i : 21a28f6de3582833652c958b8fd6ae8448b61c7c

    CPython2:
        hg id : a37ea1d56e98 (2.7)
        hg id -r 'ancestors(.) and tag()': 15c95b7d81dc (2.7) v2.7.10
        hg --debug id -i : a37ea1d56e98eb158750d3e495a5cf524e8c3980
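
To tie a new measurement to an exact tree in the same way, the three `hg` commands above can be collected into one helper. `record_revision` is hypothetical; it assumes Mercurial is installed and that the directory passed is the root of a checkout.

```shell
#!/bin/sh
# Print the same Mercurial identifiers as section A for a checkout, so a
# benchmark result can be tied to an exact source tree.
# record_revision [DIR]   (DIR defaults to the current directory and
#                          must be the root of the checkout)
record_revision() {
    dir=${1:-.}
    if ! command -v hg >/dev/null 2>&1; then
        echo "hg not found in PATH" >&2
        return 1
    fi
    if ! hg -R "$dir" root >/dev/null 2>&1; then
        echo "$dir: not a Mercurial checkout" >&2
        return 1
    fi
    echo "hg id : $(hg -R "$dir" id)"
    echo "hg id -r 'ancestors(.) and tag()': $(hg -R "$dir" id -r 'ancestors(.) and tag()')"
    echo "hg --debug id -i : $(hg -R "$dir" --debug id -i)"
}
```

For example, `record_revision cpython`, run next to the clone from step 1, prints three lines in the format used in this section.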


B. Results:
Sample CPython2 and CPython3 results, measured on a Broadwell platform, are shown in Tables 1 and 2. The first column (Benchmark) gives the benchmark name, the second (%D) the speedup of PGO+LTO over the default build, and the third (%PGO) the speedup of PGO+LTO over a PGO-only build. Higher values are better; negative values are regressions.

Table 1. CPython2 results:
Benchmark           %D      %PGO
--------------------------------
raytrace            18      3
chaos               16      5
django_v2           16      6
mako                16      6
pathlib             15      3
simple_logging      15      1
slowpickle          15      5
django              14      4
go                  14      4
richards            13      -1
float               12      4
slowunpickle        12      4
etree_process       11      3
fastunpickle        11      6
formatted_logging   11      3
nqueens             11      1
regex_compile       11      3
etree_iterparse     10      4
mako_v2             10      3
telco               10      5
pybench             9       1
hexiom2             9       1
html5lib_warmup     9       3
meteor_contest      9       4
pickle_list         9       5
2to3                8       2
bzr_startup         8       2
chameleon           8       0
etree_generate      8       2
regex_v8            8       3
silent_logging      8       1
fannkuch            7       1
html5lib            7       3
json_load           7       -5
tornado_http        7       3
call_method_slots   6       3
json_dump_v2        6       -4
spambayes           6       2
unpickle_list       6       0
etree_parse         5       3
fastpickle          5       4
rietveld            5       1
call_method         4       -1
normal_startup      4       2
startup_nosite      4       2
ssbench             4       2
slowspitfire        3       0
call_method_unknown 1       -6
json_dump           1       -4
nbody               1       1
pidigits            1       -10
pickle_dict         0       -1
regex_effbot        0       -2
spectral_norm       0       -3
call_simple         -3      -3
unpack_sequence     -6      -2


Table 2. CPython3 results:
Benchmark           %D      %PGO
--------------------------------
formatted_logging   26      11
raytrace            24      8
simple_logging      24      6
richards            22      3
chaos               21      7
go                  21      11
hexiom2             21      8
nbody               21      9
etree_generate      19      5
etree_process       19      5
call_method_slots   18      3
fastunpickle        18      0
pathlib             18      5
regex_compile       18      8
float               17      8
nqueens             17      7
call_method         16      3
etree_iterparse     16      9
json_dump           16      -4
json_load           16      5
silent_logging      15      8
2to3                14      5
fannkuch            14      8
call_simple         12      0
meteor_contest      12      7
call_method_unknown 11      -1
spectral_norm       11      4
json_dump_v2        10      3
telco               10      5
fastpickle          9       -4
etree_parse         8       1
normal_startup      8       3
startup_nosite      7       3
unpack_sequence     7       3
regex_v8            6       4
unpickle_list       5       3
pickle_list         1       -10
pidigits            1       -11
regex_effbot        -2      2
pickle_dict         -3      -10
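
To check summary figures such as the 26% and 11% maxima quoted at the top of this message against the per-benchmark tables, a table can be piped through a small awk filter. `summarize_table` is a hypothetical helper; it assumes the three-column layout used here and skips header, separator, and caption lines.

```shell
#!/bin/sh
# Summarize a results table (Benchmark, %D, %PGO) read from stdin.
summarize_table() {
    awk '
        # Data rows: a name followed by two integer percentages.
        NF == 3 && $2 ~ /^-?[0-9]+$/ && $3 ~ /^-?[0-9]+$/ {
            d = $2 + 0; p = $3 + 0; n++
            if (n == 1 || d > maxd) { maxd = d; maxd_name = $1 }
            if (n == 1 || p > maxp) { maxp = p; maxp_name = $1 }
            if (p < 0) reg++
        }
        END {
            printf "benchmarks: %d\n", n
            printf "max %%D: %d (%s)\n", maxd, maxd_name
            printf "max %%PGO: %d (%s)\n", maxp, maxp_name
            printf "regressions vs PGO: %d\n", reg
        }'
}
```

Fed Table 2, it reports 40 benchmarks, a maximum %D of 26 and a maximum %PGO of 11 (both formatted_logging, the first row reaching each maximum), and 6 benchmarks with a negative %PGO.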

Thank you,
Alecsandru