Author catalin.manciu
Recipients catalin.manciu
Date 2016-02-18.14:26:46
Message-id <1455805611.43.0.0364707113583.issue26382@psf.upfronthosting.co.za>
Content
Hi All,

This is Catalin from the Server Scripting Languages Optimization Team at Intel Corporation. I would like to submit a patch that replaces the 'malloc' allocator used by the list object (Objects/listobject.c) with the small object allocator (obmalloc.c) and simplifies the 'list_resize' function by removing a redundant check and properly handling resizing to zero.

Replacing the PyMem_* calls with PyObject_* calls inside the list implementation is beneficial because many of the PyMem_* requests are for sizes that the small object allocator handles better. For example, when running Tools/pybench.py -w 1, the list implementation makes a total of 48,295,840 allocation requests (either by using 'PyMem_MALLOC' directly or by calling 'PyMem_RESIZE'), of which 42,581,993 (88%) request sizes that the small object allocator can handle (512 bytes or less).
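To make the figures above concrete, here is a minimal sketch in C of the size check and of the percentage calculation. The 512-byte limit and the two request counts come from this message; the macro and function names are hypothetical and are not taken from obmalloc.c or listobject.c.

```c
#include <stddef.h>

/* Assumption: the 512-byte limit quoted in the message; the macro name
   is made up for this sketch. */
#define SKETCH_SMALL_LIMIT 512

/* Returns 1 when a buffer of `nitems` object pointers is small enough
   for the small object allocator, 0 otherwise. Dividing the limit
   avoids overflow in nitems * sizeof(void *). */
static int fits_small_allocator(size_t nitems)
{
    return nitems <= SKETCH_SMALL_LIMIT / sizeof(void *);
}

/* Share of small requests among all requests, in percent. */
static double small_request_share(unsigned long small, unsigned long total)
{
    if (total == 0)
        return 0.0;
    return 100.0 * (double)small / (double)total;
}
```

With 8-byte pointers the check accepts list buffers of up to 64 items, and small_request_share(42581993UL, 48295840UL) gives roughly 88, matching the figure quoted above.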

The changes to 'list_resize' were made to further improve performance by removing a redundant check and handling the 'resize to zero' case separately. The 'empty' state of a list, as established by the 'PyList_New' function, has the 'ob_item' pointer set to NULL and the 'ob_size' and 'allocated' members equal to 0. Previously, when called with a size of zero, 'list_resize' would set 'ob_size' and 'allocated' to zero, but it would also call 'PyMem_RESIZE', which by design calls 'realloc' with a size of 1. This allocated an unnecessary 1 byte and stored the newly obtained address in 'ob_item'. With the proposed implementation, a 'resize to zero' request just frees the buffer pointed to by 'ob_item' and sets 'ob_size', 'allocated' and 'ob_item' to zero.
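The 'resize to zero' handling described above can be sketched roughly as follows. This is not the patch itself: the struct is a hypothetical stand-in for the list object, plain 'realloc'/'free' replace the PyObject_* allocator calls so that the example compiles on its own, and the over-allocation done by the real 'list_resize' is left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the list fields named in the message. */
typedef struct {
    void **ob_item;     /* element buffer, NULL when the list is empty */
    size_t ob_size;     /* number of elements in use */
    size_t allocated;   /* number of slots in the buffer */
} SketchList;

/* Resize the buffer to exactly `newsize` slots.
   Returns 0 on success and -1 on allocation failure. */
static int sketch_resize(SketchList *self, size_t newsize)
{
    void **items;

    assert(self != NULL);

    if (newsize == 0) {
        /* Resize to zero: free the buffer and return to the empty
           state instead of keeping a 1-byte allocation alive. */
        free(self->ob_item);
        self->ob_item = NULL;
        self->ob_size = 0;
        self->allocated = 0;
        return 0;
    }

    /* Guard against overflow in the byte count. */
    if (newsize > (size_t)-1 / sizeof(void *))
        return -1;

    /* Slots added by growing are left uninitialized in this sketch. */
    items = realloc(self->ob_item, newsize * sizeof(void *));
    if (items == NULL)
        return -1;

    self->ob_item = items;
    self->ob_size = newsize;
    self->allocated = newsize;
    return 0;
}
```

Starting from an empty SketchList, growing to 4 slots and then resizing to 0 leaves all three fields back at zero, which is the same state a freshly created empty list has.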


Hardware and OS Configuration
=============================
Hardware:           Intel XEON (Haswell-EP) 36 Cores / Intel XEON (Broadwell-EP) 36 Cores

BIOS settings:      Intel Turbo Boost Technology: false
                    Hyper-Threading: false                  

OS:                 Ubuntu 14.04.2 LTS

OS configuration:   Address Space Layout Randomization (ASLR) disabled to reduce
                    run-to-run variation: echo 0 > /proc/sys/kernel/randomize_va_space
                    CPU frequency fixed at 2.3GHz

GCC version:        GCC version 5.1.0

Benchmark:          Grand Unified Python Benchmark from 
                    https://hg.python.org/benchmarks/

Measurements and Results
========================
A. Repository:
    GUPB Benchmark:
        hg id :  9923b81a1d34 tip
        hg --debug id -i : 9923b81a1d346891f179f57f8780f86dcf5cf3b9

    CPython3:
        hg id : 733a902ac816 tip
        hg id -r 'ancestors(.) and tag()': 737efcadf5a6 (3.4) v3.4.4
        hg --debug id -i : 733a902ac816bd5b7b88884867ae1939844ba2c5

    CPython2:
        hg id : 5715a6d9ff12 (2.7)
        hg id -r 'ancestors(.) and tag()': 6d1b6a68f775 (2.7) v2.7.11
        hg --debug id -i : 5715a6d9ff12053e81f7ad75268ac059b079b351

B. Results:
CPython2 and CPython3 sample results, measured on a Haswell and a Broadwell platform, are shown in Tables 1, 2, 3 and 4. The first column (Benchmark) is the benchmark name and the second (%D) is the speedup in percent compared with the unpatched version.
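The message does not spell out how %D is computed. As an illustration only, the sketch below assumes it is the relative reduction in run time against the unpatched build; the function name and the formula are assumptions, not something taken from the benchmark suite.

```c
#include <assert.h>

/* Assumed definition of %D: relative reduction in run time, in percent.
   Positive values mean the patched build is faster. */
static double speedup_percent(double base_seconds, double patched_seconds)
{
    assert(base_seconds > 0.0);
    return 100.0 * (base_seconds - patched_seconds) / base_seconds;
}
```

Under this assumption, a benchmark that takes 2.0 s unpatched and 1.5 s patched would be reported as 25, and a negative value would indicate a slowdown.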

Table 1. CPython 3 results on Intel XEON (Haswell-EP) @ 2.3 GHz

Benchmark                   %D
----------------------------------
unpickle_list               20.27
regex_effbot                6.07
fannkuch                    5.87
mako_v2                     5.19
meteor_contest              4.31
simple_logging              3.98
nqueens                     3.40
json_dump_v2                3.14
fastpickle                  2.16
django_v3                   2.03
tornado_http                1.90
pathlib                     1.84
fastunpickle                1.81
call_simple                 1.75
nbody                       1.60
etree_process               1.58
go                          1.54
call_method_unknown         1.53
2to3                        1.26
telco                       1.04
etree_generate              1.02
json_load                   0.85
etree_parse                 0.81
call_method_slots           0.73
etree_iterparse             0.68
call_method                 0.65
normal_startup              0.63
silent_logging              0.56
chameleon_v2                0.56
pickle_list                 0.52
regex_compile               0.50
hexiom2                     0.47
pidigits                    0.39
startup_nosite              0.17
pickle_dict                 0.00
unpack_sequence             0.00
formatted_logging          -0.06
raytrace                   -0.06
float                      -0.18
richards                   -0.37
spectral_norm              -0.51
chaos                      -0.65
regex_v8                   -0.72


Table 2. CPython 3 results on Intel XEON (Broadwell-EP) @ 2.3 GHz

Benchmark                   %D
----------------------------------
unpickle_list               15.75
nqueens                     5.24
mako_v2                     5.17
unpack_sequence             4.44
fannkuch                    4.42
nbody                       3.25
meteor_contest              2.86
regex_effbot                2.45
json_dump_v2                2.44
django_v3                   2.26
call_simple                 2.09
tornado_http                1.74
regex_compile               1.40
regex_v8                    1.16
spectral_norm               0.89
2to3                        0.76   
chameleon_v2                0.70
telco                       0.70
normal_startup              0.64
etree_generate              0.61
etree_process               0.55
hexiom2                     0.51
json_load                   0.51
call_method_slots           0.48
formatted_logging           0.33
call_method                 0.28
startup_nosite             -0.02
fastunpickle               -0.02
pidigits                   -0.20
etree_parse                -0.23
etree_iterparse            -0.27
richards                   -0.30
silent_logging             -0.36
pickle_list                -0.42
simple_logging             -0.82
float                      -0.91
pathlib                    -0.99
go                         -1.16
raytrace                   -1.16
chaos                      -1.26
fastpickle                 -1.72
call_method_unknown        -2.94
pickle_dict                -4.73


Table 3. CPython 2 results on Intel XEON (Haswell-EP) @ 2.3 GHz

Benchmark                   %D
----------------------------------
unpickle_list               15.89
json_load                   11.53
fannkuch                    7.90
mako_v2                     7.01
meteor_contest              4.21
nqueens                     3.81
fastunpickle                3.56
django_v3                   2.91
call_simple                 2.72
call_method_slots           2.45
slowpickle                  2.23
call_method                 2.21
html5lib_warmup             1.90
chaos                       1.89
html5lib                    1.81
regex_v8                    1.81
tornado_http                1.66
2to3                        1.56
json_dump_v2                1.49
nbody                       1.38
rietveld                    1.26
formatted_logging           1.12
regex_compile               0.99
spambayes                   0.92
pickle_list                 0.87
normal_startup              0.82
pybench                     0.74
slowunpickle                0.71
raytrace                    0.67
startup_nosite              0.59
float                       0.47
hexiom2                     0.46
slowspitfire                0.46
pidigits                    0.44
etree_process               0.44
etree_generate              0.37
go                          0.27
telco                       0.24
regex_effbot                0.12
etree_iterparse             0.06
bzr_startup                 0.04
richards                    0.03
etree_parse                 0.00
unpack_sequence             0.00
call_method_unknown        -0.26
pathlib                    -0.57
fastpickle                 -0.64
silent_logging             -0.94
simple_logging             -1.10
chameleon_v2               -1.25
pickle_dict                -1.67
spectral_norm              -3.25


Table 4. CPython 2 results on Intel XEON (Broadwell-EP) @ 2.3 GHz

Benchmark                   %D
----------------------------------
unpickle_list               15.44
json_load                   11.11
fannkuch                    7.55
meteor_contest              5.51
mako_v2                     4.94
nqueens                     3.49
html5lib_warmup             3.15
html5lib                    2.78
call_simple                 2.35
silent_logging              2.33
json_dump_v2                2.14
startup_nosite              2.09
bzr_startup                 1.93
fastunpickle                1.93
slowspitfire                1.91
regex_v8                    1.79
rietveld                    1.74
pybench                     1.59
nbody                       1.57
regex_compile               1.56
pathlib                     1.51
tornado_http                1.33
normal_startup              1.21
2to3                        1.14
chaos                       1.00
spambayes                   0.85
etree_process               0.73
pickle_list                 0.70
float                       0.69
hexiom2                     0.51
slowpickle                  0.44
call_method_unknown         0.42
slowunpickle                0.37
pickle_dict                 0.25
etree_parse                 0.20
go                          0.19
django_v3                   0.12
call_method_slots           0.12
spectral_norm               0.05
call_method                 0.01
unpack_sequence             0.00
raytrace                   -0.08
pidigits                   -0.11
richards                   -0.16
etree_generate             -0.23
regex_effbot               -0.26
telco                      -0.28
simple_logging             -0.32
etree_iterparse            -0.38
formatted_logging          -0.50
fastpickle                 -1.08
chameleon_v2               -1.74