Message260459
Hi All,
This is Catalin from the Server Scripting Languages Optimization Team at Intel Corporation. I would like to submit a patch that replaces the 'malloc' allocator used by the list object (Objects/listobject.c) with the small object allocator (obmalloc.c) and simplifies the 'list_resize' function by removing a redundant check and properly handling resizing to zero.
Replacing PyMem_* calls with PyObject_* calls inside the list implementation is beneficial because many of these calls request sizes that are better handled by the small object allocator. For example, when running Tools/pybench.py -w 1, the list implementation makes a total of 48,295,840 allocation requests (either by using 'PyMem_MALLOC' directly or by calling 'PyMem_RESIZE'), of which 42,581,993 (88%) request sizes that the small object allocator can handle (512 bytes or less).
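The size split quoted above can be illustrated with a minimal sketch. Only the 512-byte limit and the two request counts come from this message; the macro and helper names are hypothetical and are not CPython identifiers. The sketch models only the upper bound and ignores any special handling of zero-byte requests.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound, in bytes, for requests the small object allocator can
 * handle, as stated in this message. The macro name is illustrative
 * only and is not CPython's identifier. */
#define SKETCH_SMALL_LIMIT_BYTES 512

/* Hypothetical helper: returns 1 when a request of nbytes falls within
 * the small object allocator's range, 0 otherwise. */
static int served_by_small_allocator(size_t nbytes)
{
    return nbytes <= SKETCH_SMALL_LIMIT_BYTES;
}

/* Integer percentage (rounded down) of part out of total. */
static unsigned long percent_of(unsigned long long part,
                                unsigned long long total)
{
    return (unsigned long)(part * 100ULL / total);
}
```

With the counts reported above, percent_of(42581993ULL, 48295840ULL) yields 88. Assuming 8-byte pointers, an item buffer of up to 64 slots (64 * 8 = 512 bytes) would fall within this range.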
The changes to 'list_resize' further improve performance by removing a redundant check and handling the 'resize to zero' case separately. 'PyList_New' suggests that the 'empty' state of a list has the 'ob_item' pointer set to NULL and the 'ob_size' and 'allocated' members equal to 0. Previously, when called with a size of zero, 'list_resize' would set 'ob_size' and 'allocated' to zero, but it would also call 'PyMem_RESIZE', which, by design, would call 'realloc' with a size of 1. This allocated an unnecessary 1 byte and stored the newly obtained address in 'ob_item'. The proposed implementation simply frees the buffer pointed to by 'ob_item' and sets 'ob_size', 'allocated' and 'ob_item' to zero when it receives a 'resize to zero' request.
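The 'resize to zero' handling described above can be sketched as follows. This is not the patch itself: 'sketch_list' and 'sketch_list_resize' are hypothetical names, the struct models only the three fields mentioned in this message, plain 'free' and 'realloc' stand in for the allocator calls, and the over-allocation policy of the real 'list_resize' is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for the list object. Only the three fields named
 * in this message are modelled; this is not CPython's PyListObject. */
typedef struct {
    void **ob_item;    /* item buffer, NULL when the list is empty */
    size_t ob_size;    /* number of items in use */
    size_t allocated;  /* number of slots available in ob_item */
} sketch_list;

/* Hypothetical resize: returns 0 on success, -1 on failure. */
static int sketch_list_resize(sketch_list *self, size_t newsize)
{
    void **items;

    if (newsize == 0) {
        /* Resize to zero: free the buffer and restore the empty state
         * instead of asking the allocator for a 1-byte block. */
        free(self->ob_item);
        self->ob_item = NULL;
        self->ob_size = 0;
        self->allocated = 0;
        return 0;
    }
    if (newsize > SIZE_MAX / sizeof(void *))
        return -1;  /* byte count would overflow size_t */

    items = realloc(self->ob_item, newsize * sizeof(void *));
    if (items == NULL)
        return -1;  /* the original buffer is left untouched */

    /* Slots beyond the old size are uninitialized; the caller is
     * expected to fill them. */
    self->ob_item = items;
    self->ob_size = newsize;
    self->allocated = newsize;
    return 0;
}
```

Starting from an empty 'sketch_list' (all fields zero), resizing to 4 and then to 0 leaves 'ob_item' NULL and both counters at 0, which matches the empty state described above.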
Hardware and OS Configuration
=============================
Hardware: Intel XEON (Haswell-EP) 36 Cores / Intel XEON (Broadwell-EP) 36 Cores
BIOS settings: Intel Turbo Boost Technology: false
Hyper-Threading: false
OS: Ubuntu 14.04.2 LTS
OS configuration: Address Space Layout Randomization (ASLR) disabled to reduce
run-to-run variation, using: echo 0 > /proc/sys/kernel/randomize_va_space
CPU frequency set fixed at 2.3GHz
GCC version: 5.1.0
Benchmark: Grand Unified Python Benchmark from
https://hg.python.org/benchmarks/
Measurements and Results
========================
A. Repository:
GUPB Benchmark:
hg id : 9923b81a1d34 tip
hg --debug id -i : 9923b81a1d346891f179f57f8780f86dcf5cf3b9
CPython3:
hg id : 733a902ac816 tip
hg id -r 'ancestors(.) and tag()': 737efcadf5a6 (3.4) v3.4.4
hg --debug id -i : 733a902ac816bd5b7b88884867ae1939844ba2c5
CPython2:
hg id : 5715a6d9ff12 (2.7)
hg id -r 'ancestors(.) and tag()': 6d1b6a68f775 (2.7) v2.7.11
hg --debug id -i : 5715a6d9ff12053e81f7ad75268ac059b079b351
B. Results:
Sample results for CPython 2 and CPython 3, measured on a Haswell and a Broadwell platform, are shown in Tables 1, 2, 3 and 4. The first column (Benchmark) is the benchmark name and the second (%D) is the speedup, in percent, relative to the unpatched version.
Table 1. CPython 3 results on Intel XEON (Haswell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 20.27
regex_effbot 6.07
fannkuch 5.87
mako_v2 5.19
meteor_contest 4.31
simple_logging 3.98
nqueens 3.40
json_dump_v2 3.14
fastpickle 2.16
django_v3 2.03
tornado_http 1.90
pathlib 1.84
fastunpickle 1.81
call_simple 1.75
nbody 1.60
etree_process 1.58
go 1.54
call_method_unknown 1.53
2to3 1.26
telco 1.04
etree_generate 1.02
json_load 0.85
etree_parse 0.81
call_method_slots 0.73
etree_iterparse 0.68
call_method 0.65
normal_startup 0.63
silent_logging 0.56
chameleon_v2 0.56
pickle_list 0.52
regex_compile 0.50
hexiom2 0.47
pidigits 0.39
startup_nosite 0.17
pickle_dict 0.00
unpack_sequence 0.00
formatted_logging -0.06
raytrace -0.06
float -0.18
richards -0.37
spectral_norm -0.51
chaos -0.65
regex_v8 -0.72
Table 2. CPython 3 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.75
nqueens 5.24
mako_v2 5.17
unpack_sequence 4.44
fannkuch 4.42
nbody 3.25
meteor_contest 2.86
regex_effbot 2.45
json_dump_v2 2.44
django_v3 2.26
call_simple 2.09
tornado_http 1.74
regex_compile 1.40
regex_v8 1.16
spectral_norm 0.89
2to3 0.76
chameleon_v2 0.70
telco 0.70
normal_startup 0.64
etree_generate 0.61
etree_process 0.55
hexiom2 0.51
json_load 0.51
call_method_slots 0.48
formatted_logging 0.33
call_method 0.28
startup_nosite -0.02
fastunpickle -0.02
pidigits -0.20
etree_parse -0.23
etree_iterparse -0.27
richards -0.30
silent_logging -0.36
pickle_list -0.42
simple_logging -0.82
float -0.91
pathlib -0.99
go -1.16
raytrace -1.16
chaos -1.26
fastpickle -1.72
call_method_unknown -2.94
pickle_dict -4.73
Table 3. CPython 2 results on Intel XEON (Haswell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.89
json_load 11.53
fannkuch 7.90
mako_v2 7.01
meteor_contest 4.21
nqueens 3.81
fastunpickle 3.56
django_v3 2.91
call_simple 2.72
call_method_slots 2.45
slowpickle 2.23
call_method 2.21
html5lib_warmup 1.90
chaos 1.89
html5lib 1.81
regex_v8 1.81
tornado_http 1.66
2to3 1.56
json_dump_v2 1.49
nbody 1.38
rietveld 1.26
formatted_logging 1.12
regex_compile 0.99
spambayes 0.92
pickle_list 0.87
normal_startup 0.82
pybench 0.74
slowunpickle 0.71
raytrace 0.67
startup_nosite 0.59
float 0.47
hexiom2 0.46
slowspitfire 0.46
pidigits 0.44
etree_process 0.44
etree_generate 0.37
go 0.27
telco 0.24
regex_effbot 0.12
etree_iterparse 0.06
bzr_startup 0.04
richards 0.03
etree_parse 0.00
unpack_sequence 0.00
call_method_unknown -0.26
pathlib -0.57
fastpickle -0.64
silent_logging -0.94
simple_logging -1.10
chameleon_v2 -1.25
pickle_dict -1.67
spectral_norm -3.25
Table 4. CPython 2 results on Intel XEON (Broadwell-EP) @ 2.3 GHz
Benchmark %D
----------------------------------
unpickle_list 15.44
json_load 11.11
fannkuch 7.55
meteor_contest 5.51
mako_v2 4.94
nqueens 3.49
html5lib_warmup 3.15
html5lib 2.78
call_simple 2.35
silent_logging 2.33
json_dump_v2 2.14
startup_nosite 2.09
bzr_startup 1.93
fastunpickle 1.93
slowspitfire 1.91
regex_v8 1.79
rietveld 1.74
pybench 1.59
nbody 1.57
regex_compile 1.56
pathlib 1.51
tornado_http 1.33
normal_startup 1.21
2to3 1.14
chaos 1.00
spambayes 0.85
etree_process 0.73
pickle_list 0.70
float 0.69
hexiom2 0.51
slowpickle 0.44
call_method_unknown 0.42
slowunpickle 0.37
pickle_dict 0.25
etree_parse 0.20
go 0.19
django_v3 0.12
call_method_slots 0.12
spectral_norm 0.05
call_method 0.01
unpack_sequence 0.00
raytrace -0.08
pidigits -0.11
richards -0.16
etree_generate -0.23
regex_effbot -0.26
telco -0.28
simple_logging -0.32
etree_iterparse -0.38
formatted_logging -0.50
fastpickle -1.08
chameleon_v2 -1.74
Date | User | Action | Args
2016-02-18 14:26:53 | catalin.manciu | set | recipients: + catalin.manciu
2016-02-18 14:26:51 | catalin.manciu | set | messageid: <1455805611.43.0.0364707113583.issue26382@psf.upfronthosting.co.za>
2016-02-18 14:26:51 | catalin.manciu | link | issue26382 messages
2016-02-18 14:26:48 | catalin.manciu | create |