Message217323
I split my patch into two parts:
- calloc-4.patch: add new "Calloc" functions including _PyObject_GC_Calloc()
- use_calloc.patch: patch types (bytes, dict, list, set, tuple, etc.) and various modules to use calloc
I reverted my changes to _PyObject_GC_Malloc() and added _PyObject_GC_Calloc() instead; the performance regressions are gone. Creating a large tuple is slightly (8%) faster. But the real speedup comes from building a large bytes string of null bytes:
$ ./python.orig -m timeit 'bytes(50*1024*1024)'
100 loops, best of 3: 5.7 msec per loop
$ ./python.calloc -m timeit 'bytes(50*1024*1024)'
100000 loops, best of 3: 4.12 usec per loop
On Linux, no physical memory is actually allocated, even if you read the content of the bytes object: RSS is almost unchanged.
Ok, now the real use case where it becomes faster: I implemented the same optimization for bytearray.
$ ./python.orig -m timeit 'bytearray(50*1024*1024)'
100 loops, best of 3: 6.33 msec per loop
$ ./python.calloc -m timeit 'bytearray(50*1024*1024)'
100000 loops, best of 3: 4.09 usec per loop
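The two command-line benchmarks above can also be run in-process with the timeit module. The following is only a rough sketch: the helper names (best_time, compare) are made up for illustration, and the non-zero fill b'A' * SIZE is added as a contrast case, on the assumption that a buffer which must be written cannot benefit from calloc. Absolute numbers depend on the interpreter build and the platform.

```python
import timeit

SIZE = 50 * 1024 * 1024  # 50 MiB, the same size as in the benchmarks above


def best_time(stmt, number=20, repeat=3):
    """Return the best per-call time in seconds for stmt (hypothetical helper)."""
    timings = timeit.repeat(stmt, globals={"SIZE": SIZE},
                            number=number, repeat=repeat)
    return min(timings) / number


def compare():
    """Time zero-filled constructors against a non-zero fill of the same size."""
    return {
        "bytes(SIZE)": best_time("bytes(SIZE)"),
        "bytearray(SIZE)": best_time("bytearray(SIZE)"),
        # Non-zero fill: every byte has to be written, so a zeroing
        # allocator is not expected to help here.
        "b'A' * SIZE": best_time("b'A' * SIZE"),
    }


if __name__ == "__main__":
    for stmt, seconds in compare().items():
        print(f"{stmt:>16}: {seconds * 1e6:12.2f} usec per call")
```

On an interpreter that includes the calloc change, the first two lines would be expected to be far faster than the third; on an unpatched build all three should be in the same range.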
If you overallocate a bytearray and only write a few bytes, the bytes at the end of the bytearray will not be allocated (at least on Linux).
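The memory claims above (RSS almost unchanged after reading, and untouched tail pages not being allocated) can be checked roughly with the sketch below. It assumes Linux and reads the resident page count from /proc/self/statm; the helper names rss_bytes and measure are hypothetical, and the reported deltas are noisy because other allocations in the process also move RSS.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")


def rss_bytes():
    """Resident set size of the current process in bytes (Linux /proc only)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        return int(f.read().split()[1]) * PAGE_SIZE


def measure(size=50 * 1024 * 1024, written=16):
    """Overallocate a bytearray, write a few bytes, and report RSS deltas."""
    before = rss_bytes()
    buf = bytearray(size)            # zero-filled, overallocated buffer
    after_alloc = rss_bytes()
    buf[:written] = b"x" * written   # touch only the start of the buffer
    after_write = rss_bytes()
    tail_is_zero = buf[-1] == 0      # read one byte from the untouched tail
    after_read = rss_bytes()
    return {
        "size": len(buf),
        "growth_after_alloc": after_alloc - before,
        "growth_after_write": after_write - before,
        "growth_after_read": after_read - before,
        "tail_is_zero": tail_is_zero,
    }


if __name__ == "__main__":
    for key, value in measure().items():
        print(f"{key}: {value}")
```

If the buffer is calloc-backed and lazily committed, the three growth values should stay far below the 50 MiB buffer size; with a malloc plus memset implementation they would be expected to be close to it.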
Results of bench_alloc.py, comparing original Python to patched Python (calloc-4.patch + use_calloc.patch):
Common platform:
SCM: hg revision=4b97092aa4bd+ tag=tip branch=default date="2014-04-27 18:02 +0100"
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
Platform of campaign orig:
Timer precision: 42 ns
Date: 2014-04-28 00:27:19
Python version: 3.5.0a0 (default:4b97092aa4bd, Apr 28 2014, 00:24:03) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
Platform of campaign calloc:
Timer precision: 54 ns
Date: 2014-04-28 00:28:35
Python version: 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 00:25:56) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
-----------------------------------+-------------+--------------
Tests                              |        orig |        calloc
-----------------------------------+-------------+--------------
object()                           |   61 ns (*) |  71 ns (+16%)
b'A' * 10                          |   54 ns (*) |         52 ns
b'A' * 10**3                       |  124 ns (*) | 110 ns (-12%)
b'A' * 10**6                       | 38.4 us (*) |       38.5 us
'A' * 10                           |   59 ns (*) |         62 ns
'A' * 10**3                        |  132 ns (*) | 107 ns (-19%)
'A' * 10**6                        | 38.5 us (*) |       38.5 us
'A' * 10**8                        | 10.3 ms (*) |       10.6 ms
decode 10 null bytes from ASCII    |  264 ns (*) |        263 ns
decode 10**3 null bytes from ASCII |  403 ns (*) |  379 ns (-6%)
decode 10**6 null bytes from ASCII | 80.5 us (*) |       80.5 us
decode 10**8 null bytes from ASCII | 17.7 ms (*) |       17.3 ms
(None,) * 10**0                    |   29 ns (*) |         28 ns
(None,) * 10**1                    |   75 ns (*) |         76 ns
(None,) * 10**2                    |  461 ns (*) |        460 ns
(None,) * 10**3                    |  3.6 us (*) |       3.57 us
(None,) * 10**4                    | 35.7 us (*) |       35.7 us
(None,) * 10**5                    |  364 us (*) |        365 us
(None,) * 10**6                    | 4.12 ms (*) |       4.11 ms
(None,) * 10**7                    | 43.5 ms (*) | 40.3 ms (-7%)
(None,) * 10**8                    |  433 ms (*) |  400 ms (-8%)
([None] * 10)[1:-1]                |  121 ns (*) | 134 ns (+11%)
([None] * 10**3)[1:-1]             | 3.62 us (*) |       3.61 us
([None] * 10**6)[1:-1]             | 4.24 ms (*) |       4.22 ms
([None] * 10**8)[1:-1]             |  440 ms (*) |  402 ms (-9%)
-----------------------------------+-------------+--------------
Total                              |  954 ms (*) |  880 ms (-8%)
-----------------------------------+-------------+--------------
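The percentages in the calloc column appear to be the relative change against the orig column, which is marked (*) as the reference. A small sketch of that arithmetic follows; relative_change is a hypothetical helper, and since the benchmark tool presumably works from unrounded timings, recomputing from the rounded values in the table can be off by a point (for example, b'A' * 10**3 gives about -11% from 124 ns and 110 ns, while the table shows -12%).

```python
def relative_change(orig, patched):
    """Relative change of patched against orig, in percent."""
    return (patched - orig) / orig * 100.0


# (orig, calloc) timings copied from the table above; units match per row.
rows = {
    "object()": (61, 71),            # ns
    "(None,) * 10**8": (433, 400),   # ms
    "Total": (954, 880),             # ms
}

for name, (orig, calloc) in rows.items():
    print(f"{name:<16} {relative_change(orig, calloc):+.0f}%")
# object()         +16%
# (None,) * 10**8  -8%
# Total            -8%
```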

Date | User | Action | Args
2014-04-27 23:03:31 | vstinner | set | recipients: + vstinner, pitrou, njs, skrah, neologix, jtaylor, josh.r
2014-04-27 23:03:31 | vstinner | set | messageid: <1398639811.45.0.600702417766.issue21233@psf.upfronthosting.co.za>
2014-04-27 23:03:31 | vstinner | link | issue21233 messages
2014-04-27 23:03:30 | vstinner | create |