
Author vstinner
Recipients josh.r, jtaylor, neologix, njs, pitrou, skrah, vstinner
Date 2014-04-27.23:03:27
Message-id <1398639811.45.0.600702417766.issue21233@psf.upfronthosting.co.za>
Content
I split my patch into two parts:

- calloc-4.patch: add new "Calloc" functions including _PyObject_GC_Calloc()
- use_calloc.patch: patch types (bytes, dict, list, set, tuple, etc.) and various modules to use calloc
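Both patches rely on the C contract of calloc(nelem, elsize): it returns a block of nelem * elsize bytes that is already zero-filled, so the caller can skip its own memset(). The following is a minimal sketch showing that contract from Python through ctypes. It assumes a POSIX system where the C library can be loaded with ctypes, and `calloc_bytes` is a hypothetical helper for illustration, not part of either patch.

```python
import ctypes
import ctypes.util

# Load the C library (POSIX assumption). If find_library() returns None,
# CDLL(None) falls back to the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.calloc.restype = ctypes.c_void_p
libc.calloc.argtypes = [ctypes.c_size_t, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

def calloc_bytes(nelem, elsize):
    """Hypothetical helper: allocate with calloc() and return a copy of the block."""
    ptr = libc.calloc(nelem, elsize)
    if not ptr:
        raise MemoryError("calloc() failed")
    try:
        # calloc() guarantees zero-filled memory, so no memset() is needed here.
        return ctypes.string_at(ptr, nelem * elsize)
    finally:
        libc.free(ptr)

buf = calloc_bytes(4096, 1)
print(buf == bytes(4096))  # True: every byte is zero
```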

I reverted my changes to _PyObject_GC_Malloc() and added _PyObject_GC_Calloc() instead, so the performance regressions are gone. Creating a large tuple is slightly (8%) faster. But the real speedup is when building a large bytes string of null bytes:


$ ./python.orig -m timeit 'bytes(50*1024*1024)'
100 loops, best of 3: 5.7 msec per loop
$ ./python.calloc -m timeit 'bytes(50*1024*1024)'
100000 loops, best of 3: 4.12 usec per loop
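The same measurement can be reproduced from inside Python with the timeit module. This sketch contrasts bytes(N), which can take the calloc() path, with b'\x00' * N, which is expected to write every byte of a freshly allocated buffer. The loop counts are arbitrary assumptions, and the absolute numbers will depend on the interpreter build and platform.

```python
import timeit

N = 50 * 1024 * 1024  # 50 MiB, as in the timeit commands above

def per_loop(stmt, number=10, repeat=3):
    """Best time per execution of stmt, in seconds (min over repeat runs)."""
    return min(timeit.repeat(stmt, globals={"N": N}, number=number, repeat=repeat)) / number

zeroed = per_loop("bytes(N)")        # zero-filled by the allocator
filled = per_loop("b'\\x00' * N")    # buffer written byte by byte
print(f"bytes(N):     {zeroed * 1e6:10.2f} usec per loop")
print(f"b'\\x00' * N:  {filled * 1e6:10.2f} usec per loop")
```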

On Linux, no memory is actually allocated, even if you read the content of the bytes object: the RSS is almost unchanged.
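One way to check the RSS claim is to read the resident page count from /proc/self/statm before and after creating and reading a large zeroed bytes object. This is a Linux-only sketch; `rss_bytes` is a hypothetical helper, and how small the growth turns out to be depends on the interpreter and the kernel.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

def rss_bytes():
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        # The second field of statm is the number of resident pages.
        return int(f.read().split()[1]) * PAGE_SIZE

N = 50 * 1024 * 1024
before = rss_bytes()
data = bytes(N)
zeros = data.count(0)  # scan the whole object, i.e. read every byte
after = rss_bytes()
print(f"RSS grew by {(after - before) / 2**20:.1f} MiB for a {N / 2**20:.0f} MiB bytes object")
```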

OK, now the real use case where this helps: I implemented the same optimization for bytearray.

$ ./python.orig -m timeit 'bytearray(50*1024*1024)'
100 loops, best of 3: 6.33 msec per loop
$ ./python.calloc -m timeit 'bytearray(50*1024*1024)'
100000 loops, best of 3: 4.09 usec per loop

If you overallocate a bytearray and write only a few bytes, the pages at the end of the bytearray are never actually allocated (at least on Linux).
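The overallocation pattern can be sketched as follows: create a large zeroed bytearray, write a few bytes at the start, and compare the RSS before and after. This is Linux-only, `rss_bytes` is a hypothetical helper, and the RSS stays small only if the interpreter's bytearray constructor takes a calloc() path as it does with the patch; an interpreter that fills the buffer explicitly will show the full growth.

```python
import os

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")

def rss_bytes():
    """Resident set size of this process in bytes (Linux only, via /proc)."""
    with open("/proc/self/statm") as f:
        return int(f.read().split()[1]) * PAGE_SIZE

CAPACITY = 50 * 1024 * 1024  # overallocated buffer: 50 MiB of zeros
before = rss_bytes()
buf = bytearray(CAPACITY)
buf[:5] = b"hello"  # write only a few bytes at the start; the size is unchanged
after = rss_bytes()
print(f"RSS grew by {(after - before) / 2**20:.1f} MiB for a {CAPACITY / 2**20:.0f} MiB bytearray")
```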


Results of bench_alloc.py comparing the original Python with the patched Python (calloc-4.patch + use_calloc.patch).

Common platform:
SCM: hg revision=4b97092aa4bd+ tag=tip branch=default date="2014-04-27 18:02 +0100"
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug

Platform of campaign orig:
Timer precision: 42 ns
Date: 2014-04-28 00:27:19
Python version: 3.5.0a0 (default:4b97092aa4bd, Apr 28 2014, 00:24:03) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign calloc:
Timer precision: 54 ns
Date: 2014-04-28 00:28:35
Python version: 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 00:25:56) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]
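Most fields of the platform header above can be collected with the standard library. This is a hypothetical sketch, not the code of bench_alloc.py: `platform_info` is an illustrative name, the C type sizes come from ctypes, and CFLAGS may be None on builds that do not record it (for example on Windows).

```python
import ctypes
import platform
import sys
import sysconfig
import time

def platform_info():
    """Collect metadata similar to the header of the benchmark report."""
    sizes = {
        "int": ctypes.c_int, "long": ctypes.c_long, "long long": ctypes.c_longlong,
        "size_t": ctypes.c_size_t, "void*": ctypes.c_void_p,
    }
    bits = ", ".join(f"{name}={ctypes.sizeof(t) * 8}" for name, t in sizes.items())
    return {
        "Timer info": time.get_clock_info("perf_counter"),
        "CFLAGS": sysconfig.get_config_var("CFLAGS"),
        "Bits": bits,
        "Platform": platform.platform(),
        "Python version": sys.version.replace("\n", " "),
    }

for key, value in platform_info().items():
    print(f"{key}: {value}")
```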

-----------------------------------+-------------+--------------
Tests                              |        orig |        calloc
-----------------------------------+-------------+--------------
object()                           |   61 ns (*) |  71 ns (+16%)
b'A' * 10                          |   54 ns (*) |         52 ns
b'A' * 10**3                       |  124 ns (*) | 110 ns (-12%)
b'A' * 10**6                       | 38.4 us (*) |       38.5 us
'A' * 10                           |   59 ns (*) |         62 ns
'A' * 10**3                        |  132 ns (*) | 107 ns (-19%)
'A' * 10**6                        | 38.5 us (*) |       38.5 us
'A' * 10**8                        | 10.3 ms (*) |       10.6 ms
decode 10 null bytes from ASCII    |  264 ns (*) |        263 ns
decode 10**3 null bytes from ASCII |  403 ns (*) |  379 ns (-6%)
decode 10**6 null bytes from ASCII | 80.5 us (*) |       80.5 us
decode 10**8 null bytes from ASCII | 17.7 ms (*) |       17.3 ms
(None,) * 10**0                    |   29 ns (*) |         28 ns
(None,) * 10**1                    |   75 ns (*) |         76 ns
(None,) * 10**2                    |  461 ns (*) |        460 ns
(None,) * 10**3                    |  3.6 us (*) |       3.57 us
(None,) * 10**4                    | 35.7 us (*) |       35.7 us
(None,) * 10**5                    |  364 us (*) |        365 us
(None,) * 10**6                    | 4.12 ms (*) |       4.11 ms
(None,) * 10**7                    | 43.5 ms (*) | 40.3 ms (-7%)
(None,) * 10**8                    |  433 ms (*) |  400 ms (-8%)
([None] * 10)[1:-1]                |  121 ns (*) | 134 ns (+11%)
([None] * 10**3)[1:-1]             | 3.62 us (*) |       3.61 us
([None] * 10**6)[1:-1]             | 4.24 ms (*) |       4.22 ms
([None] * 10**8)[1:-1]             |  440 ms (*) |  402 ms (-9%)
-----------------------------------+-------------+--------------
Total                              |  954 ms (*) |  880 ms (-8%)
-----------------------------------+-------------+--------------
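The percentages in the calloc column are relative changes against the reference column marked (*): (calloc - orig) / orig. The sketch below recomputes two of them from the displayed values. Because the table shows rounded timings, recomputing other rows can differ by a point (for example 124 ns to 110 ns gives -11% rather than the -12% shown), presumably because the report uses the unrounded measurements.

```python
def pct_change(orig, new):
    """Relative change of new versus orig, in percent (negative means faster)."""
    return (new - orig) / orig * 100.0

# Values taken from the table (rounded as displayed there).
print(f"object(): {pct_change(61, 71):+.0f}%")    # object(): +16%
print(f"Total:    {pct_change(954, 880):+.0f}%")  # Total:    -8%
```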