classification
Title: Add *Calloc functions to CPython memory allocation API
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: haypo, josh.r, jtaylor, neologix, njs, pitrou, python-dev, skrah
Priority: normal Keywords: patch

Created on 2014-04-15 08:56 by njs, last changed 2014-06-02 20:29 by haypo. This issue is now closed.

Files
File name Uploaded Description Edit
calloc.patch haypo, 2014-04-15 21:27 review
calloc-2.patch haypo, 2014-04-16 04:21 review
calloc-3.patch haypo, 2014-04-16 19:48 review
bench_alloc.py haypo, 2014-04-27 10:36
test.c neologix, 2014-04-27 18:31
calloc-4.patch haypo, 2014-04-27 23:03 review
use_calloc.patch haypo, 2014-04-27 23:03 review
bench_alloc2.py haypo, 2014-04-27 23:15
calloc-5.patch haypo, 2014-04-28 09:01 review
calloc-6.patch haypo, 2014-04-29 20:59 review
Messages (95)
msg216281 - (view) Author: Nathaniel Smith (njs) * Date: 2014-04-15 08:55
Numpy would like to switch to using the CPython allocator interface in order to take advantage of the new tracemalloc infrastructure in 3.4. But, numpy relies on the availability of calloc(), and the CPython allocator API does not expose calloc().
  https://docs.python.org/3.5/c-api/memory.html#c.PyMemAllocator

So, we should add *Calloc variants. This met general approval on python-dev. Thread here:
  https://mail.python.org/pipermail/python-dev/2014-April/133985.html

This would involve adding a new .calloc field to the PyMemAllocator struct, exposed through new API functions PyMem_RawCalloc, PyMem_Calloc, PyObject_Calloc. [It's not clear that all 3 would really be used, but since we have only one PyMemAllocator struct that they all share, it'd be hard to add support to only one or two of these domains and not the rest. And the higher-level calloc variants might well be used. Numpy array buffers are often small (e.g., holding only a single value), and these small buffers benefit from small-alloc optimizations; meanwhile, large buffers benefit from calloc optimizations. So it might be optimal to use a single allocator that has both.]

We might also have to rename the PyMemAllocator struct to ensure that compiling old code with the new headers doesn't silently leave garbage in the .calloc field and lead to crashes.
msg216390 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-15 21:27
Here is a first patch adding the following functions:

  void* PyMem_RawCalloc(size_t n);
  void* PyMem_Calloc(size_t n);
  void* PyObject_Calloc(size_t n);
  PyObject* _PyObject_GC_Calloc(size_t);

It adds the following field, after the malloc field, to the PyMemAllocator structure:

  void* (*calloc) (void *ctx, size_t size);

It changes the tracemalloc module to trace "calloc" allocations, adds new tests and documents the new functions.

The patch also contains an important change: PyType_GenericAlloc() uses calloc instead of malloc+memset(0). It may be faster, I didn't check.
msg216394 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-15 21:39
So what is the point of _PyObject_GC_Calloc ?
msg216399 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-15 22:05
General comment on patch: For the flag value that toggles zero-ing, perhaps use a different name, e.g. setzero, clearmem, initzero or somesuch instead of calloc? calloc already gets used to refer to both the C standard function and the function pointer structure member; it's mildly confusing to have it *also* refer to a boolean flag as well.
msg216403 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-15 22:17
Additional comment on clarity: Might it make sense to make the calloc structure member take both the num and size arguments that the underlying calloc takes? That is, instead of:

void* (*calloc) (void *ctx, size_t size);

Declare it as:

void* (*calloc) (void *ctx, size_t num, size_t size);

Beyond potentially allowing more detailed tracing info at some later point (and, much like the original calloc, potentially allowing us to verify that the components do not overflow on multiply, instead of assuming every caller must multiply and check for themselves), it also seems a bit friendlier to have the prototype for the structure's calloc follow the same pattern as the other members for consistency (Principle of Least Surprise): a context pointer, plus the arguments expected by the equivalent C function.
msg216404 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-15 22:20
Sorry for breaking it up, but the same comment on consistent prototypes mirroring the C standard lib calloc would apply to all the API functions as well, e.g. PyMem_RawCalloc, PyMem_Calloc, PyObject_Calloc and _PyObject_GC_Calloc, not just the structure function pointer.
msg216422 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 02:40
> So what is the point of _PyObject_GC_Calloc ?

It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
https://mail.python.org/pipermail/python-dev/2014-April/133985.html

_PyObject_GC_Calloc() is used by PyType_GenericAlloc(). If I understand correctly, it is the default allocator for Python objects.
msg216425 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 02:49
In numpy, I found the two following functions:


/*NUMPY_API
 * Allocates memory for array data.
 */
void* PyDataMem_NEW(size_t size);

/*NUMPY_API
 * Allocates zeroed memory for array data.
 */
void* PyDataMem_NEW_ZEROED(size_t size, size_t elsize);

So it looks like it needs two size_t parameters. Prototype of the C function calloc():

void *calloc(size_t nmemb, size_t size);

I agree that it's better to provide the same prototype as calloc().
msg216431 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 04:21
New patch:

- replace "size_t size" with "size_t nelem, size_t elsize" in the prototype of calloc functions (the parameter names come from the POSIX standard)
- replace "int calloc" with "int zero" in helper functions
msg216433 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-16 05:34
Le 16/04/2014 04:40, STINNER Victor a écrit :
>
> STINNER Victor added the comment:
>
>> So what is the point of _PyObject_GC_Calloc ?
>
> It calls calloc(size) instead of malloc(size)

No, the question is why you didn't simply change _PyObject_GC_Malloc 
(which is a private function).
msg216444 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-16 07:18
>> So what is the point of _PyObject_GC_Calloc ?
>
> It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
> https://mail.python.org/pipermail/python-dev/2014-April/133985.html

It will only make a difference if the allocated region is large enough
to be allocated by mmap (so not for 90% of objects).
msg216451 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 08:04
>>> So what is the point of _PyObject_GC_Calloc ?
>>
>> It calls calloc(size) instead of malloc(size)
>
> No, the question is why you didn't simply change _PyObject_GC_Malloc
> (which is a private function).

Oh ok, I didn't understand. I don't like changing the behaviour of
functions, but it's maybe fine if the function is private.
msg216452 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 08:06
2014-04-16 3:18 GMT-04:00 Charles-François Natali <report@bugs.python.org>:
>> It calls calloc(size) instead of malloc(size), calloc() which can be faster than malloc()+memset(), see:
>> https://mail.python.org/pipermail/python-dev/2014-April/133985.html
>
> It will only make a difference if the allocated region is large enough
> to be allocated by mmap (so not for 90% of objects).

Even if there are only 10% of cases where it may be faster, I think
that it's interesting to use calloc() to allocate Python objects. You
may create large Python objects ;-)

I didn't check which objects use (indirectly) _PyObject_GC_Calloc().
msg216455 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-16 09:54
I left a Rietveld comment, which probably did not get mailed.
msg216515 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-16 17:47
On mer., 2014-04-16 at 08:06 +0000, STINNER Victor wrote:
> I didn't check which objects use (indirectly) _PyObject_GC_Calloc().

I've checked: lists, tuples, dicts and sets at least seem to use it.
Obviously, objects which are not tracked by the GC (such as str and
bytes) won't use it.
msg216567 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-16 19:48
Patch version 3: remove _PyObject_GC_Calloc(); modify _PyObject_GC_Malloc() to use calloc() instead of malloc()+memset(0).
msg216668 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-17 07:19
Do you have benchmarks?

(I'm not looking for an improvement, just no regression.)
msg216671 - (view) Author: Julian Taylor (jtaylor) Date: 2014-04-17 08:04
Won't replacing _PyObject_GC_Malloc with a calloc variant cause var objects (PyObject_NewVar) to be completely zeroed, which I believe they weren't before?
Some numeric programs stuff a lot of data into var objects and would care about Python suddenly zeroing memory they don't need zeroed.
An example would be tinyarray.
msg216681 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-17 10:35
Julian: No. See the diff: http://bugs.python.org/review/21233/diff/11644/Objects/typeobject.c

The original GC_Malloc was explicitly memset-ing after confirming that it received a non-NULL pointer from the underlying malloc call; that memset is removed in favor of using the calloc call.
msg216682 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-17 10:39
Well, to be more specific, PyType_GenericAlloc was originally calling one of two methods that didn't zero the memory (one of which was GC_Malloc), then memset-ing. Just realized you're talking about something else; not sure if you're correct about this now, but I have to get to work, will check later if no one else does.
msg216686 - (view) Author: Julian Taylor (jtaylor) Date: 2014-04-17 11:35
I just tested it: PyObject_NewVar seems to use RawMalloc, not the GC malloc, so it's probably fine.
msg217228 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 00:05
I read again some remarks about alignment: it was suggested to provide allocators returning an address aligned to a requested alignment. This topic was already discussed in #18835.

If Python doesn't provide such memory allocators, it was suggested to provide a "trace" function which can be called on the result of a successful allocation to "trace" it (and a similar function for free). But this is very different from the design of PEP 445 (the new malloc API). Basically, it would require rewriting PEP 445.
msg217242 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 08:30
> I read again some remarks about alignment: it was suggested to provide allocators returning an address aligned to a requested alignment. This topic was already discussed in #18835.

The alignment issue is really orthogonal to the calloc one, so IMO
this shouldn't be discussed here. And FWIW I don't think we should
expose those: alignment only matters either for concurrency or SIMD
instructions, and I don't think we should try to standardize this kind
of API, it's way too special-purpose (then we'd have to think about
huge pages, etc.). Whereas calloc is a simple and immediately useful
addition, not only for Numpy but also for CPython.
msg217246 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 09:51
2014-04-27 10:30 GMT+02:00 Charles-François Natali <report@bugs.python.org>:
>> I read again some remarks about alignment: it was suggested to provide allocators returning an address aligned to a requested alignment. This topic was already discussed in #18835.
>
> The alignment issue is really orthogonal to the calloc one, so IMO
> this shouldn't be discussed here. And FWIW I don't think we should
> expose those: alignment only matters either for concurrency or SIMD
> instructions, and I don't think we should try to standardize this kind
> of API, it's way too special-purpose (then we'd have to think about
> huge pages, etc.). Whereas calloc is a simple and immediately useful
> addition, not only for Numpy but also for CPython.

This issue was opened to be able to use tracemalloc on numpy. I would
like to make sure that calloc is enough for numpy. I would prefer to
change the malloc API only once.
msg217251 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 10:20
> This issue was opened to be able to use tracemalloc on numpy. I would
> like to make sure that calloc is enough for numpy. I would prefer to
> change the malloc API only once.

Then please at least rename the issue. Also, I don't see why
everything should be done at once: calloc support is a self-contained
change, which is useful outside of numpy. Enhanced tracemalloc support
for numpy certainly belongs to another issue.

Regarding the *Calloc functions: how about we provide a sane API
instead of reproducing the cumbersome C API?

I mean, why not expose:
PyAPI_FUNC(void *) PyMem_Calloc(size_t size);
instead of
PyAPI_FUNC(void *) PyMem_Calloc(size_t nelem, size_t elsize);

AFAICT, the two arguments are purely historical (it was used when
malloc() didn't guarantee suitable alignment, and has the advantage of
performing overflow check when doing the multiplication, but in our
code we always check for it anyway).
See
https://groups.google.com/forum/#!topic/comp.lang.c/jZbiyuYqjB4
http://stackoverflow.com/questions/4083916/two-arguments-to-calloc

And http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/malloc/malloc.c?view=markup
to check that calloc(nelem, elsize) is implemented as calloc(nelem *
elsize)

I'm also concerned about the change to _PyObject_GC_Malloc(): it now
calls calloc() instead of malloc(): it can definitely be slower.
msg217252 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 10:32
Note to numpy devs: it would be great if some of you followed the
python-dev mailing list (I know it can be quite volume intensive, but
maybe simple filters could help keep the noise down): you guys have
definitely both expertise and real-life applications that could be
very valuable in helping us design the best possible public/private
APIs. It's always great to have downstream experts/end-users!
msg217253 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 10:36
I wrote a short microbenchmark allocating objects using my benchmark.py script.

It looks like the operation "(None,) * N" is slower with calloc-3.patch, but it's unclear how many times slower it is. I don't understand why only this operation shows a different speed.

Do you have ideas for other benchmarks?

Using the timeit module:

$ ./python.orig -m timeit '(None,) * 10**5'
1000 loops, best of 3: 357 usec per loop
$ ./python.calloc -m timeit '(None,) * 10**5'
1000 loops, best of 3: 698 usec per loop

But with different parameters, the difference is smaller:

$ ./python.orig -m timeit -r 20 -n '1000' '(None,) * 10**5'
1000 loops, best of 20: 362 usec per loop
$ ./python.calloc -m timeit -r 20 -n '1000' '(None,) * 10**5'
1000 loops, best of 20: 392 usec per loop


Results of bench_alloc.py:

Common platform:
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
CPU model: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
Python unicode implementation: PEP 393
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Timer: time.perf_counter
SCM: hg revision=462470859e57+ branch=default date="2014-04-26 19:01 -0400"
Platform: Linux-3.13.8-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
Bits: int=32, long=64, long long=64, size_t=64, void*=64

Platform of campaign orig:
Timer precision: 42 ns
Date: 2014-04-27 12:27:26
Python version: 3.5.0a0 (default:462470859e57, Apr 27 2014, 11:52:55) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign calloc:
Timer precision: 45 ns
Date: 2014-04-27 12:29:10
Python version: 3.5.0a0 (default:462470859e57+, Apr 27 2014, 12:04:57) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

-----------------------------------+--------------+---------------
Tests                              |         orig |         calloc
-----------------------------------+--------------+---------------
object()                           |    61 ns (*) |          62 ns
b'A' * 10                          |    55 ns (*) |    51 ns (-7%)
b'A' * 10**3                       |    99 ns (*) |          94 ns
b'A' * 10**6                       |  37.5 us (*) |        36.6 us
'A' * 10                           |    62 ns (*) |    58 ns (-7%)
'A' * 10**3                        |   107 ns (*) |         104 ns
'A' * 10**6                        |    37 us (*) |        36.6 us
'A' * 10**8                        |  16.2 ms (*) |        16.4 ms
decode 10 null bytes from ASCII    |   253 ns (*) |         248 ns
decode 10**3 null bytes from ASCII |   359 ns (*) |         357 ns
decode 10**6 null bytes from ASCII |  78.8 us (*) |        78.7 us
decode 10**8 null bytes from ASCII |  26.2 ms (*) |        25.9 ms
(None,) * 10**0                    |    30 ns (*) |          30 ns
(None,) * 10**1                    |    78 ns (*) |          77 ns
(None,) * 10**2                    |   427 ns (*) |   460 ns (+8%)
(None,) * 10**3                    |   3.5 us (*) |   3.7 us (+6%)
(None,) * 10**4                    |  34.7 us (*) |  37.2 us (+7%)
(None,) * 10**5                    |   357 us (*) |   390 us (+9%)
(None,) * 10**6                    |  3.86 ms (*) | 4.43 ms (+15%)
(None,) * 10**7                    |  50.4 ms (*) |        50.3 ms
(None,) * 10**8                    |   505 ms (*) |         504 ms
([None] * 10)[1:-1]                |   121 ns (*) |         120 ns
([None] * 10**3)[1:-1]             |  3.57 us (*) |        3.57 us
([None] * 10**6)[1:-1]             |  4.61 ms (*) |        4.59 ms
([None] * 10**8)[1:-1]             |   585 ms (*) |         582 ms
-----------------------------------+--------------+---------------
Total                              | 1.19 sec (*) |       1.19 sec
-----------------------------------+--------------+---------------
msg217254 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 11:02
> Regarding the *Calloc functions: how about we provide a sane API
> instead of reproducing the cumbersome C API?

Isn't the point of reproducing the C API to allow quickly switching from calloc() to PyObject_Calloc()?
(besides, it seems the OpenBSD guys like the two-argument form :-))
msg217255 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-27 11:05
Just to add another data point, I don't find the calloc() API
cumbersome.
msg217256 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 11:12
It looks like calloc-3.patch is wrong: it modifies _PyObject_GC_Malloc() to fill the newly allocated buffer with zeros, but _PyObject_GC_Malloc() is not only called by PyType_GenericAlloc(): it is also used by _PyObject_GC_New() and _PyObject_GC_NewVar(). The patch may be a little slower because it writes zeros twice.

calloc.patch adds "PyObject* _PyObject_GC_Calloc(size_t);" and doesn't have this issue.
msg217257 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-27 11:26
Actually, I think we have to match the C-API:  For instance, in
Modules/_decimal/_decimal.c:5527 the libmpdec allocators are
set to the Python allocators.

So I'd need to do:

mpd_callocfunc = PyMem_Calloc;


I suppose that's a common use case.
msg217262 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 13:19
> It looks like calloc-3.patch is wrong: it modifies _PyObject_GC_Malloc() to fill the newly allocated buffer with zeros, but _PyObject_GC_Malloc() is not only called by PyType_GenericAlloc(): it is also used by _PyObject_GC_New() and _PyObject_GC_NewVar(). The patch may be a little slower because it writes zeros twice.

Exactly (sorry, I thought you'd already seen that, otherwise I could
have told you!)

> Actually, I think we have to match the C-API:  For instance, in
> Modules/_decimal/_decimal.c:5527 the libmpdec allocators are
> set to the Python allocators.

Hmm, ok then, I didn't know we were plugging our allocators into
external libraries: that's indeed a very good reason to keep the same
prototype.

But I still find this API cumbersome: calloc is exactly like malloc
except for the zeroing, so the prototype could be simpler (a quick
look at Victor's patch shows a lot of calloc(1, n), which is a sign
something's wrong). Maybe it's just me ;-)

Otherwise, a random thought: by changing PyType_GenericAlloc() from
malloc() + memset(0) to calloc(), there could be a subtle side effect:
if a given type relies on the 0-setting (which is documented), and
doesn't do any other work on the allocated area behind the scenes
(think about a mmap-like object), we could lose our capacity to detect
MemoryError, and run into segfaults instead.

Because if a code creates many such objects which basically just do
calloc(), on operating systems with memory overcommitting (such as
Linux), the calloc() allocations will pretty much always succeed, but
will segfault when the page is first written to in case of low memory.

I don't think such use cases should be common: I would expect most
types to use tp_alloc(type, 0) and then use an internal additional
pointer for the allocations it needs, or immediately write to the
allocated memory area right after allocation, but that's something to
keep in mind.
msg217274 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 15:43
"And http://www.eglibc.org/cgi-bin/viewvc.cgi/trunk/libc/malloc/malloc.c?view=markup
to check that calloc(nelem, elsize) is implemented as calloc(nelem *
elsize)"

__libc_calloc() starts with a check on integer overflow.
msg217276 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 15:59
> __libc_calloc() starts with a check on integer overflow.

Yes, see my previous message:
"""
AFAICT, the two arguments are purely historical (it was used when
malloc() didn't guarantee suitable alignment, and has the advantage of
performing overflow check when doing the multiplication, but in our
code we always check for it anyway).
"""
msg217282 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 16:31
list: items are allocated in a second memory block. PyList_New() uses memset(0) to set all items to NULL.

tuple: header and items are stored in a single structure (PyTupleObject), in a single memory block. PyTuple_New() fills the items with NULL (so null bytes are written again). Something can be optimized here.

dict: header, keys and values are stored in 3 different memory blocks. It may be interesting to use calloc() to allocate keys and values. Initialization of keys and values to NULL uses a dummy loop. I expect that memset(0) would be faster.

Anyway, I expect that all items of builtin containers (tuple, list, dict, etc.) are set to non-NULL values. So the lazy initialization to zeros may be useless for them.

It means that benchmarking builtin containers should not show any speedup. Something else (numpy?) should be used to see an interesting speedup.
msg217283 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 16:38
"Because if a code creates many such objects which basically just do
calloc(), on operating systems with memory overcommitting (such as
Linux), the calloc() allocations will pretty much always succeed, but
will segfault when the page is first written to in case of low memory."

Overcommit leads to a segmentation fault when there is no more memory, but I don't see how calloc() is worse than malloc()+memset(0). It will crash in both cases, no?

In my experience (embedded device with low memory), programs crash because they don't check the result of malloc() (return NULL on allocation failure), not because of overcommit.
msg217284 - (view) Author: Nathaniel Smith (njs) * Date: 2014-04-27 16:39
@Charles-François: I think your worries about calloc and overcommit are unjustified. First, calloc and malloc+memset actually behave the same way here -- with a large allocation and overcommit enabled, malloc and calloc will both go ahead and return the large allocation, and then the actual out-of-memory (OOM) event won't occur until the memory is accessed. In the malloc+memset case this access will occur immediately after the malloc, during the memset -- but this is still too late for us to detect the malloc failure. Second, OOM does not cause segfaults on any system I know. On Linux it wakes up the OOM killer, which shoots some random (possibly guilty) process in the head. The actual program which triggered the OOM is quite likely to escape unscathed. In practice, the *only* cases where you can get a MemoryError on modern systems are (a) if the user has turned overcommit off, (b) you're on a tiny embedded system that doesn't have overcommit, (c) if you run out of virtual address space. None of these cases are affected by the differences between malloc and calloc.

Regarding the calloc API: it's a wart, but it seems like a pretty unavoidable wart at this point, and the API compatibility argument is strong. I think we should just keep the two argument form and live with it...
msg217291 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 17:29
> @Charles-François: I think your worries about calloc and overcommit are unjustified. First, calloc and malloc+memset actually behave the same way here -- with a large allocation and overcommit enabled, malloc and calloc will both go ahead and return the large allocation, and then the actual out-of-memory (OOM) event won't occur until the memory is accessed. In the malloc+memset case this access will occur immediately after the malloc, during the memset -- but this is still too late for us to detect the malloc failure.

Not really: what you describe only holds for a single object.
But if you allocate let's say 1000 such objects at once:
- in the malloc + memset case, the committed pages are progressively
accessed (i.e. the pages for object N are accessed before the memory
is allocated for object N+1), so they will be counted not only as
committed, but also as active (for example the RSS will increase
gradually): so at some point, even though by default the Linux VM
subsystem is really lenient toward overcommitting, you'll likely have
malloc/mmap return NULL because of this
- in the calloc() case, all the memory is first committed, but not
touched: the kernel will likely happily overcommit all of this. Only
when you start progressively accessing the pages will the OOM kick in.

> Second, OOM does not cause segfaults on any system I know. On Linux it wakes up the OOM killer, which shoots some random (possibly guilty) process in the head. The actual program which triggered the OOM is quite likely to escape unscathed.

Ah, did I say segfault?
Sorry, I of course meant that the process will get nuked by the OOM killer.

> In practice, the *only* cases where you can get a MemoryError on modern systems are (a) if the user has turned overcommit off, (b) you're on a tiny embedded system that doesn't have overcommit, (c) if you run out of virtual address space. None of these cases are affected by the differences between malloc and calloc.

That's a common misconception: provided that the memory allocated is
accessed progressively (see above point), you'll often get ENOMEM,
even with overcommitting:

$ /sbin/sysctl -a | grep overcommit
vm.nr_overcommit_hugepages = 0
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

$ cat /tmp/test.py
l = []

with open('/proc/self/status') as f:
    try:
        for i in range(50000000):
            l.append(i)
    except MemoryError:
        for line in f:
            if 'VmPeak' in line:
                print(line)
        raise

$ python /tmp/test.py
VmPeak:   720460 kB

Traceback (most recent call last):
  File "/tmp/test.py", line 7, in <module>
    l.append(i)
MemoryError

I have a 32-bit machine, but the process definitely has more than
720MB of address space ;-)

If your statement were true, this would mean that it's almost
impossible to get ENOMEM with overcommitting on a 64-bit machine,
which is - fortunately - not true. Just try python -c "[i for i in
range(<large value>)]" on a 64-bit machine, I'll bet you'll get a
MemoryError (ENOMEM).
msg217294 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 17:40
> Just try python -c "[i for i in
> range(<large value>)]" on a 64-bit machine, I'll bet you'll get a
> MemoryError (ENOMEM).

Hmm, I get an OOM kill here.
msg217295 - (view) Author: Nathaniel Smith (njs) * Date: 2014-04-27 17:41
On my laptop (x86-64, Linux 3.13, 12 GB RAM):

$ python3 -c "[i for i in range(999999999)]"
zsh: killed     python3 -c "[i for i in range(999999999)]"

$ dmesg | tail -n 2
[404714.401901] Out of memory: Kill process 10752 (python3) score 687 or sacrifice child
[404714.401903] Killed process 10752 (python3) total-vm:17061508kB, anon-rss:10559004kB, file-rss:52kB

And your test.py produces the same result. Are you sure you don't have a ulimit set on address space?
msg217297 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 17:48
> And your test.py produces the same result. Are you sure you don't have a ulimit set on address space?

Yep, I'm sure:
$  ulimit -v
unlimited

It's probably due to the exponential over-allocation used by the array
(to guarantee amortized constant cost).

How about:
python -c "b = bytes('x' * <large>)"
msg217298 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 17:53
Dammit, read:

python -c 'b"x" * (2**48)'
msg217302 - (view) Author: Nathaniel Smith (njs) * Date: 2014-04-27 18:27
Right, python3 -c 'b"x" * (2 ** 48)' does give an instant MemoryError for me. So I was wrong about it being the VM limit indeed.

The documentation on this is terrible! But, if I'm reading this right:
   http://lxr.free-electrons.com/source/mm/util.c#L434
the actual rules are:

overcommit mode 1: allocating a VM range always succeeds.
overcommit mode 2: (Slightly simplified) You can allocate total VM ranges up to (swap + RAM * overcommit_ratio), and overcommit_ratio is 50% by default. So that's a bit odd, but whatever. This is still entirely a limit on VM size.
overcommit mode 0 ("guess", the default): when allocating a VM range, the kernel imagines what would happen if you immediately used all those pages. If that would put you OOM, then we fall back to mode 2 rules. If that would *not* put you OOM, then the allocation unconditionally succeeds.

So yeah, touching pages can affect whether a later malloc returns ENOMEM.

I'm not sure any of this actually matters in the Python case though :-). There's still no reason to go touching pages pre-emptively just in case we might write to them later -- all that does is increase the interpreter's memory footprint, which can't help anything. If people are worried about overcommit, then they should turn off overcommit, not try to disable it on a piece-by-piece basis by getting individual programs to touch memory before they need it.
msg217303 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 18:31
Alright, it bothered me so I wrote a small C testcase (attached),
which calls malloc in a loop, and can call memset upon the allocated
block right after allocation:

$ gcc -o /tmp/test /tmp/test.c; /tmp/test
malloc() returned NULL after 3050MB
$ gcc -DDO_MEMSET -o /tmp/test /tmp/test.c; /tmp/test
malloc() returned NULL after 2130MB

Without memset, the kernel happily allocates until we reach the 3GB
user address space limit.
With memset, it bails out way before.

I don't know what this'll give on 64-bit, but I assume one should get
comparable results.
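
For reference, the attached test.c is essentially the following (a hedged reconstruction, since the file itself isn't inlined here; the `max_mb` cap is an addition so the sketch stays safe to run):

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define CHUNK (1024 * 1024)  /* 1 MB per allocation */

/* malloc() fixed-size chunks in a loop, optionally touching every page
   with memset(); return how many MB were obtained before malloc()
   returned NULL (or before the safety cap was reached). */
size_t alloc_until_fail(size_t max_mb, int do_memset)
{
    size_t mb;
    for (mb = 0; mb < max_mb; mb++) {
        char *p = malloc(CHUNK);
        if (p == NULL)
            break;                  /* the kernel refused more memory */
        if (do_memset)
            memset(p, 'x', CHUNK);  /* fault every page in, committing it */
        /* Deliberately leaked, as in the original testcase, to keep
           pressure on the address space. */
    }
    return mb;
}
```

Running it once plain and once with the memset path enabled reproduces the two numbers above: with overcommit, the untouched allocations can go much further than the touched ones.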

I would guess that the reason why the Python list allocation fails is
because of the exponential allocation scheme: since memory is
allocated in large chunks before being used, the kernel happily
overallocates.
With a more progressive allocation+usage, it should return ENOMEM at some point.
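
The exponential over-allocation scheme mentioned here can be sketched as follows (the doubling factor is illustrative, not CPython's exact list growth formula): large chunks are requested from the allocator well before they are actually written, which is exactly the pattern an overcommitting kernel accepts happily.

```c
#include <stdlib.h>
#include <assert.h>

typedef struct {
    char *data;
    size_t len;       /* bytes actually in use */
    size_t capacity;  /* bytes allocated, grows geometrically */
} buf_t;

/* Append one byte, doubling capacity when full. ENOMEM can only
   surface at a growth step, i.e. at allocation time, long before
   most of the reserved memory is touched. */
int buf_append(buf_t *b, char c)
{
    if (b->len == b->capacity) {
        size_t newcap = b->capacity ? b->capacity * 2 : 8;
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->capacity = newcap;
    }
    b->data[b->len++] = c;
    return 0;
}
```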

Anyway, that's probably off-topic!
msg217304 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 18:36
> So yeah, touching pages can affect whether a later malloc returns ENOMEM.
>
> I'm not sure any of this actually matters in the Python case though :-). There's still no reason to go touching pages pre-emptively just in case we might write to them later -- all that does is increase the interpreter's memory footprint, which can't help anything. If people are worried about overcommit, then they should turn off overcommit, not try and disable it on a piece-by-piece basis by trying to get individual programs to memory before they need it.

Absolutely: that's why I'm really in favor of exposing calloc; this
could definitely help many workloads.

Victor, did you run any non-trivial benchmark, like pybench & Co?

As I said, I'm not expecting any improvement, I just want to make sure
there's no hidden regression somewhere (like the one for GC-tracked
objects above).
msg217305 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 18:37
> $ gcc -o /tmp/test /tmp/test.c; /tmp/test
> malloc() returned NULL after 3050MB
> $ gcc -DDO_MEMSET -o /tmp/test /tmp/test.c; /tmp/test
> malloc() returned NULL after 2130MB
> 
> Without memset, the kernel happily allocates until we reach the 3GB
> user address space limit.
> With memset, it bails out way before.
> 
> I don't know what this'll give on 64-bit, but I assume one should get
> comparable result.

Both OOM here (3.11.0-20-generic, 64-bit, Ubuntu).
msg217306 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-27 18:49
This is probably offtopic, but I think people who want reliable
MemoryErrors can use limits, e.g. via djb's softlimit (daemontools):

$ softlimit -m 100000000 ./python
Python 3.5.0a0 (default:462470859e57+, Apr 27 2014, 19:34:06)
[GCC 4.7.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> [i for i in range(9999999)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
MemoryError
msg217307 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 19:03
> Both OOM here (3.11.0-20-generic, 64-bit, Ubuntu).

Hm...
What's /proc/sys/vm/overcommit_memory ?
If it's set to 0, then the kernel will always overcommit.

If you set it to 2, normally you'd definitely get ENOMEM (which is IMO
much nicer than getting nuked by the OOM killer, especially because,
like in real life, there's often collateral damage ;-)
msg217308 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 19:07
> Hm...
> What's /proc/sys/vm/overcommit_memory ?
> If it's set to 0, then the kernel will always overcommit.

I meant 1 (damn, I need sleep).
msg217309 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 19:09
> Hm...
> What's /proc/sys/vm/overcommit_memory ?
> If it's set to 0, then the kernel will always overcommit.

Ah, indeed.

> If you set it to 2, normally you'd definitely get ENOMEM

You're right, but with weird results:

$ gcc -o /tmp/test test.c; /tmp/test
malloc() returned NULL after 600MB
$ gcc -DDO_MEMSET -o /tmp/test test.c; /tmp/test
malloc() returned NULL after 600MB

(I'm supposed to have gigabytes free?!)
msg217310 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-27 19:15
>> Hm...
>> What's /proc/sys/vm/overcommit_memory ?
>> If it's set to 0, then the kernel will always overcommit.
>
> Ah, indeed.

See above, I mistyped: 0 is the default (which is already quite
optimistic), 1 is always.

>> If you set it to 2, normally you'd definitely get ENOMEM
>
> You're right, but with weird results:
>
> $ gcc -o /tmp/test test.c; /tmp/test
> malloc() returned NULL after 600MB
> $ gcc -DDO_MEMSET -o /tmp/test test.c; /tmp/test
> malloc() returned NULL after 600MB
>
> (I'm supposed to have gigabytes free?!)

The formula is RAM * vm.overcommit_ratio / 100 + swap

So if you don't have swap, or a low overcommit_ratio, it could explain
why it returns so early.
Or maybe you have some processes with a lot of mapped-yet-unused
memory (chromium is one of those for example).

Anyway, it's really a mess!
msg217323 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 23:03
I split my patch into two parts:

- calloc-4.patch: add new "Calloc" functions including _PyObject_GC_Calloc()
- use_calloc.patch: patch types (bytes, dict, list, set, tuple, etc.) and various modules to use calloc

I reverted my changes to _PyObject_GC_Malloc() and added _PyObject_GC_Calloc(); the performance regressions are gone. Creating a large tuple is a little bit (8%) faster. But the real speedup is in building a large bytes string of null bytes:


$ ./python.orig -m timeit 'bytes(50*1024*1024)'
100 loops, best of 3: 5.7 msec per loop
$ ./python.calloc -m timeit 'bytes(50*1024*1024)'
100000 loops, best of 3: 4.12 usec per loop

On Linux, no memory is allocated, even if you read the content of the bytes object. RSS is almost unchanged.
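
A minimal standalone check of the calloc() contract that makes this work (this only verifies the zero-fill guarantee, not the RSS behaviour; on Linux a large calloc() block is typically backed by copy-on-write zero pages, so reading it commits nothing):

```c
#include <stdlib.h>
#include <assert.h>

/* Returns 1 if a freshly calloc()ed 50 MB block reads back as zero at a
   few sampled offsets, 0 on failure or if any sample is non-zero. */
int large_calloc_is_zeroed(void)
{
    size_t n = 50 * 1024 * 1024;
    unsigned char *p = calloc(n, 1);
    if (p == NULL)
        return 0;
    /* Reading is cheap: these reads can be served by the zero page. */
    int ok = (p[0] == 0 && p[n / 2] == 0 && p[n - 1] == 0);
    free(p);
    return ok;
}
```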

Ok, now the real use case where it becomes faster: I implemented the same optimization for bytearray.

$ ./python.orig -m timeit 'bytearray(50*1024*1024)'
100 loops, best of 3: 6.33 msec per loop
$ ./python.calloc -m timeit 'bytearray(50*1024*1024)'
100000 loops, best of 3: 4.09 usec per loop

If you overallocate a bytearray and only write a few bytes, the bytes at the end of the bytearray will not be allocated (at least on Linux).


Result of bench_alloc.py comparing original Python to patched Python (calloc-4.patch + use_calloc.patch).

Common platform:
SCM: hg revision=4b97092aa4bd+ tag=tip branch=default date="2014-04-27 18:02 +0100"
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Python unicode implementation: PEP 393
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
Bits: int=32, long=64, long long=64, size_t=64, void*=64
Timer: time.perf_counter
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug

Platform of campaign orig:
Timer precision: 42 ns
Date: 2014-04-28 00:27:19
Python version: 3.5.0a0 (default:4b97092aa4bd, Apr 28 2014, 00:24:03) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign calloc:
Timer precision: 54 ns
Date: 2014-04-28 00:28:35
Python version: 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 00:25:56) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

-----------------------------------+-------------+--------------
Tests                              |        orig |        calloc
-----------------------------------+-------------+--------------
object()                           |   61 ns (*) |  71 ns (+16%)
b'A' * 10                          |   54 ns (*) |         52 ns
b'A' * 10**3                       |  124 ns (*) | 110 ns (-12%)
b'A' * 10**6                       | 38.4 us (*) |       38.5 us
'A' * 10                           |   59 ns (*) |         62 ns
'A' * 10**3                        |  132 ns (*) | 107 ns (-19%)
'A' * 10**6                        | 38.5 us (*) |       38.5 us
'A' * 10**8                        | 10.3 ms (*) |       10.6 ms
decode 10 null bytes from ASCII    |  264 ns (*) |        263 ns
decode 10**3 null bytes from ASCII |  403 ns (*) |  379 ns (-6%)
decode 10**6 null bytes from ASCII | 80.5 us (*) |       80.5 us
decode 10**8 null bytes from ASCII | 17.7 ms (*) |       17.3 ms
(None,) * 10**0                    |   29 ns (*) |         28 ns
(None,) * 10**1                    |   75 ns (*) |         76 ns
(None,) * 10**2                    |  461 ns (*) |        460 ns
(None,) * 10**3                    |  3.6 us (*) |       3.57 us
(None,) * 10**4                    | 35.7 us (*) |       35.7 us
(None,) * 10**5                    |  364 us (*) |        365 us
(None,) * 10**6                    | 4.12 ms (*) |       4.11 ms
(None,) * 10**7                    | 43.5 ms (*) | 40.3 ms (-7%)
(None,) * 10**8                    |  433 ms (*) |  400 ms (-8%)
([None] * 10)[1:-1]                |  121 ns (*) | 134 ns (+11%)
([None] * 10**3)[1:-1]             | 3.62 us (*) |       3.61 us
([None] * 10**6)[1:-1]             | 4.24 ms (*) |       4.22 ms
([None] * 10**8)[1:-1]             |  440 ms (*) |  402 ms (-9%)
-----------------------------------+-------------+--------------
Total                              |  954 ms (*) |  880 ms (-8%)
-----------------------------------+-------------+--------------
msg217324 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 23:15
bench_alloc2.py: updated benchmark script. I added bytes(n) and bytearray(n) tests and removed the test decoding from ASCII.

Common platform:
Timer: time.perf_counter
Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
SCM: hg revision=4b97092aa4bd+ tag=tip branch=default date="2014-04-27 18:02 +0100"
Python unicode implementation: PEP 393
Bits: int=32, long=64, long long=64, size_t=64, void*=64
CFLAGS: -Wno-unused-result -Werror=declaration-after-statement -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
CPU model: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz

Platform of campaign orig:
Date: 2014-04-28 01:11:49
Timer precision: 39 ns
Python version: 3.5.0a0 (default:4b97092aa4bd, Apr 28 2014, 01:02:01) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

Platform of campaign calloc:
Date: 2014-04-28 01:12:29
Timer precision: 44 ns
Python version: 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 01:06:54) [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)]

-----------------------+-------------+----------------
Tests                  |        orig |          calloc
-----------------------+-------------+----------------
object()               |   62 ns (*) |    72 ns (+16%)
b'A' * 10              |   53 ns (*) |           52 ns
b'A' * 10**3           |   96 ns (*) |   110 ns (+15%)
b'A' * 10**6           | 38.5 us (*) |         38.6 us
'A' * 10               |   59 ns (*) |           61 ns
'A' * 10**3            |  105 ns (*) |          108 ns
'A' * 10**6            | 38.6 us (*) |         38.6 us
'A' * 10**8            | 10.3 ms (*) |         10.4 ms
(None,) * 10**0        |   29 ns (*) |           29 ns
(None,) * 10**1        |   75 ns (*) |           76 ns
(None,) * 10**2        |  432 ns (*) |    461 ns (+7%)
(None,) * 10**3        | 3.58 us (*) |          3.6 us
(None,) * 10**4        | 35.8 us (*) |         35.7 us
(None,) * 10**5        |  365 us (*) |          365 us
(None,) * 10**6        |  4.1 ms (*) |         4.13 ms
(None,) * 10**7        | 43.6 ms (*) |   40.3 ms (-8%)
(None,) * 10**8        |  433 ms (*) |    401 ms (-7%)
([None] * 10)[1:-1]    |  122 ns (*) |   134 ns (+10%)
([None] * 10**3)[1:-1] |  3.6 us (*) |         3.62 us
([None] * 10**6)[1:-1] | 4.22 ms (*) |          4.2 ms
([None] * 10**8)[1:-1] |  441 ms (*) |    402 ms (-9%)
bytes(10)              |  137 ns (*) |          136 ns
bytes(10**3)           |  181 ns (*) |    191 ns (+5%)
bytes(10**6)           | 38.7 us (*) |         39.2 us
bytes(10**8)           | 10.3 ms (*) | 4.36 us (-100%)
bytearray(10)          |  138 ns (*) |   153 ns (+11%)
bytearray(10**3)       |  184 ns (*) |   211 ns (+14%)
bytearray(10**6)       | 38.7 us (*) |         39.3 us
bytearray(10**8)       | 10.3 ms (*) | 4.32 us (-100%)
-----------------------+-------------+----------------
Total                  |  957 ms (*) |   862 ms (-10%)
-----------------------+-------------+----------------
msg217325 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 23:16
> Common platform:
> Timer: time.perf_counter
> Timer info: namespace(adjustable=False, implementation='clock_gettime(CLOCK_MONOTONIC)', monotonic=True, resolution=1e-09)
> Platform: Linux-3.13.9-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug
                                                               ^^^^^^^^^
Are you sure this is a good platform for performance reports? :)
msg217326 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 23:20
> Are you sure this is a good platform for performance reports? :)

Don't hesitate to rerun my benchmark on more different platforms?
msg217330 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-27 23:34
> Don't hesitate to rerun my benchmark on more different platforms?

Oops, I wanted to write ";-)" not "?".
msg217331 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-04-27 23:35
> Ok, now the real use case where it becomes faster: I implemented the
> same optimization for bytearray.

The real use case I envision is with huge powers of two. If I write:

  x = 2 ** 1000000

then all of x's bytes except the highest one will be zeros. If we map those to /dev/zero, it will be a massive saving for programs using huge powers of two.
msg217333 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 00:09
> The real use case I envision is with huge powers of two.

I'm not sure that it's a common use case, but it can be nice to optimize this case if it doesn't make longobject.c more complex. It looks like calloc() becomes interesting for objects larger than 1 MB.
msg217346 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 07:51
It looks like Windows also supports lazy commitment of zero-initialized memory pages.

According to my microbenchmark on Linux and Windows, only bytes(n) and bytearray(n) are really faster with use_calloc.patch. Most other changes in use_calloc.patch may be useless, since all bytes are initialized to zero but are immediately overwritten with new data afterwards.

Results of bench_alloc2.py on Windows 7: original vs calloc-4.patch+use_calloc.patch:

Common platform:
Timer: time.perf_counter
Python unicode implementation: PEP 393
Bits: int=32, long=32, long long=64, size_t=32, void*=32
Platform: Windows-7-6.1.7601-SP1
CFLAGS: None
Timer info: namespace(adjustable=False, implementation='QueryPerformanceCounter()', monotonic=True, resolution=1e-08)

Platform of campaign orig:
SCM: hg revision=4b97092aa4bd branch=default date="2014-04-27 18:02 +0100"
Date: 2014-04-28 09:35:30
Python version: 3.5.0a0 (default, Apr 28 2014, 09:33:30) [MSC v.1600 32 bit (Intel)]
Timer precision: 4.47 us

Platform of campaign calloc:
SCM: hg revision=4f0aaa8804c6 tag=tip branch=default date="2014-04-28 09:27 +0200"
Date: 2014-04-28 09:37:37
Python version: 3.5.0a0 (default:4f0aaa8804c6, Apr 28 2014, 09:37:03) [MSC v.1600 32 bit (Intel)]
Timer precision: 4.44 us

-----------------------+-------------+----------------
Tests                  |        orig |          calloc
-----------------------+-------------+----------------
object()               |  121 ns (*) |   109 ns (-10%)
b'A' * 10              |   77 ns (*) |           79 ns
b'A' * 10**3           |  159 ns (*) |    168 ns (+5%)
b'A' * 10**6           |  428 us (*) |          415 us
'A' * 10               |   87 ns (*) |           89 ns
'A' * 10**3            |  175 ns (*) |          177 ns
'A' * 10**6            |  429 us (*) |    454 us (+6%)
'A' * 10**8            | 48.4 ms (*) |           49 ms
(None,) * 10**0        |   49 ns (*) |           51 ns
(None,) * 10**1        |  115 ns (*) |    99 ns (-14%)
(None,) * 10**2        |  433 ns (*) |          422 ns
(None,) * 10**3        | 3.58 us (*) |         3.57 us
(None,) * 10**4        | 34.9 us (*) |         34.9 us
(None,) * 10**5        |  347 us (*) |          351 us
(None,) * 10**6        | 5.14 ms (*) |   4.85 ms (-6%)
(None,) * 10**7        | 53.2 ms (*) |   50.2 ms (-6%)
(None,) * 10**8        |  563 ms (*) |    515 ms (-9%)
([None] * 10)[1:-1]    |  217 ns (*) |          217 ns
([None] * 10**3)[1:-1] | 3.89 us (*) |         3.92 us
([None] * 10**6)[1:-1] | 5.13 ms (*) |         5.17 ms
([None] * 10**8)[1:-1] |  634 ms (*) |   533 ms (-16%)
bytes(10)              |  193 ns (*) |    206 ns (+7%)
bytes(10**3)           |  266 ns (*) |   296 ns (+12%)
bytes(10**6)           |  414 us (*) |  3.89 us (-99%)
bytes(10**8)           | 44.2 ms (*) | 4.56 us (-100%)
bytearray(10)          |  229 ns (*) |    243 ns (+6%)
bytearray(10**3)       |  301 ns (*) |   330 ns (+10%)
bytearray(10**6)       |  421 us (*) |  3.89 us (-99%)
bytearray(10**8)       | 44.4 ms (*) | 4.56 us (-100%)
-----------------------+-------------+----------------
Total                  | 1.4 sec (*) | 1.16 sec (-17%)
-----------------------+-------------+----------------
msg217348 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 08:31
Changes to the pickle module don't look like an interesting optimization; they even look slower.

$ python perf.py -b fastpickle,fastunpickle,pickle,pickle_dict,pickle_list,slowpickle,slowunpickle,unpickle ../default/python.orig ../default/python.calloc
...

Report on Linux selma 3.13.9-200.fc20.x86_64 #1 SMP Fri Apr 4 12:13:05 UTC 2014 x86_64 x86_64
Total CPU cores: 4

### fastpickle ###
Min: 0.364510 -> 0.374144: 1.03x slower
Avg: 0.367882 -> 0.377714: 1.03x slower
Significant (t=-11.54)
Stddev: 0.00493 -> 0.00347: 1.4209x smaller

The following not significant results are hidden, use -v to show them:
fastunpickle, pickle_dict, pickle_list.
msg217349 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 09:01
Patch version 5. This patch is ready for a review.

Summary of calloc-5.patch:

- add the following functions:

  * void* PyMem_RawCalloc(size_t nelem, size_t elsize)
  * void* PyMem_Calloc(size_t nelem, size_t elsize)
  * void* PyObject_Calloc(size_t nelem, size_t elsize)
  * PyObject* _PyObject_GC_Calloc(size_t basicsize)

- add "void* calloc(void *ctx, size_t nelem, size_t elsize)" field to the PyMemAllocator structure
- optimize bytes(n) and bytearray(n) to allocate objects using calloc() instead of malloc()
- update tracemalloc to trace also calloc()
- document new functions and add unit tests for the calloc "hook" (in _testcapi)


Changes between versions 4 and 5:

- revert all changes of use_calloc.patch except bytes(n) and bytearray(n): they were useless according to benchmarks
- _PyObject_GC_Calloc() now takes a single parameter
- add versionadded and versionchanged fields in the documentation


According to benchmarks, calloc() is only useful for large allocations (1 MB?) where only part of the memory block is modified (with non-zero bytes) just after the allocation. Untouched memory pages don't consume physical memory and don't count towards RSS, but it is still possible to read their content (null bytes). Using calloc() instead of malloc()+memset(0) doesn't seem to be faster (it may be a little bit slower) if all bytes are set just after the allocation.

I chose to only use one parameter for _PyObject_GC_Calloc() because this function is used to allocate Python objects. A structure of a Python object must start with PyObject_HEAD or PyObject_VAR_HEAD and so the total size of an object cannot be expressed as NELEM * ELEMSIZE.

I have no use case for _PyObject_GC_Calloc(), but it makes sense to use it to allocate a large Python object tracked by the GC and using a single memory block for the Python header + data.

PyObject_Calloc() simply uses memset(0) for small objects (<= 512 bytes). It delegates the allocation to PyMem_RawCalloc(), and so indirectly to calloc(), for larger objects.
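
As a simplified sketch (not CPython's actual obmalloc code), that dispatch looks like this; the threshold value and the overflow check on nelem * elsize are the essential parts:

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <assert.h>

#define SMALL_REQUEST_THRESHOLD 512  /* pymalloc's small-object limit */

void *object_calloc_sketch(size_t nelem, size_t elsize)
{
    /* Reject nelem * elsize overflow before multiplying. */
    if (elsize != 0 && nelem > SIZE_MAX / elsize)
        return NULL;
    size_t nbytes = nelem * elsize;
    if (nbytes <= SMALL_REQUEST_THRESHOLD) {
        /* Stand-in for the pymalloc pool path: allocate, then zero by hand. */
        void *p = malloc(nbytes ? nbytes : 1);
        if (p != NULL)
            memset(p, 0, nbytes);
        return p;
    }
    /* Large request: calloc() lets the OS hand out lazy zero pages. */
    return calloc(nelem, elsize);
}
```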

Note: use_calloc.patch is no longer needed; I merged the two patches since only bytes(n) and bytearray(n) now use calloc().
msg217351 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 09:15
Demo of calloc-5.patch on Linux. Thanks to calloc(), bytes(50 * 1024 * 1024) doesn't allocate memory for null bytes and so the RSS memory is unchanged (+148 kB, not +50 MB), but tracemalloc says that 50 MB were allocated.

$ ./python -X tracemalloc
Python 3.5.0a0 (default:4b97092aa4bd+, Apr 28 2014, 10:40:53) 
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os, tracemalloc
>>> os.system("grep RSS /proc/%s/status" % os.getpid())
VmRSS:	   10736 kB
0
>>> before = tracemalloc.get_traced_memory()[0]
>>> large = bytes(50 * 1024 * 1024)
>>> import sys
>>> sys.getsizeof(large) / 1024.
51200.0478515625
>>> (tracemalloc.get_traced_memory()[0] - before) / 1024.
51198.1962890625
>>> os.system("grep RSS /proc/%s/status" % os.getpid())
VmRSS:	   10884 kB
0
msg217357 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-28 10:43
With the latest patch the decimal benchmark with a lot of small
allocations is consistently 2% slower. Large factorials (where
the operands are initialized to zero for the number-theoretic
transform) have the same performance with and without the patch.

It would be interesting to see some NumPy benchmarks (Nathaniel?).
msg217375 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 14:09
> With the latest patch the decimal benchmark with a lot of small
> allocations is consistently 2% slower.

Does your benchmark use bytes(int) or bytearray(int)? If not, I guess that your benchmark is not reliable, because only these two functions are changed by calloc-5.patch, unless there is a bug in my patch.
msg217380 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-28 15:52
Hmm, obmalloc.c changed as well, so the gcc optimizer can already take
different paths and produce different results.

Also I did set mpd_callocfunc to PyMem_Calloc(). 2% slowdown is far
from being a tragic result, so I guess we can ignore that.

The bytes() speedup is very nice. Allocations that took one second
are practically instant now.
msg217382 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-28 16:31
> Also I did set mpd_callocfunc to PyMem_Calloc(). 2% slowdown is far
> from being a tragic result, so I guess we can ignore that.

Agreed.

> The bytes() speedup is very nice. Allocations that took one second
> are practically instant now.

Indeed.
Victor, thanks for the great work!
msg217423 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-28 21:16
> Hmm, obmalloc.c changed as well, so already the gcc optimizer can take
> different paths and produce different results.

If decimal depends on allocator performances, you should maybe try to
implement a freelist.

> Also I did set mpd_callocfunc to PyMem_Calloc().

I don't understand. 2% slowdown is when you use calloc? Do you have the
same speed if you don't use calloc? According to my benchmarks, calloc is
slower if some bytes are modified later.

> The bytes() speedup is very nice. Allocations that took one second
> are practically instant now.

Is it really useful? Who needs a bytes(10**8) object?

Faster creation of bytearray(int) may be useful in real applications. I
really like bytearray and memoryview to avoid memory copies.
msg217436 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-28 22:48
The order of the nelem/elsize matters for readability. Otherwise it is
not intuitive what happens after the jump to redirect in _PyObject_Alloc().

Why would you assert that 'nelem' is one?
msg217445 - (view) Author: Nathaniel Smith (njs) * Date: 2014-04-28 23:33
> It would be interesting to see some NumPy benchmarks (Nathaniel?).

What is it you want to see? NumPy already uses calloc; we benchmarked it when we added it and it made a huge difference to various realistic workloads :-). What NumPy gets out of this isn't calloc, it's access to tracemalloc.
msg217549 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-29 20:59
Patch version 6:

- I renamed the "int zero" parameter to "int use_calloc" and moved the new parameter to the first position to avoid confusion with nelem. For example, _PyObject_Alloc(ctx, 1, nbytes, 0) becomes _PyObject_Alloc(0, ctx, 1, nbytes). It is also more logical to put it in the first position. In bytesobject.c, I left the parameter at the end since its meaning there is different IMO (fill the bytes with zero or not).

- I removed my hack (premature optimization) "assert(nelem == 1); ... malloc(elsize);" and replaced it with a less surprising "... malloc(nelem * elsize);"

Stefan & Charles-François: I hope that the patch looks better to you.
msg217553 - (view) Author: Charles-François Natali (neologix) * (Python committer) Date: 2014-04-29 21:14
LGTM!
msg217594 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-30 10:02
@Stefan: Can you please review calloc-6.patch? Charles-François wrote that the patch looks good, but for such critical operation (memory allocation), I would prefer a second review ;)
msg217617 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-30 12:58
Victor, sure, maybe not right away.  If you prefer to commit very soon,
I promise to do a post commit review.
msg217619 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-04-30 13:23
>  If you prefer to commit very soon,
> I promise to do a post commit review.

There is no need to hurry.
msg217785 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-02 20:31
New changeset 5b0fda8f5718 by Victor Stinner in branch 'default':
Issue #21233: Add new C functions: PyMem_RawCalloc(), PyMem_Calloc(),
http://hg.python.org/cpython/rev/5b0fda8f5718
msg217786 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-05-02 20:35
> There is no need to hurry.

I changed my mind :-p It should be easier for numpy to test the development version of Python.

Let's wait for buildbots.
msg217794 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-05-02 21:13
Antoine Pitrou wrote:
>  The real use case I envision is with huge powers of two. If I write:
> x = 2 ** 1000000

I created the issue #21419 for this idea.
msg217797 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-02 21:26
New changeset 62438d1b11c7 by Victor Stinner in branch 'default':
Issue #21233: Oops, Fix _PyObject_Alloc(): initialize nbytes before going to
http://hg.python.org/cpython/rev/62438d1b11c7
msg217826 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-03 19:29
I did a post-commit review.  A couple of things:


1) I think Victor and I have a different view of the calloc() parameters.

      calloc(size_t nmemb, size_t size)

   If a memory region of nbytes bytes is allocated, IMO 'nbytes' should be in the
   place of 'nmemb' and '1' should be in the place of 'size'. That is,
   "allocate nbytes elements of size 1":

      calloc(nbytes, 1)


   In the commit the parameters are reversed in many places, which confuses
   me quite a bit, since it means "allocate one element of size nbytes".

      calloc(1, nbytes)


2) I'm not happy with the refactoring in bytearray_init(). I think it would
   be safer to make focused minimal changes in PyByteArray_Resize() instead.
   In fact, there is a behavior change which isn't correct:

    Before:
    =======
        >>> x = bytearray(0)
        >>> m = memoryview(x)
        >>> x.__init__(10)
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        BufferError: Existing exports of data: object cannot be re-sized

     Now:
     ====
        >>> x = bytearray(0)
        >>> m = memoryview(x)
        >>> x.__init__(10)
        >>> x[0]
        0
        >>> m[0]
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        IndexError: index out of bounds

3) Somewhat similarly, I wonder if it was necessary to refactor
   PyBytes_FromStringAndSize(). I find the new version more difficult
   to understand.


4) _PyObject_Alloc(): assert(nelem <= PY_SSIZE_T_MAX / elsize) can be called
   with elsize = 0.
msg217829 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-03 21:39
I forgot one thing:

5) If WITH_VALGRIND is defined, nbytes is uninitialized in _PyObject_Alloc().
msg217831 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-03 21:57
Another thing:

6) We need some kind of prominent documentation that existing
   programs need to be changed:

Python 3.5.0a0 (default:62438d1b11c7+, May  3 2014, 23:35:03) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import failmalloc
>>> failmalloc.enable()
>>> bytes(1)
Segmentation fault (core dumped)
msg217832 - (view) Author: Nathaniel Smith (njs) * Date: 2014-05-03 22:01
A simple solution would be to change the name of the struct, so that non-updated libraries will get a compile error instead of a runtime crash.
msg217838 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-05-03 22:49
> 6) We need some kind of prominent documentation that existing
>    programs need to be changed:

My final commit includes an addition to the What's New in Python 3.5 doc,
including a notice in the porting section. Is that not enough?

Even if the API is public, the PyMemAllocator thing is low level. It's not
part of the stable ABI. Except failmalloc, I don't know of any user. I don't
expect a lot of complaints, and it's easy to port the code.
msg217839 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-05-03 22:51
> 5) If WITH_VALGRIND is defined, nbytes is uninitialized in
> _PyObject_Alloc().

Did you see my second commit? Isn't it already fixed?
msg217840 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-03 22:59
> > 5) If WITH_VALGRIND is defined, nbytes is uninitialized in
> _PyObject_Alloc().
> 
> Did you see my second commit? Isn't it already fixed?

I don't think so, I have revision 5d076506b3f5 here.
msg217841 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-05-03 23:00
>    "allocate nbytes elements of size 1"

PyObject_Malloc(100) asks to allocate one object of 100 bytes.

For PyMem_Malloc() and PyMem_RawMalloc(), it's more difficult to guess, but
IMO it's sane to bet that a single memory block of 'size' bytes is requested.

I consider that char data[100] is one object of 100 bytes, but you call it
100 objects of 1 byte.

I don't think that using nelem or elsize matters in practice.
msg217844 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-03 23:25
STINNER Victor <report@bugs.python.org> wrote:
> PyObject_Malloc(100) asks to allocate one object of 100 bytes.

Okay, then let's please call it:

_PyObject_Calloc(void *ctx, size_t nobjs, size_t objsize)

_PyObject_Alloc(int use_calloc, void *ctx, size_t nobjs, size_t objsize)
msg217866 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-05-04 11:12
STINNER Victor <report@bugs.python.org> wrote:
> My final commit includes an addition to What's New in Python 3.5 doc,
> including a notice in the porting section. It is not enough?

I'm not sure: The usual case with ABI changes is that extensions may segfault
if they are *not* recompiled [1].  In that case documenting it in What's New is
standard procedure.

Here the extension *is* recompiled and still segfaults.

> Even if the API is public, the PyMemAllocator structure is low level. It's not
> part of the stable ABI. Except for failmalloc, I don't know of any users. I don't
> expect many complaints, and it's easy to port the code.

Perhaps it's worth asking on python-dev. Nathaniel's suggestion isn't bad
either (e.g. name it PyMemAllocatorEx).

[1] I was told on python-dev that many people in fact do not recompile.
msg217972 - (view) Author: Roundup Robot (python-dev) Date: 2014-05-06 09:32
New changeset 358a12f4d4bc by Victor Stinner in branch 'default':
Issue #21233: Fix _PyObject_Alloc() when compiled with WITH_VALGRIND defined
http://hg.python.org/cpython/rev/358a12f4d4bc
msg219627 - (view) Author: Roundup Robot (python-dev) Date: 2014-06-02 19:57
New changeset 6374c2d957a9 by Victor Stinner in branch 'default':
Issue #21233: Rename the C structure "PyMemAllocator" to "PyMemAllocatorEx" to
http://hg.python.org/cpython/rev/6374c2d957a9
msg219628 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-06-02 20:13
> I'm not sure: The usual case with ABI changes is that extensions may segfault if they are *not* recompiled [1].

Ok, I renamed the structure PyMemAllocator to PyMemAllocatorEx, so compilation fails because the PyMemAllocator name is no longer defined. Modules compiled for Python 3.4 will crash on Python 3.5 if they are not recompiled, but I hope that you recompile your modules when you don't use the stable ABI.

Using PyMemAllocator is now more complex because it depends on the Python version. See for example the patch for pyfailmalloc:
https://bitbucket.org/haypo/pyfailmalloc/commits/9db92f423ac5f060d6ff499ee4bb74ebc0cf4761

Using the C preprocessor, it's possible to limit the changes.
msg219630 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-06-02 20:16
"Okay, then let's please call it:
_PyObject_Calloc(void *ctx, size_t nobjs, size_t objsize)
_PyObject_Alloc(int use_calloc, void *ctx, size_t nobjs, size_t objsize)"

"void * PyMem_RawCalloc(size_t nelem, size_t elsize);" prototype comes from the POSIX standard:
http://pubs.opengroup.org/onlinepubs/009695399/functions/calloc.html

I don't want to change the prototype in Python. Extract from the Python documentation:

.. c:function:: void* PyMem_RawCalloc(size_t nelem, size_t elsize)

   Allocates *nelem* elements each whose size in bytes is *elsize* (...)
msg219631 - (view) Author: Roundup Robot (python-dev) Date: 2014-06-02 20:23
New changeset dff6b4b61cac by Victor Stinner in branch 'default':
Issue #21233: Revert bytearray(int) optimization using calloc()
http://hg.python.org/cpython/rev/dff6b4b61cac
msg219634 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-06-02 20:28
"2) I'm not happy with the refactoring in bytearray_init(). (...)

3) Somewhat similarly, I wonder if it was necessary to refactor
   PyBytes_FromStringAndSize(). (...)"

Ok, I reverted the change on bytearray(int) and opened the issue #21644 to discuss these two optimizations.
msg219635 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-06-02 20:29
I reread the issue and I hope that I have now addressed all the points raised. The remaining one, bytearray(int), is now tracked by the new issue #21644.
History
Date User Action Args
2014-06-02 20:29:45hayposetstatus: open -> closed
resolution: fixed
messages: + msg219635
2014-06-02 20:28:08hayposetmessages: + msg219634
2014-06-02 20:23:26python-devsetmessages: + msg219631
2014-06-02 20:16:21hayposetmessages: + msg219630
2014-06-02 20:13:25hayposetmessages: + msg219628
2014-06-02 19:57:40python-devsetmessages: + msg219627
2014-05-06 09:32:48python-devsetmessages: + msg217972
2014-05-04 11:12:04skrahsetmessages: + msg217866
2014-05-03 23:25:44skrahsetmessages: + msg217844
2014-05-03 23:00:51hayposetmessages: + msg217841
2014-05-03 22:59:28skrahsetmessages: + msg217840
2014-05-03 22:51:32hayposetmessages: + msg217839
2014-05-03 22:49:18hayposetmessages: + msg217838
2014-05-03 22:01:03njssetmessages: + msg217832
2014-05-03 21:57:18skrahsetmessages: + msg217831
2014-05-03 21:39:14skrahsetmessages: + msg217829
2014-05-03 19:29:59skrahsetmessages: + msg217826
2014-05-02 21:26:17python-devsetmessages: + msg217797
2014-05-02 21:13:20hayposetmessages: + msg217794
2014-05-02 20:35:08hayposetmessages: + msg217786
2014-05-02 20:31:29python-devsetnosy: + python-dev
messages: + msg217785
2014-04-30 13:23:43hayposetmessages: + msg217619
2014-04-30 12:58:52skrahsetmessages: + msg217617
2014-04-30 10:02:30hayposetmessages: + msg217594
2014-04-29 21:14:30neologixsetmessages: + msg217553
2014-04-29 20:59:54hayposetfiles: + calloc-6.patch

messages: + msg217549
2014-04-28 23:33:05njssetmessages: + msg217445
2014-04-28 22:48:12skrahsetmessages: + msg217436
2014-04-28 21:16:01hayposetmessages: + msg217423
2014-04-28 16:31:59neologixsetmessages: + msg217382
2014-04-28 15:52:17skrahsetmessages: + msg217380
2014-04-28 14:09:39hayposetmessages: + msg217375
2014-04-28 10:43:19skrahsetmessages: + msg217357
2014-04-28 09:15:06hayposetmessages: + msg217351
2014-04-28 09:01:14hayposetfiles: + calloc-5.patch

messages: + msg217349
2014-04-28 08:31:41hayposetmessages: + msg217348
2014-04-28 07:51:02hayposetmessages: + msg217346
2014-04-28 00:09:33hayposetmessages: + msg217333
2014-04-27 23:35:24pitrousetmessages: + msg217331
2014-04-27 23:34:41hayposetmessages: + msg217330
2014-04-27 23:20:04hayposetmessages: + msg217326
2014-04-27 23:16:51pitrousetmessages: + msg217325
2014-04-27 23:15:48hayposetfiles: + bench_alloc2.py

messages: + msg217324
2014-04-27 23:03:40hayposetfiles: + use_calloc.patch
2014-04-27 23:03:31hayposetfiles: + calloc-4.patch

messages: + msg217323
2014-04-27 19:15:06neologixsetmessages: + msg217310
2014-04-27 19:09:37pitrousetmessages: + msg217309
2014-04-27 19:07:37neologixsetmessages: + msg217308
2014-04-27 19:03:45neologixsetmessages: + msg217307
2014-04-27 18:49:35skrahsetmessages: + msg217306
2014-04-27 18:37:28pitrousetmessages: + msg217305
2014-04-27 18:36:51neologixsetmessages: + msg217304
2014-04-27 18:31:50neologixsetfiles: + test.c

messages: + msg217303
2014-04-27 18:27:31njssetmessages: + msg217302
2014-04-27 17:53:22neologixsetmessages: + msg217298
2014-04-27 17:48:44neologixsetmessages: + msg217297
2014-04-27 17:41:49njssetmessages: + msg217295
2014-04-27 17:40:15pitrousetmessages: + msg217294
2014-04-27 17:29:04neologixsetmessages: + msg217291
2014-04-27 16:39:10njssetmessages: + msg217284
2014-04-27 16:38:51hayposetmessages: + msg217283
2014-04-27 16:31:56hayposetmessages: + msg217282
2014-04-27 15:59:55neologixsetmessages: + msg217276
2014-04-27 15:43:01hayposetmessages: + msg217274
2014-04-27 13:19:03neologixsetmessages: + msg217262
2014-04-27 11:26:55skrahsetmessages: + msg217257
2014-04-27 11:12:44hayposetmessages: + msg217256
2014-04-27 11:05:55skrahsetmessages: + msg217255
2014-04-27 11:02:55pitrousetmessages: + msg217254
2014-04-27 10:36:05hayposetfiles: + bench_alloc.py

messages: + msg217253
2014-04-27 10:32:47neologixsetmessages: + msg217252
2014-04-27 10:21:00neologixsetmessages: + msg217251
2014-04-27 09:51:46hayposetmessages: + msg217246
2014-04-27 08:30:36neologixsetmessages: + msg217242
2014-04-27 00:05:50hayposetmessages: + msg217228
2014-04-17 11:35:08jtaylorsetmessages: + msg216686
2014-04-17 10:39:32josh.rsetmessages: + msg216682
2014-04-17 10:35:42josh.rsetmessages: + msg216681
2014-04-17 08:04:36jtaylorsetnosy: + jtaylor
messages: + msg216671
2014-04-17 07:19:26neologixsetmessages: + msg216668
2014-04-16 19:48:02hayposetfiles: + calloc-3.patch

messages: + msg216567
2014-04-16 17:47:13pitrousetmessages: + msg216515
2014-04-16 09:54:37skrahsetmessages: + msg216455
2014-04-16 08:06:13hayposetmessages: + msg216452
2014-04-16 08:04:31hayposetmessages: + msg216451
2014-04-16 07:18:37neologixsetmessages: + msg216444
2014-04-16 05:34:58pitrousetmessages: + msg216433
2014-04-16 04:21:26hayposetfiles: + calloc-2.patch

messages: + msg216431
2014-04-16 02:49:46hayposetmessages: + msg216425
2014-04-16 02:40:56hayposetmessages: + msg216422
2014-04-15 22:20:09josh.rsetmessages: + msg216404
2014-04-15 22:17:19josh.rsetmessages: + msg216403
2014-04-15 22:05:23josh.rsetmessages: + msg216399
2014-04-15 21:39:08pitrousetmessages: + msg216394
2014-04-15 21:30:03josh.rsetnosy: + josh.r
2014-04-15 21:28:08hayposetnosy: + pitrou, neologix
2014-04-15 21:27:57hayposetfiles: + calloc.patch
keywords: + patch
messages: + msg216390
2014-04-15 15:37:10eric.araujosetnosy: + haypo
2014-04-15 09:41:10skrahsetnosy: + skrah
2014-04-15 08:56:01njscreate