classification
Title: API for setting the memory allocator used by Python
Type: enhancement Stage:
Components: Interpreter Core Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Rhamphoryncus, amaury.forgeotdarc, barry, gregory.p.smith, haypo, jcea, jlaurila, jszakmeister, kristjan.jonsson, ncoghlan, neilo, pitrou, pjmcnerney, python-dev, rhettinger, tlesher, trent
Priority: normal Keywords: patch

Created on 2008-07-09 19:48 by jlaurila, last changed 2013-07-07 17:36 by kristjan.jonsson. This issue is now closed.

Files
File name Uploaded Description Edit
pymem.h kristjan.jonsson, 2013-06-03 09:37 locally patched version of pymem.h
ccpmem.h kristjan.jonsson, 2013-06-03 09:40
Capture.JPG kristjan.jonsson, 2013-06-07 09:33 Profiling email
py_setallocators-filename.patch haypo, 2013-06-10 23:05 review
pybench.txt haypo, 2013-06-12 14:48
benchmarks.txt haypo, 2013-06-12 14:48
py_setallocators-9.patch haypo, 2013-07-02 23:01 review
Messages (45)
msg69482 - (view) Author: Jukka Laurila (jlaurila) Date: 2008-07-09 19:48
Currently Python always uses the C library malloc/realloc/free as the
underlying mechanism for requesting memory from the OS, but especially
on memory-limited platforms it is often desirable to be able to override
the allocator and to redirect all Python's allocations to use a special
heap. This will make it possible to free memory back to the operating
system without restarting the process, and to reduce fragmentation by
separating Python's allocations from the rest of the program.

The proposal is to make it possible to set the allocator used by the
Python interpreter by calling the following function before Py_Initialize():

void Py_SetAllocator(void* (*alloc)(size_t), void* (*realloc)(void*,
size_t), void (*free)(void*))

Direct function calls to malloc/realloc/free in obmalloc.c must be
replaced with calls through the function pointers set through this
function. By default these would of course point to the C stdlib
malloc/realloc/free.
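
A minimal sketch of the indirection described above, assuming nothing about the eventual implementation. Every `sketch_*` name and the counting allocator are hypothetical stand-ins, not part of Python; the point is only that call sites go through three function pointers that default to the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names: none of the sketch_* identifiers exist in Python. */
static void *(*sketch_alloc)(size_t) = malloc;
static void *(*sketch_realloc)(void *, size_t) = realloc;
static void (*sketch_free)(void *) = free;

/* Stand-in for the proposed Py_SetAllocator(): install all three at once. */
void sketch_set_allocator(void *(*alloc_fn)(size_t),
                          void *(*realloc_fn)(void *, size_t),
                          void (*free_fn)(void *))
{
    sketch_alloc = alloc_fn;
    sketch_realloc = realloc_fn;
    sketch_free = free_fn;
}

/* What a call site would use instead of calling malloc() and friends. */
void *sketch_core_malloc(size_t n) { return sketch_alloc(n); }
void *sketch_core_realloc(void *p, size_t n) { return sketch_realloc(p, n); }
void sketch_core_free(void *p) { sketch_free(p); }

/* Example replacement: count the calls, then defer to the C library. */
static int counted_calls = 0;
static void *counting_malloc(size_t n) { counted_calls++; return malloc(n); }

int sketch_demo(void)
{
    void *p;
    sketch_set_allocator(counting_malloc, realloc, free);
    p = sketch_core_malloc(16);
    assert(p != NULL);
    sketch_core_free(p);
    return counted_calls;   /* allocations routed through the replacement */
}
```

The cost of this design is one pointer load per allocation call.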
msg69483 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2008-07-09 19:55
Is registering pointers to functions really necessary, or would defining
macros work as well? From a performance perspective I would like to
avoid having a pointer indirection step every time malloc/realloc/free
is called.

I guess my question becomes, Jukka, is this more for alternative
implementations of Python where changes to source are already expected,
or for apps that embed Python where a change of malloc/realloc/free
varies from app to app that dynamically loads Python?
msg69484 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-07-09 20:06
How would this allow you to free all memory?  The interpreter will still
reference it, so you'd have to have called Py_Finalize already, and
promise not to call Py_Initialize afterwards.  This further supposes the
process will live a long time after killing off the interpreter, but in
that case I recommend putting python in a child process instead.
msg69494 - (view) Author: Jukka Laurila (jlaurila) Date: 2008-07-10 08:05
Brett, the ability to define the allocator dynamically at runtime could
be a compile time option, turned on by default only on small memory
platforms. On most platforms you can live with plain old malloc and may
want to avoid the indirection. If no other platform is interested in
this, we can just make it a Symbian-specific extension but I wanted to
see if there's general interest in this.

The application would control the lifecycle of the Python heap, and this
seemed like the most natural way for the application to tell the
interpreter which heap instance to use.

Adam, the cleanup would work by freeing the entire heap used by Python
after calling Py_Finalize. In the old PyS60 code we made Python 2.2.2
clean itself completely by freeing the Python-specific heap and making
sure all pointers to heap-allocated items are properly reinitialized.

Yes, there are various static pointers that are initially set to NULL,
initialized to point at things on the heap and not reset to NULL at
Py_Finalize, and these are currently an obstacle to calling
Py_Initialize again. I'm considering submitting a separate ticket about
that since it seems like the ability to free the heap combined with the
ability to reinitialize the static pointers could together make full
cleanup possible.
msg69497 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2008-07-10 09:59
Given where we are in the release cycle, I've bumped the target releases
to 2.7/3.1. So Symbian are probably going to have to do something
port-specific anyway in order to get 2.6/3.0 up and running.

And in terms of hooking into this kind of thing, some simple macros that
can be overridden in pyport.h (as Brett suggested) may be a better idea
than baking any specific approach into the core interpreter.
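
One way the macro approach could look, as a sketch only. The `SKETCH_*` macro names and `port_malloc()` are hypothetical; a real port would put its overrides in its own configuration header, and the defaults below apply only to macros the port left undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A port would define this before the defaults below (for example in its
   pyport.h); a counting wrapper stands in for a platform-specific heap. */
static size_t port_malloc_calls = 0;
static void *port_malloc(size_t n) { port_malloc_calls++; return malloc(n); }
#define SKETCH_MALLOC(n) port_malloc(n)

/* Defaults, used only when the port did not override a macro. */
#ifndef SKETCH_MALLOC
#define SKETCH_MALLOC(n) malloc(n)
#endif
#ifndef SKETCH_REALLOC
#define SKETCH_REALLOC(p, n) realloc((p), (n))
#endif
#ifndef SKETCH_FREE
#define SKETCH_FREE(p) free(p)
#endif

size_t macro_demo(void)
{
    void *p = SKETCH_MALLOC(32);   /* expands to port_malloc(32) */
    assert(p != NULL);
    p = SKETCH_REALLOC(p, 64);     /* expands to plain realloc() */
    assert(p != NULL);
    SKETCH_FREE(p);                /* expands to plain free() */
    return port_malloc_calls;
}
```

The choice is fixed at compile time, so there is no pointer indirection, but the allocator cannot be changed by an application that loads an already built Python.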
msg69499 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2008-07-10 10:12
I think it is reasonable to get a macro definition change into 2.6.
The OP's request is essential for his application (running Python
on Nokia phones) and it would be a loss to wait two years for this.
Also, his request for a macro will enable another important piece
of functionality -- allowing a build to intercept and instrument all
calls to the memory allocator.

Barry, can you rule on whether to keep this open for consideration in
2.6?  It seems daft to postpone this discussion indefinitely.  If we
can agree to a simple, non-invasive solution while there is still yet 
another beta, then it makes sense to proceed.
msg69511 - (view) Author: Adam Olsen (Rhamphoryncus) Date: 2008-07-10 16:57
Basically you just want to kick the malloc implementation into doing
some housekeeping, freeing its caches?  I'm kinda surprised you don't
add the hook directly to your libc's malloc.

IMO, there's no use-case for this until Py_Finalize can completely tear
down the interpreter, which requires a lot of special work (killing(!)
daemon threads, unloading C modules, etc), and nobody intends to do that
at this point.

The practical alternative, as I said, is to run python in a subprocess.
 Let the OS clean up after us.
msg78995 - (view) Author: Neil Richardson (neilo) Date: 2009-01-03 19:55
I'll be in agreement here. I integrated Python into a game engine not 
too long ago, and had to do a fair chunk of work to isolate Python
into its own heap - given that fragmentation on low memory systems can
be a bit of a killer. Would also make future upgrades a heck of a lot 
easier too, as there'd be no need to do a search for all references and 
carefully replace them all.
msg79309 - (view) Author: Jukka Laurila (jlaurila) Date: 2009-01-07 09:18
Brett is right. Macroing the memory allocator is a better choice than
forcing indirection on all platforms. We did this on Python for S60,
using the macros PyCore_{MALLOC,REALLOC,FREE}_FUNC for the interpreter's
allocations, and then redirected those to a mechanism that allows setting
the allocator at runtime.

Sorry we don't have a clean patch at present for this change only, but
in case anyone's interested the full source is at
https://garage.maemo.org/frs/?group_id=854
msg91957 - (view) Author: PJ McNerney (pjmcnerney) Date: 2009-08-25 19:33
Has the ability to set the memory allocator been added to Python 2.7/3.1?

Thanks,
PJ
msg142981 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2011-08-25 17:16
All this needs is a patch.
Note that there are some places where we call malloc()/free() without going through our abstraction API. This is not in allocation-heavy paths, though.
msg183587 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-03-06 12:32
I attached a patch that I wrote for Wyplay: py_setallocators.patch. The patch adds two functions:

PyAPI_FUNC(int) Py_GetAllocators(
    char api,
    void* (**malloc_p) (size_t),
    void* (**realloc_p) (void*, size_t),
    void (**free_p) (void*)
    );

PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t),
    void* (*realloc) (void*, size_t),
    void (*free) (void*)
    );

Where api is one of these values:

 - PY_ALLOC_SYSTEM_API: the system API (malloc, realloc, free)
 - PY_ALLOC_MEM_API: the PyMem_Malloc() API
 - PY_ALLOC_OBJECT_API: the PyObject_Malloc() API

These functions are used by the pytracemalloc project to hook PyMem_Malloc() and PyObject_Malloc() API. pytracemalloc traces all Python memory allocations to compute statistics per Python file.
https://pypi.python.org/pypi/pytracemalloc

Wyplay is also using Py_SetAllocators() internally to completely replace the system allocators *before* Python is started. We have another private patch on Python that adds a function which sets its own memory allocators; it is called before Python starts thanks to an "__attribute__((constructor))" attribute.

--

If you use Py_SetAllocators() to completely replace a memory allocator (any memory allocation API), you have to do it before the first Python memory allocation (before Py_Main()) *or* your memory allocator must be able to recognize that a pointer was not allocated by it and pass the operation (realloc or free) to the previous memory allocator.

For example, PyObject_Free() is able to recognize that a pointer is part of its memory pool, and otherwise falls back to the system allocator (extract from the original code):

    if (Py_ADDRESS_IN_RANGE(p, pool)) {
        ...
        return;
    }
    free(p);
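
The same idea for a replacement allocator installed late, as a hedged sketch: a fixed pool with an address range check, falling back to free() for pointers it did not hand out. The pool, its size and all function names are hypothetical, and the bump allocation is deliberately simplistic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_SIZE 4096

/* The union members only force a suitable alignment for the byte array. */
static union {
    unsigned char bytes[POOL_SIZE];
    long double align_ld;
    long long align_ll;
    void *align_p;
} pool;
static size_t pool_used = 0;
static int fallback_frees = 0;

/* Does p point into the pool owned by the replacement allocator? */
static int in_pool(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    uintptr_t lo = (uintptr_t)pool.bytes;
    return a >= lo && a < lo + POOL_SIZE;
}

/* Bump allocation in 16-byte steps; returns NULL when the pool is full. */
void *pool_malloc(size_t n)
{
    size_t rounded;
    void *p;
    if (n == 0)
        n = 1;
    if (n > POOL_SIZE)
        return NULL;
    rounded = (n + 15u) & ~(size_t)15u;
    if (rounded > POOL_SIZE - pool_used)
        return NULL;
    p = pool.bytes + pool_used;
    pool_used += rounded;
    return p;
}

void pool_free(void *p)
{
    if (p == NULL)
        return;
    if (in_pool(p))
        return;         /* ours: a bump allocator does not reclaim blocks */
    fallback_frees++;   /* not ours: pass it to the previous allocator */
    free(p);
}

int pool_demo(void)
{
    void *early = malloc(8);       /* allocated before the replacement */
    void *late = pool_malloc(8);   /* allocated by the replacement */
    assert(early != NULL && late != NULL);
    assert(in_pool(late) && !in_pool(early));
    pool_free(late);
    pool_free(early);
    return fallback_frees;
}
```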

--

If you use Py_SetAllocators() to hook memory allocators (do something before and/or after calling the previous function, *without* touching the pointer or the size), you can do it anytime.

--

I didn't run a benchmark yet to measure the overhead of the patch on Python performance.

The new functions are neither documented nor tested yet. If we want to test them, we can write a simple hook that traces calls to the memory allocators and then calls the underlying memory allocator.
msg183590 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-03-06 12:38
To be exhaustive, another patch should be developed to replace all calls to malloc/realloc/free with PyMem_Malloc/PyMem_Realloc/PyMem_Free. PyObject_Malloc() is still using mmap() or malloc() internally for example.

Other examples of functions calling malloc/realloc/free directly: _PySequence_BytesToCharpArray(), block_new() (of pyarena.c), find_key() (of thread.c), PyInterpreterState_New(), win32_wchdir(), posix_getcwd(), Py_Main(), etc.
msg183591 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-03-06 13:41
Some customizable memory allocators I know have an extra parameter "void *opaque" that is passed to all functions:

- in zlib: zalloc and zfree: http://www.zlib.net/manual.html#Usage
- same thing for bz2.
- lzma's ISzAlloc: http://www.asawicki.info/news_1368_lzma_sdk_-_how_to_use.html
- Oracle's OCI: http://docs.oracle.com/cd/B10501_01/appdev.920/a96584/oci15re4.htm

OTOH, expat, libxml, libmpdec don't have this extra parameter.
msg183947 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-03-11 10:19
At CCP we have something similar.  We are embedding python in the UnrealEngine on the PS3 and need to get everything through their allocators.  We added an api similar to the OP's, but more flexible:

/* Support for custom allocators */
typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);
typedef void *(*PyCCP_Realloc_t)(void *ptr, size_t size, void *arg, const char *file, int line, const char *msg);
typedef void (*PyCCP_Free_t)(void *ptr, void *arg, const char *file, int line, const char *msg);
typedef size_t (*PyCCP_Msize_t)(void *ptr, void *arg);
typedef struct PyCCP_CustomAllocator_t
{
    PyCCP_Malloc_t  pMalloc;
    PyCCP_Realloc_t pRealloc;
    PyCCP_Free_t    pFree;
    PyCCP_Msize_t   pMsize;    /* can be NULL, or return -1 if no size info is avail. */
    void            *arg;      /* opaque argument for the functions */
} PyCCP_CustomAllocator_t;

/* To set an allocator!  use 0 for the regular allocator, 1 for the block allocator.
 * pass a null pointer to reset to internal default
 */
PyAPI_FUNC(void) PyCCP_SetAllocator(int which, const PyCCP_CustomAllocator_t *);

For a module to install itself as a "hook" at runtime, this approach can be extended by querying the current allocator, so that such a hook can then delegate its calls to the previous allocator.

The "block" allocator here is intended as the underlying allocator to be used by obmalloc.c.  Depending on the platform, this can then allocate aligned virtual memory directly, which is more efficient than layering that on top of a malloc-like allocator.

There are areas in CPython that use malloc() directly.  Those are actually not needed in all cases, but to cope with them we change them all to new RAW api calls (using preprocessor macros).
Essentially, malloc() maps to PyCCP_RawMalloc() or PyMem_MALLOC_INNER() (both local additions) based on whether the particular site using malloc() requires a truly GIL-free malloc or not.

For this reason, the custom allocators mentioned cannot be assumed to be called with the GIL held.  However, it is easily possible to extend the system above so that there is a GIL and non-GIL version for the 'regular' allocator.

I'll put details of the stuff we have done for EVE Online / Dust 514 on my blog.  It is this, but much much more too.

Hopefully we can arrive at a way to abstract memory allocation away from Python in a flexible and extendible manner :)
msg183950 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-03-11 11:15
Note that I'm definitely open to including extra settings to set up custom allocators as part of Py_CoreConfig in PEP 432 (http://www.python.org/dev/peps/pep-0432/#pre-initialization-phase).

I don't really want to continue the tradition of additional PySet_* APIs with weird conditions on when they have to be called, though (trying to prevent more of that kind of organic growth in complexity is why I wrote PEP 432 in the first place)
msg183951 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-03-11 11:30
Absolutely.  Although there is a very useful scenario where this could be considered a run-time setting:

  # turboprofiler.py
  # Load up the memory hooker which will supply us with all the info
  import _turboprofiler
  _turboprofiler.hookup()

Perhaps people interested in memory optimizations and profiling could hook up at pycon?  It is the most common regular query I get from people in my organization:  How can I find out how python is using/leaking/wasting memory?
msg190429 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-05-31 23:40
> typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);

I don't understand the purpose of the filename and line number. Python does not have such information. Is it just to have the API expected by Unreal engine?

What is the message? How is it filled?

--

I'm proposing a simpler prototype:

void* (*malloc) (size_t);

Simply because Python neither has nor uses more information than that. I'm not against adding an arbitrary void* argument; it should not hurt, and may be required by some other applications or libraries.

@kristjan.jonsson: Can you adapt your tool to fit the following API?

PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t size, void *data),
    void* (*realloc) (void* ptr, size_t size, void *data),
    void (*free) (void* ptr, void *data)
    );

--

My pytracemalloc project hooks the allocation functions and then uses Python's C functions to get the current filename and line number. No need to modify the C code to pass __FILE__ and __LINE__.

It can produce a summary like this:

2013-02-28 23:40:18: Top 5 allocations per file
#1: .../Lib/test/regrtest.py: 3998 KB
#2: .../Lib/unittest/case.py: 2343 KB
#3: .../ctypes/test/__init__.py: 513 KB
#4: .../Lib/encodings/__init__.py: 525 KB
#5: .../Lib/compiler/transformer.py: 438 KB
other: 32119 KB
Total allocated size: 39939 KB

You can also configure it to display the line number.

https://pypi.python.org/pypi/pytracemalloc
msg190528 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-03 09:37
Hi.
The file and line arguments are for expanding from macros such as PyMem_MALLOC.  I had them added because they provide the features of a comprehensive debugging API.

Of course, I'm not showing you the entire set of modifications that we have made to the memory allocation scheme.  They include more extensive versions of the memory allocation tools, in order to more easily monitor memory allocations from within C.

For your information, I'm uploading pymem.h from our 2.7 patch.  The extent of our modifications can be gleaned from there.

Basically, we have layered the macros into outer and inner versions, in order to better support internal diagnostics.

I'm happy with the api you provide, with a small addition:
PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t size, void *data),
    void* (*realloc) (void* ptr, size_t size, void *data),
    void (*free) (void* ptr, void *data),
    void *data
    );

The 'data' pointer is pointless unless you can provide it as part of the  api.  This sort of extra indirection is necessary for C callbacks to provide instance specific context to statically compiled and linked callback functions.
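
A small sketch of why the registered 'data' pointer matters, under the assumption that each allocator slot stores its callbacks together with its own pointer. The `heap_stats` and `slot` types are hypothetical; the same statically compiled callbacks serve two independent instances.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance context for the callbacks below. */
typedef struct {
    size_t calls;
    size_t bytes_requested;
} heap_stats;

/* One statically compiled callback; the instance arrives through 'data'. */
static void *stats_malloc(size_t size, void *data)
{
    heap_stats *st = (heap_stats *)data;
    st->calls++;
    st->bytes_requested += size;
    return malloc(size);
}

static void stats_free(void *ptr, void *data)
{
    (void)data;   /* unused here, but part of the callback signature */
    free(ptr);
}

/* Hypothetical allocator slot: the callbacks plus the registered pointer. */
typedef struct {
    void *(*malloc_fn)(size_t, void *);
    void (*free_fn)(void *, void *);
    void *data;
} slot;

size_t data_demo(void)
{
    heap_stats mem_stats = {0, 0};
    heap_stats obj_stats = {0, 0};
    slot mem, obj;
    void *a, *b, *c;

    mem.malloc_fn = stats_malloc; mem.free_fn = stats_free; mem.data = &mem_stats;
    obj.malloc_fn = stats_malloc; obj.free_fn = stats_free; obj.data = &obj_stats;

    a = mem.malloc_fn(10, mem.data);
    b = obj.malloc_fn(100, obj.data);
    c = obj.malloc_fn(1000, obj.data);
    assert(a != NULL && b != NULL && c != NULL);

    mem.free_fn(a, mem.data);
    obj.free_fn(b, obj.data);
    obj.free_fn(c, obj.data);

    assert(mem_stats.calls == 1 && obj_stats.calls == 2);
    return obj_stats.bytes_requested;
}
```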
msg190529 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-03 09:40
Also attached is our ccpmem.h, the interface to ccpmem.cpp, our internal flexible memory allocator framework.
Again, just FYI.  There are no trade secrets here, so please ask me for more details, if interested.  One particular trick we have been using, which might be of interest, is to be able to tag each allocation with a "context" id.  This is then set according to a global sys.memcontext variable, which the program will modify according to what it is doing.  This can then be used to track memory usage by different parts of the program.
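
A rough sketch of how such context tagging could work. This is not the ccpmem implementation: the header layout, `MAX_CONTEXTS` and the function names are all assumptions, and the global `current_context` stands in for the sys.memcontext variable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_CONTEXTS 8   /* hypothetical limit for this sketch */

/* Header stored in front of every block; the extra union members keep the
   user pointer that follows it suitably aligned. */
typedef union {
    struct { size_t size; int context; } info;
    long double align_ld;
    long long align_ll;
    void *align_p;
} tag_header;

static int current_context = 0;            /* stand-in for sys.memcontext */
static size_t live_bytes[MAX_CONTEXTS];    /* bytes in use per context */

void set_context(int id)
{
    if (id >= 0 && id < MAX_CONTEXTS)
        current_context = id;
}

void *tagged_malloc(size_t size)
{
    tag_header *h;
    if (size > (size_t)-1 - sizeof(tag_header))
        return NULL;
    h = (tag_header *)malloc(sizeof(tag_header) + size);
    if (h == NULL)
        return NULL;
    h->info.size = size;
    h->info.context = current_context;   /* tag with the active context */
    live_bytes[current_context] += size;
    return h + 1;
}

void tagged_free(void *ptr)
{
    tag_header *h;
    if (ptr == NULL)
        return;
    h = (tag_header *)ptr - 1;
    /* Credit the context that allocated the block, not the active one. */
    live_bytes[h->info.context] -= h->info.size;
    free(h);
}

size_t context_demo(void)
{
    void *ui, *net;
    size_t still_live;

    set_context(1);
    ui = tagged_malloc(100);
    set_context(2);
    net = tagged_malloc(40);
    assert(ui != NULL && net != NULL);

    tagged_free(ui);               /* freed while context 2 is active */
    assert(live_bytes[1] == 0);
    still_live = live_bytes[2];
    tagged_free(net);
    return still_live;
}
```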
msg190534 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-03 10:25
"""
I'm happy with the api you provide, with a small addition:
PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t size, void *data),
    void* (*realloc) (void* ptr, size_t size, void *data),
    void (*free) (void* ptr, void *data),
    void *data
    );
"""

Oops, I forgot "void *data". Yeah, each group of allocator functions (malloc, free and realloc) will get its own "data" pointer.
msg190539 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-03 12:03
New patch (version 2), more complete:

 * add "void *data" argument to all allocator functions
 * add "block" API used for pymalloc allocator to allocate arenas. Use mmap or malloc, but may use VirtualAlloc in the near future (see #13483). Callback prototypes:

   - void* block_malloc (size_t, void*);
   - void block_free (void*, size_t, void*);

 * remove PY_ALLOC_SYSTEM_API


Main API:
---
#define PY_ALLOC_MEM_API 'm'      /* PyMem_Malloc() API */
#define PY_ALLOC_OBJECT_API 'o'   /* PyObject_Malloc() API */

PyAPI_FUNC(int) Py_GetAllocators(
    char api,
    void* (**malloc_p) (size_t size, void *user_data),
    void* (**realloc_p) (void *ptr, size_t size, void *user_data),
    void (**free_p) (void *ptr, void *user_data),
    void **user_data_p
    );

PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t size, void *user_data),
    void* (*realloc) (void *ptr, size_t size, void *user_data),
    void (*free) (void *ptr, void *user_data),
    void *user_data
    );

PyAPI_FUNC(void) Py_GetBlockAllocators(
    void* (**malloc_p) (size_t size, void *user_data),
    void (**free_p) (void *ptr, size_t size, void *user_data),
    void **user_data_p
    );

PyAPI_FUNC(int) Py_SetBlockAllocators(
    void* (*malloc) (size_t size, void *user_data),
    void (*free) (void *ptr, size_t size, void *user_data),
    void *user_data
    );
---
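
A sketch of a pair of callbacks matching the block allocator prototypes above. Because free receives the block size, an implementation does not need a per-block header (an munmap-based one needs the length anyway). The `arena_stats` type and the 256 KiB block size are assumptions made for this example, and malloc() stands in for mmap or VirtualAlloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_BYTES (256u * 1024u)   /* assumed block size for the demo */

/* Hypothetical bookkeeping reached through user_data. */
typedef struct {
    size_t live_arenas;
    size_t live_bytes;
} arena_stats;

static void *arena_malloc(size_t size, void *user_data)
{
    arena_stats *st = (arena_stats *)user_data;
    void *p = malloc(size);          /* stand-in for mmap/VirtualAlloc */
    if (p != NULL) {
        st->live_arenas++;
        st->live_bytes += size;
    }
    return p;
}

/* The caller passes the size back, so no header is needed to find it. */
static void arena_free(void *ptr, size_t size, void *user_data)
{
    arena_stats *st = (arena_stats *)user_data;
    if (ptr == NULL)
        return;
    st->live_arenas--;
    st->live_bytes -= size;
    free(ptr);
}

size_t arena_demo(void)
{
    arena_stats st = {0, 0};
    size_t remaining;
    void *a = arena_malloc(ARENA_BYTES, &st);
    void *b = arena_malloc(ARENA_BYTES, &st);
    assert(a != NULL && b != NULL);

    arena_free(a, ARENA_BYTES, &st);
    assert(st.live_arenas == 1);
    remaining = st.live_bytes;       /* one block still outstanding */
    arena_free(b, ARENA_BYTES, &st);
    return remaining;
}
```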


I see the following use cases using allocators:

* Don't use malloc nor mmap but your own allocator: replace PyMem and PyObject allocators
* Track memory leaks (my pytracemalloc project, or Antoine's simple _Py_AllocatedBlocks counter): hook PyMem and PyObject allocators
* Fill newly allocated memory with a pattern and check for buffer underflow and overflow: hook PyMem and PyObject allocators

"Hook" means adding extra code before and/or after calling the original function.
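
A hedged sketch of the third use case as a hook: fill new memory with a pattern, surround it with guard bytes, and check the guards on free before delegating to the previous allocator. The layout, the byte values and all names are assumptions for illustration, not the layout used by Python's debug allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD 8            /* guard bytes on each side of the user block */
#define HEADER 16          /* stored size plus leading guard; assumes
                              sizeof(size_t) <= 8 */
#define FILL_NEW 0xCB      /* pattern for fresh memory (arbitrary choice) */
#define FILL_GUARD 0xFB    /* pattern for guard bytes (arbitrary choice) */

/* The previous allocator, which the hook delegates to. */
typedef struct {
    void *(*malloc_fn)(size_t, void *);
    void (*free_fn)(void *, void *);
    void *data;
} prev_alloc;

static void *sys_malloc(size_t n, void *data) { (void)data; return malloc(n); }
static void sys_free(void *p, void *data) { (void)data; free(p); }

static int corrupted_blocks = 0;

/* Layout: [size | guard][user bytes][guard] */
static void *guard_malloc(size_t n, void *data)
{
    prev_alloc *prev = (prev_alloc *)data;
    unsigned char *base;
    if (n > (size_t)-1 - HEADER - GUARD)
        return NULL;
    base = (unsigned char *)prev->malloc_fn(HEADER + n + GUARD, prev->data);
    if (base == NULL)
        return NULL;
    memcpy(base, &n, sizeof n);
    memset(base + HEADER - GUARD, FILL_GUARD, GUARD);
    memset(base + HEADER, FILL_NEW, n);
    memset(base + HEADER + n, FILL_GUARD, GUARD);
    return base + HEADER;
}

static int guard_ok(const unsigned char *g)
{
    size_t i;
    for (i = 0; i < GUARD; i++)
        if (g[i] != FILL_GUARD)
            return 0;
    return 1;
}

static void guard_free(void *p, void *data)
{
    prev_alloc *prev = (prev_alloc *)data;
    unsigned char *user = (unsigned char *)p;
    unsigned char *base;
    size_t n;
    if (p == NULL)
        return;
    base = user - HEADER;
    memcpy(&n, base, sizeof n);
    if (!guard_ok(user - GUARD) || !guard_ok(user + n))
        corrupted_blocks++;          /* underflow or overflow detected */
    prev->free_fn(base, prev->data);
}

int guard_demo(void)
{
    prev_alloc prev;
    unsigned char *good, *bad;
    prev.malloc_fn = sys_malloc;
    prev.free_fn = sys_free;
    prev.data = NULL;

    good = (unsigned char *)guard_malloc(4, &prev);
    bad = (unsigned char *)guard_malloc(4, &prev);
    assert(good != NULL && bad != NULL);
    assert(good[0] == FILL_NEW);     /* fresh memory carries the pattern */

    bad[4] = 0;   /* one byte past the end: lands in the trailing guard */
    guard_free(good, &prev);
    guard_free(bad, &prev);
    return corrupted_blocks;
}
```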

The final API should allow hooking the APIs multiple times as well as replacing allocators. So it should be possible to track memory leaks, detect buffer overflows and use your own allocators all at once. It is not yet possible with patch 2, because _PyMem_DebugMalloc() calls malloc() directly.

_PyMem_DebugMalloc is no longer used by PyObject_Malloc. This code should be rewritten to use the hook approach instead of replacing memory allocators.


Example tracing PyMem calls using the hook approach:
-----------------------------------
#include <stdio.h>

typedef struct {
    void* (*malloc) (size_t, void*);
    void* (*realloc) (void*, size_t, void*);
    void (*free) (void*, void*);
    void *data;
} allocators_t;

/* previous allocators, saved so that the hooks can delegate to them */
allocators_t pymem, pyobject;

void* trace_malloc (size_t size, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("malloc(%zu)\n", size);
    return alloc->malloc(size, alloc->data);
}

void* trace_realloc (void* ptr, size_t size, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("realloc(%p, %zu)\n", ptr, size);
    return alloc->realloc(ptr, size, alloc->data);
}

void trace_free (void* ptr, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("free(%p)\n", ptr);
    alloc->free(ptr, alloc->data);
}

void hook_pymem(void)
{
   /* save the current allocators, then install the tracing hooks */
   Py_GetAllocators(PY_ALLOC_MEM_API, &pymem.malloc, &pymem.realloc, &pymem.free, &pymem.data);
   Py_SetAllocators(PY_ALLOC_MEM_API, trace_malloc, trace_realloc, trace_free, &pymem);

   Py_GetAllocators(PY_ALLOC_OBJECT_API, &pyobject.malloc, &pyobject.realloc, &pyobject.free, &pyobject.data);
   Py_SetAllocators(PY_ALLOC_OBJECT_API, trace_malloc, trace_realloc, trace_free, &pyobject);
}
-----------------------------------

I didn't try the example :-p It is just to give you an idea of the API and how to use it.
msg190741 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-07 09:33
I'd like to argue for providing a "file" and "line number" to the allocation api.  I know that currently this is not provided e.g. by the PyMem_Malloc() functions, but I think it would be wise to provide a "debug" version of these functions that pass in the call sites.  An allocator api that then also allows for these values to be provided to the malloc/realloc/free routines is then future-proof in that respect.

Case in point:  We have a memory profiler running which uses an allocator hook system similar to what Victor is proposing.  But in addition, it provides a "file" and "line" argument to every function.

Now, the profiler is currently not using this code.  Here is how the "malloc" function looks:

static void *
PyMalloc(size_t size, void *arg, const char *file, int line, const char *msg)
{
    void *r = DustMalloc(size);
    if (r) {
        tmAllocEx(g_telemetryContext, file, line, r, size, "Python alloc: %s", msg);
        ReportAllocInfo(AllocEvent, 0, r, size);
    }
    return r;
}

tmAllocEx calls the Telemetry memory profiler and passes in the allocation site (http://www.radgametools.com/telemetry.htm; see also my blog post about using it: http://cosmicpercolator.com/2012/05/25/optimizing-python-condition-variables-with-telemetry/).

But our profiler, called with ReportAllocInfo, isn't using this.  It relies solely on extracting the python callstack.

Today, I got this email (see attached file Capture.JPG)

Basically, the profiler sees a lot of allocated memory with no python call stack.  Now it would be useful if we had the C call site information, to know where it came from.

So:  My suggestion is that the allocator api
1) be a struct, which allows for a cleaner api function
2) include the C filename and line number.

Even though the current python memory API (e.g. PyMem_Malloc(), PyObject_Malloc()) does not currently support it, this would allow us to internally have _extended_ versions of these apis that do support it and macros that pass in that information.  This can be added at a later stage.  Having it in the allocator api function would make it more future proof.

See also my "pymem.h" and "ccpmem.h" files attached to this defect for examples on how we have tweaked python's internal memory apis to support this information.
msg190937 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-10 23:05
py_setallocators-filename.patch: Here is an attempt to define an API providing the filename and line number of the C code. The Py_SetAllocators() API is unchanged:

PyAPI_FUNC(int) Py_SetAllocators(
    char api,
    void* (*malloc) (size_t size, void *user_data),
    void* (*realloc) (void *ptr, size_t size, void *user_data),
    void (*free) (void *ptr, void *user_data),
    void *user_data
    );

If Python is compiled with -DPYMEM_TRACE_MALLOC, user_data is not the last parameter passed to Py_SetAllocators() but a pointer to a _PyMem_Trace structure:

typedef struct {
    void *data;
    /* NULL and -1 when unknown */
    const char *filename;
    int lineno;
} _PyMem_Trace;


The problem is that the module using Py_SetAllocators() must be compiled differently depending on PYMEM_TRACE_MALLOC. Example from pytracemalloc, modified for this patch:
---
    _PyMem_Trace *ctrace;
    trace_api_t *api;
    void *call_data;
    void *ptr;
#ifdef PYMEM_TRACE_MALLOC
    ctrace = (_PyMem_Trace *)data;
    api = (trace_api_t *)ctrace->data;
    ctrace->data = api->data;
    call_data = data;
#else
    ctrace = NULL;
    api = (trace_api_t *)data;
    call_data = api->data;
#endif
    ptr = api->malloc(size, call_data);
    ...
---
I didn't like the "ctrace->data = api->data;" instruction: pytracemalloc modifies the input _PyMem_Trace structure.


pytracemalloc code is a little bit more complex, but "it works". pytracemalloc can reuse the filename and line number of the C module, or of the Python module. It can be configured at runtime. Example of output for the C module:
---
2013-06-11 00:36:30: Top 15 allocations per file and line (compared to 2013-06-11 00:36:25)
#1: Objects/dictobject.c:352: size=6 MiB (+4324 KiB), count=9818 (+7773), average=663 B
#2: Objects/unicodeobject.c:1085: size=6 MiB (+2987 KiB), count=61788 (+26197), average=111 B
#3: Objects/tupleobject.c:104: size=4054 KiB (+2176 KiB), count=44569 (+24316), average=93 B
#4: Objects/typeobject.c:770: size=2440 KiB (+1626 KiB), count=13906 (+10360), average=179 B
#5: Objects/bytesobject.c:107: size=2395 KiB (+1114 KiB), count=24846 (+11462), average=98 B
#6: Objects/funcobject.c:12: size=1709 KiB (+1103 KiB), count=11516 (+7431), average=152 B
#7: Objects/codeobject.c:117: size=1760 KiB (+871 KiB), count=11267 (+5578), average=160 B
#8: Objects/dictobject.c:399: size=784 KiB (+627 KiB), count=10040 (+8028), average=80 B
#9: Objects/listobject.c:159: size=420 KiB (+382 KiB), count=5386 (+4891), average=80 B
#10: Objects/frameobject.c:649: size=1705 KiB (+257 KiB), count=3374 (+505), average=517 B
#11: ???:?: size=388 KiB (+161 KiB), count=588 (+240), average=676 B
#12: Objects/weakrefobject.c:36: size=241 KiB (+138 KiB), count=2579 (+1482), average=96 B
#13: Objects/dictobject.c:420: size=135 KiB (+112 KiB), count=2031 (+1736), average=68 B
#14: Objects/classobject.c:59: size=109 KiB (+105 KiB), count=1400 (+1345), average=80 B
#15: Objects/unicodeobject.c:727: size=188 KiB (+86 KiB), count=1237 (+687), average=156 B
37 more: size=828 KiB (+315 KiB), count=8421 (+5281), average=100 B
Total Python memory: size=29 MiB (+16 MiB), count=212766 (+117312), average=145 B
Total process memory: size=68 MiB (+22 MiB) (ignore tracemalloc: 0 B)
---


I also had to modify the following GC functions to get more accurate information:

- _PyObject_GC_Malloc(size)
- _PyObject_GC_New(tp)
- _PyObject_GC_NewVar(tp, nitems)
- PyObject_GC_Del(op)

For example, PyTuple_New() calls PyObject_GC_NewVar() to allocate its memory. With my patch, you get "Objects/tupleobject.c:104" instead of a generic "Modules/gcmodule.c:1717".
msg190940 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-11 01:17
New version of the patch, py_setallocators-3.patch:
 - _PyMem_DebugMalloc(), _PyMem_DebugFree() and _PyMem_DebugRealloc() are now set up as hooks on the system allocator and are hooked on the PyMem API *and* on the PyObject API
 - move "if (size > PY_SSIZE_T_MAX)" check into PyObject_Malloc() and PyObject_Realloc()

This patch does not propose a simple API to reuse internal debug hooks when replacing system (PyMem) allocators.
msg190951 - (view) Author: Amaury Forgeot d'Arc (amaury.forgeotdarc) * (Python committer) Date: 2013-06-11 08:45
I prefer the new version without PYMEM_TRACE_MALLOC :-)

Can we rename "API" and "api_id" to something more specific? maybe DOMAIN and domain_id?
msg190962 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-11 11:59
Amaury Forgeot d'Arc added the comment:
> I prefer the new version without PYMEM_TRACE_MALLOC :-)

Well, py_setallocators-filename.patch is more a proof-of-concept
showing how to use my Py_SetAllocators() API to pass the C trace
(filename/line number), than a real proposition. The patch is very
intrusive and huge, I also prefer py_setallocators-3.patch :-)

> Can we rename "API" and "api_id" to something more specific? maybe DOMAIN and domain_id?

Something like:
{PY_ALLOC_MEM_DOMAIN, PY_ALLOC_OBJECT_DOMAIN}.
or
{PYMEM_DOMAIN, PYOBJECT_DOMAIN}
?

There are only two values, another option is to duplicate functions:
- PyMem_GetAllocators(), PyMem_SetAllocators(), PyMem_Malloc(), ..
- PyObject_GetAllocators(), PyObject_SetAllocators(), PyObject_Malloc(), ..

I prefer PyMem_SetAllocators() over PYOBJECT_DOMAIN.
msg191029 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-12 14:48
Benchmark of py_setallocators-3.patch:

 - benchmarks suite (-b 2n3): some tests are 1.04x faster, some tests are 1.04x slower, significant is between 115 and -191. I don't understand this output, but I guess that the overhead cannot be seen with such a test.
 - pybench: "+0.1%" (diff between -4.9% and +5.6%)

If I understood correctly, the overhead is really really low (near zero).
msg191030 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-12 14:49
> If I understood correctly, the overhead is really really low (near zero).

See attached output pybench.txt and benchmarks.txt.
msg191049 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-12 21:51
New version (4) of the patch:

 - move the opaque pointer (now called "void *ctx", "context") as the first parameter instead of the last parameter, as done in zlib, lzma and Oracle's OCI APIs; ctx is also the first parameter of Py*_GetFunctions() and Py*_SetFunctions() instead of the last
 - rename public functions:

   * Py_GetAllocators() -> PyMem_GetAllocators(), PyObject_GetAllocators()
   * Py_SetAllocators() -> PyMem_SetAllocators(), PyObject_SetAllocators()
   * Py_GetBlockAllocators() -> PyObject_GetArenaAllocators()
   * Py_SetBlockAllocators() -> PyObject_SetArenaAllocators()

 - move declaration of PyObject_*() functions from pymem.h to objimpl.h
 - split _PyMem big structure into smaller structures: _PyMem, _PyObject, _PyObject_Arena
 - move "if (size == 0) size = 1;" from PyMem_Malloc() to _PyMem_Malloc(), so the custom allocator can decide how to implement PyMem_Malloc(0) (maybe something more efficient)

Does the new API look better? py_setallocators-4.patch is ready for a final review. If nobody complains, I'm going to commit it.
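
To make the "ctx as first parameter" convention concrete, here is a minimal standalone sketch. It is not code from the patch: demo_allocator, demo_counter and every demo_* function are hypothetical names, and it does not use Python.h. It shows a table of function pointers whose first argument is the opaque context, a default implementation that decides by itself how to handle a zero-byte request, and a hook that counts calls by keeping its state behind ctx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator table (NOT the structure from the patch).
   The opaque context pointer comes first in every callback. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void *(*realloc_fn)(void *ctx, void *ptr, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocator;

/* Default implementation: ignores ctx and forwards to the C library.
   The zero-size decision lives here, so another allocator is free to
   handle size 0 differently. */
void *demo_sys_malloc(void *ctx, size_t size)
{
    (void)ctx;
    if (size == 0)
        size = 1;
    return malloc(size);
}

void *demo_sys_realloc(void *ctx, void *ptr, size_t size)
{
    (void)ctx;
    if (size == 0)
        size = 1;
    return realloc(ptr, size);
}

void demo_sys_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

demo_allocator demo_system_allocator(void)
{
    demo_allocator a = { NULL, demo_sys_malloc, demo_sys_realloc, demo_sys_free };
    return a;
}

/* Hook state: remembers the wrapped allocator and counts calls. */
typedef struct {
    demo_allocator inner;
    size_t malloc_calls;
    size_t free_calls;
} demo_counter;

void *demo_count_malloc(void *ctx, size_t size)
{
    demo_counter *c = ctx;
    assert(c != NULL);
    c->malloc_calls++;
    return c->inner.malloc_fn(c->inner.ctx, size);
}

void *demo_count_realloc(void *ctx, void *ptr, size_t size)
{
    demo_counter *c = ctx;
    assert(c != NULL);
    return c->inner.realloc_fn(c->inner.ctx, ptr, size);
}

void demo_count_free(void *ctx, void *ptr)
{
    demo_counter *c = ctx;
    assert(c != NULL);
    c->free_calls++;
    c->inner.free_fn(c->inner.ctx, ptr);
}

/* Build a hooked allocator whose ctx points at the counter state. */
demo_allocator demo_install_counter(demo_counter *c, demo_allocator inner)
{
    demo_allocator hooked = { c, demo_count_malloc, demo_count_realloc, demo_count_free };
    c->inner = inner;
    c->malloc_calls = 0;
    c->free_calls = 0;
    return hooked;
}
```

A caller would build the hooked table with demo_install_counter(&counter, demo_system_allocator()) and then always pass a.ctx back as the first argument, e.g. a.malloc_fn(a.ctx, 16). Keeping the state behind ctx is what lets a hook work without global variables.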
msg191050 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-12 21:53
py_setallocators-4.patch:
- Oh, I forgot another change: Py*_Get/SetAllocators() cannot fail anymore (because of an unknown API identifier), so the return type is now void

I just saw that I forgot ".. versionadded:: 3.4" in the doc.
msg191074 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-13 10:44
> This patch does not propose a simple API to reuse internal
> debug hooks when replacing system (PyMem) allocators.

Ok, this is now fixed with the new patch (version 5). Nick does not want a new environment variable, so I added instead a new function PyMem_SetupDebugHooks() which reinstalls hooks to detect bugs if allocator functions were replaced with PyMem_SetAllocators() or PyObject_SetAllocators(). The function does nothing if Python is not compiled in debug mode or if hooks are already installed (so the function can be called twice).

I also added unit tests for PyMem_SetAllocators() and PyObject_SetAllocators()! And I added "versionadded:: 3.4" to the C API documentation.
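
As an illustration of the "can be called twice" behaviour, here is a rough standalone sketch. It is not the implementation from the patch: all demo_* names and the fill byte are made up for the example, the debug-build check is left out, and the only "bug detection" shown is filling fresh memory with a known pattern. The point is the order of operations: replace the allocator first, then reinstall the hooks on top of it, and make the reinstall a no-op when the hooks are already active.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fill pattern for freshly allocated memory. */
#define DEMO_CLEANBYTE 0xCB

/* Hypothetical allocator table, context pointer first. */
typedef struct {
    void *ctx;
    void *(*malloc_fn)(void *ctx, size_t size);
    void (*free_fn)(void *ctx, void *ptr);
} demo_allocator;

void *demo_sys_malloc(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size ? size : 1);
}

void demo_sys_free(void *ctx, void *ptr)
{
    (void)ctx;
    free(ptr);
}

/* Allocator currently in use, and the one wrapped by the debug hooks. */
static demo_allocator demo_current = { NULL, demo_sys_malloc, demo_sys_free };
static demo_allocator demo_wrapped;

/* Debug hook: forward to the wrapped allocator, then fill the block. */
void *demo_debug_malloc(void *ctx, size_t size)
{
    demo_allocator *inner = ctx;
    void *p;
    assert(inner != NULL);
    p = inner->malloc_fn(inner->ctx, size);
    if (p != NULL)
        memset(p, DEMO_CLEANBYTE, size);
    return p;
}

void demo_debug_free(void *ctx, void *ptr)
{
    demo_allocator *inner = ctx;
    assert(inner != NULL);
    inner->free_fn(inner->ctx, ptr);
}

/* Replace the allocator; any hooks installed earlier are dropped. */
void demo_set_allocator(const demo_allocator *a)
{
    demo_current = *a;
}

/* Reinstall the hooks on top of whatever allocator is current.
   Calling it again while the hooks are active changes nothing. */
void demo_setup_debug_hooks(void)
{
    if (demo_current.malloc_fn == demo_debug_malloc)
        return;
    demo_wrapped = demo_current;
    demo_current.ctx = &demo_wrapped;
    demo_current.malloc_fn = demo_debug_malloc;
    demo_current.free_fn = demo_debug_free;
}

void *demo_malloc(size_t size)
{
    return demo_current.malloc_fn(demo_current.ctx, size);
}

void demo_free(void *ptr)
{
    demo_current.free_fn(demo_current.ctx, ptr);
}
```

Without the early return, a second call would make the hook wrap itself and recurse forever, which is why the "already installed" check matters.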
msg191077 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-13 10:54
> To be exhaustive, another patch should be developed to replace
> all calls for malloc/realloc/free by
> PyMem_Malloc/PyMem_Realloc/PyMem_Free.

I created issue #18203 for this point.

> PyObject_Malloc() is still using mmap() or malloc() internally
> for example.

The arena allocator can be replaced or hooked with PyObject_SetArenaAllocators() in my latest patch.
msg191165 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-14 22:44
New changeset 6661a8154eb3 by Victor Stinner in branch 'default':
Issue #3329: Add new APIs to customize memory allocators
http://hg.python.org/cpython/rev/6661a8154eb3
msg191184 - (view) Author: Roundup Robot (python-dev) Date: 2013-06-15 01:38
New changeset b1455dd08000 by Victor Stinner in branch 'default':
Revert changeset 6661a8154eb3: Issue #3329: Add new APIs to customize memory allocators
http://hg.python.org/cpython/rev/b1455dd08000
msg191379 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-17 22:48
Convert changeset 6661a8154eb3 into a patch: py_setallocators-6.patch.
msg191436 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-18 21:52
Update the patch to follow the API described in the PEP 445 (2013-06-18 22:33:41 +0200).
msg191508 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-20 11:16
Update patch according to the last version of the PEP.
msg192220 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-02 23:01
Updated patch (version 9):

- update API to the last version of the PEP
- PYMEM_DOMAIN_RAW now also has a well-defined behaviour when requesting an allocation of zero bytes: PyMem_RawMalloc(0) now calls malloc(1)
- enhance the documentation (ex: mention default allocators)
- _testcapi also checks that PyMem_RawMalloc(0) is non-NULL
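
The zero-byte rule is easy to show in isolation. The following is only a sketch with hypothetical names (demo_raw_malloc, demo_check_zero_alloc), not the code from the patch: the wrapper turns a request for 0 bytes into a request for 1 byte, so the result does not depend on what the platform's malloc(0) returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a raw-domain malloc wrapper. */
void *demo_raw_malloc(size_t size)
{
    /* malloc(0) may return NULL or a unique pointer depending on the
       C library; asking for one byte removes that ambiguity. */
    if (size == 0)
        size = 1;
    return malloc(size);
}

/* The kind of check described for _testcapi: a zero-byte request
   must give back a usable, non-NULL pointer (barring out-of-memory). */
void demo_check_zero_alloc(void)
{
    void *p = demo_raw_malloc(0);
    assert(p != NULL);
    free(p);
}
```

With this rule a NULL result always means "out of memory", so callers do not need a special case for empty buffers.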
msg192504 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 00:25
New changeset ca78c974e938 by Victor Stinner in branch 'default':
Issue #3329: Implement the PEP 445
http://hg.python.org/cpython/rev/ca78c974e938
msg192506 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-07 00:28
Ok, let's see if the buildbots like PEP 445 (keep this issue open until we have the result of all 3.4 buildbots).

I created the issue #18392 to document PyObject_Malloc().
msg192508 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-07 01:03
It looks like the changeset ca78c974e938 broke the "x86 XP-4 3.x" buildbot:
buildbot.python.org/all/builders/x86 XP-4 3.x/builds/8795/

Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1305, in runtest_inner
    test_runner()
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_tools.py", line 459, in test_main
    support.run_unittest(*[obj for obj in globals().values()
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1600, in run_unittest
    _run_suite(suite)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1566, in _run_suite
    result = runner.run(suite)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 175, in run
    result.printErrors()
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 109, in printErrors
    self.printErrorList('ERROR', self.errors)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 117, in printErrorList
    self.stream.writeln("%s" % err)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 25, in writeln
    self.write(arg)
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1615, in <module>
    main_in_temp_cwd()
  File "../lib/test/regrtest.py", line 1590, in main_in_temp_cwd
    main()
  File "../lib/test/regrtest.py", line 796, in main
    match_tests=match_tests)
  File "../lib/test/regrtest.py", line 998, in runtest
    debug, display_failure=False)
  File "../lib/test/regrtest.py", line 1330, in runtest_inner
    msg = traceback.format_exc()
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 254, in format_exc
    return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 180, in format_exception
    return list(_format_exception_iter(etype, value, tb, limit, chain))
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 152, in _format_exception_iter
    yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 17, in _format_list_iter
    for filename, lineno, name, line in extracted_list:
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 64, in _extract_tb_or_stack_iter
    line = linecache.getline(filename, lineno, f.f_globals)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 15, in getline
    lines = getlines(filename, module_globals)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 41, in getlines
    return updatecache(filename, module_globals)
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 127, in updatecache
    lines = fp.readlines()
  File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\codecs.py", line 301, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError
msg192509 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 01:06
New changeset 51ed51d10e60 by Victor Stinner in branch 'default':
Issue #3329: Fix _PyObject_ArenaVirtualFree()
http://hg.python.org/cpython/rev/51ed51d10e60
msg192570 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-07 15:25
Buildbots are happy, changeset 51ed51d10e60 fixed the memory leak on Windows XP. Let's close this issue, 5 years after its creation!
msg192576 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-07-07 17:36
Well done.
History
Date User Action Args
2013-07-07 17:36:13 kristjan.jonsson set messages: + msg192576
2013-07-07 15:25:49 haypo set status: open -> closed; resolution: fixed; messages: + msg192570
2013-07-07 01:06:31 python-dev set messages: + msg192509
2013-07-07 01:03:58 haypo set messages: + msg192508
2013-07-07 00:28:07 haypo set messages: + msg192506
2013-07-07 00:25:37 python-dev set messages: + msg192504
2013-07-02 23:02:55 haypo set files: - py_setallocators-8.patch
2013-07-02 23:02:19 haypo set files: - py_setallocators-7.patch
2013-07-02 23:02:16 haypo set files: - py_setallocators-6.patch
2013-07-02 23:01:58 haypo set files: + py_setallocators-9.patch; messages: + msg192220
2013-06-20 11:16:19 haypo set files: + py_setallocators-8.patch; messages: + msg191508
2013-06-19 12:03:13 haypo link issue16742 dependencies
2013-06-18 21:52:47 haypo set files: + py_setallocators-7.patch; messages: + msg191436
2013-06-17 22:48:45 haypo set files: - py_setallocators-5.patch
2013-06-17 22:48:43 haypo set files: - py_setallocators-4.patch
2013-06-17 22:48:30 haypo set files: - py_setallocators-3.patch
2013-06-17 22:48:28 haypo set files: - py_setallocators-2.patch
2013-06-17 22:48:26 haypo set files: - py_setallocators.patch
2013-06-17 22:48:09 haypo set files: + py_setallocators-6.patch; messages: + msg191379
2013-06-17 04:39:52 trent set nosy: + trent
2013-06-15 01:38:02 python-dev set messages: + msg191184
2013-06-14 22:44:09 python-dev set nosy: + python-dev; messages: + msg191165
2013-06-13 22:57:25 jcea set nosy: + jcea
2013-06-13 10:54:44 haypo set messages: + msg191077
2013-06-13 10:44:33 haypo set files: + py_setallocators-5.patch; messages: + msg191074
2013-06-12 21:53:52 haypo set messages: + msg191050
2013-06-12 21:51:09 haypo set files: + py_setallocators-4.patch; messages: + msg191049
2013-06-12 14:49:05 haypo set messages: + msg191030
2013-06-12 14:48:34 haypo set files: + benchmarks.txt
2013-06-12 14:48:27 haypo set files: + pybench.txt
2013-06-12 14:48:18 haypo set messages: + msg191029
2013-06-11 11:59:00 haypo set messages: + msg190962
2013-06-11 08:45:49 amaury.forgeotdarc set messages: + msg190951
2013-06-11 01:17:36 haypo set files: + py_setallocators-3.patch; messages: + msg190940
2013-06-10 23:05:54 haypo set files: + py_setallocators-filename.patch; messages: + msg190937
2013-06-07 09:33:06 kristjan.jonsson set files: + Capture.JPG; messages: + msg190741
2013-06-03 12:03:05 haypo set files: + py_setallocators-2.patch; messages: + msg190539
2013-06-03 10:25:02 haypo set messages: + msg190534
2013-06-03 09:40:22 kristjan.jonsson set files: + ccpmem.h; messages: + msg190529
2013-06-03 09:37:49 kristjan.jonsson set files: + pymem.h; messages: + msg190528
2013-05-31 23:40:50 haypo set messages: + msg190429
2013-03-11 11:30:20 kristjan.jonsson set messages: + msg183951
2013-03-11 11:15:26 ncoghlan set messages: + msg183950
2013-03-11 10:20:00 kristjan.jonsson set messages: + msg183947
2013-03-11 10:01:26 kristjan.jonsson set nosy: + kristjan.jonsson
2013-03-10 16:29:36 gregory.p.smith set nosy: + gregory.p.smith
2013-03-06 13:41:32 amaury.forgeotdarc set nosy: + amaury.forgeotdarc; messages: + msg183591
2013-03-06 12:38:40 haypo set messages: + msg183590
2013-03-06 12:32:44 haypo set versions: + Python 3.4, - Python 3.3
2013-03-06 12:32:32 haypo set files: + py_setallocators.patch; nosy: + haypo; messages: + msg183587; keywords: + patch
2013-01-25 19:27:26 brett.cannon set nosy: - brett.cannon
2011-08-25 17:16:25 pitrou set nosy: + pitrou; messages: + msg142981; versions: + Python 3.3, - Python 3.2
2010-08-09 18:35:18 terry.reedy set versions: - Python 3.1, Python 2.7
2010-02-18 20:30:12 barry set assignee: barry ->
2009-10-01 02:49:52 tlesher set nosy: + tlesher
2009-08-25 19:33:39 pjmcnerney set nosy: + pjmcnerney; messages: + msg91957; versions: + Python 3.2, - Python 2.6, Python 2.5, Python 3.0
2009-05-29 10:14:50 jszakmeister set nosy: + jszakmeister
2009-01-07 09:18:24 jlaurila set messages: + msg79309
2009-01-03 19:55:31 neilo set nosy: + neilo; messages: + msg78995; versions: + Python 2.6, Python 2.5, Python 3.0
2008-07-10 16:57:10 Rhamphoryncus set messages: + msg69511
2008-07-10 10:12:33 rhettinger set assignee: barry; messages: + msg69499; nosy: + barry, rhettinger
2008-07-10 09:59:16 ncoghlan set nosy: + ncoghlan; messages: + msg69497; versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0
2008-07-10 08:05:35 jlaurila set messages: + msg69494
2008-07-09 20:06:33 Rhamphoryncus set nosy: + Rhamphoryncus; messages: + msg69484
2008-07-09 19:55:55 brett.cannon set nosy: + brett.cannon; messages: + msg69483
2008-07-09 19:48:52 jlaurila create