API for setting the memory allocator used by Python #47579
Currently Python always uses the C library malloc/realloc/free as the
The proposal is to make it possible to set the allocator used by the
void Py_SetAllocator(void* (*alloc)(size_t), void* (*realloc)(void*, size_t), void (*free)(void*))
Direct function calls to malloc/realloc/free in obmalloc.c must be |
Is registering pointers to functions really necessary, or would defining I guess my question becomes, Jukka, is this more for alternative |
How would this allow you to free all memory? The interpreter will still |
Brett, the ability to define the allocator dynamically at runtime could The application would control the lifecycle of the Python heap, and this Adam, the cleanup would work by freeing the entire heap used by Python Yes, there are various static pointers that are initially set to NULL, |
Given where we are in the release cycle, I've bumped the target releases And in terms of hooking into this kind of thing, some simple macros that |
I think it is reasonable to get a macro definition change into 2.6. Barry, can you rule on whether to keep this open for consideration in |
Basically you just want to kick the malloc implementation into doing IMO, there's no use-case for this until Py_Finalize can completely tear The practical alternative, as I said, is to run python in a subprocess. |
I'll be in agreement here. I integrated Python into a game engine not |
Brett is right. Macroing the memory allocator is a better choice than Sorry we don't have a clean patch at present for this change only, but |
Has the ability to set the memory allocator been added to Python 2.7/3.1? Thanks, |
All this needs is a patch. |
I attached a patch that I wrote for Wyplay: py_setallocators.patch. The patch adds two functions: PyAPI_FUNC(int) Py_GetAllocators(
char api,
void* (**malloc_p) (size_t),
void* (**realloc_p) (void*, size_t),
void (**free_p) (void*)
);
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t),
void* (*realloc) (void*, size_t),
void (*free) (void*)
); Where api is one of these values:
These functions are used by the pytracemalloc project to hook the PyMem_Malloc() and PyObject_Malloc() APIs. pytracemalloc traces all Python memory allocations to compute statistics per Python file.
Wyplay also uses Py_SetAllocators() internally to completely replace the system allocators *before* Python is started. We have another private patch on Python that adds a function. This function sets its own memory allocators; it is called before Python starts thanks to an "__attribute__((constructor))" attribute.
--
If you use Py_SetAllocators() to completely replace a memory allocator (any memory allocation API), you have to do it before the first Python memory allocation (before Py_Main()) *or* your memory allocator must be able to recognize a pointer that it did not allocate and pass the operation (realloc or free) to the previous memory allocator.
For example, PyObject_Free() is able to recognize that a pointer is part of its memory pool, and otherwise falls back to the system allocator (extract of the original code):
if (Py_ADDRESS_IN_RANGE(p, pool)) {
...
return;
}
free(p);
--
If you use Py_SetAllocators() to hook memory allocators (do something before and/or after calling the previous function, *without* touching the pointer or the size), you can do it at any time.
--
I haven't run a benchmark yet to measure the overhead of the patch on Python performance. The new functions are neither documented nor tested yet. To test them, we can write a simple hook that traces calls to the memory allocators and then calls the memory allocator. |
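The "recognize your own pointers or pass the call on" rule above can be sketched in plain C, independently of the patch. Everything here is hypothetical and for illustration only: the pool, `in_pool()`, `pool_malloc()` and `pool_free()` are made-up names, and the pool is a trivial bump allocator that never reuses memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-size pool; all names here are illustrative only. */
#define POOL_SIZE 4096
#define ALIGN 16

static union {
    unsigned char bytes[POOL_SIZE];
    long double align_ld;   /* members below only force a strict alignment */
    long long align_ll;
    void *align_p;
} pool;
static size_t pool_used = 0;
static int fallback_frees = 0;  /* frees passed on to the previous allocator */

/* Return nonzero if p points inside our pool. */
static int in_pool(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t start = (uintptr_t)pool.bytes;
    return addr >= start && addr < start + POOL_SIZE;
}

/* Bump allocator: serve from the pool, fall back to malloc() when full. */
static void *pool_malloc(size_t size)
{
    size_t rounded;
    if (size == 0)
        size = 1;
    if (size > POOL_SIZE)
        return malloc(size);
    rounded = (size + ALIGN - 1) / ALIGN * ALIGN;
    if (rounded <= POOL_SIZE - pool_used) {
        void *p = pool.bytes + pool_used;
        pool_used += rounded;
        return p;
    }
    return malloc(size);
}

/* Free: handle our own pointers, pass foreign ones to the system free(). */
static void pool_free(void *p)
{
    if (p == NULL)
        return;
    if (in_pool(p))
        return;             /* the bump allocator never reuses memory */
    fallback_frees++;
    free(p);
}

/* Returns 1 when a pointer allocated before the replacement is passed on. */
int demo_fallback(void)
{
    void *before = malloc(32);      /* allocated before the replacement */
    void *mine = pool_malloc(32);   /* allocated by the replacement */
    int ok;
    assert(before != NULL && mine != NULL);
    ok = in_pool(mine) && !in_pool(before);
    pool_free(mine);
    pool_free(before);
    return ok && fallback_frees == 1;
}
```

Calling `demo_fallback()` once from a `main()` exercises both paths: the pool pointer is handled locally, and the pointer that came from `malloc()` is handed to `free()`.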
To be exhaustive, another patch should be developed to replace all calls to malloc/realloc/free with PyMem_Malloc/PyMem_Realloc/PyMem_Free. PyObject_Malloc() is still using mmap() or malloc() internally for example. Other examples of functions calling malloc/realloc/free directly: _PySequence_BytesToCharpArray(), block_new() (of pyarena.c), find_key() (of thread.c), PyInterpreterState_New(), win32_wchdir(), posix_getcwd(), Py_Main(), etc. |
Some customizable memory allocators I know have an extra parameter "void *opaque" that is passed to all functions:
OTOH, expat, libxml, libmpdec don't have this extra parameter. |
At ccp we have something similar. We are embedding python in the UnrealEngine on the PS3 and need to get everything through their allocators. For the purpose of flexibility, we added an api similar to the OPs, but more flexible: /* Support for custom allocators */
typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);
typedef void *(*PyCCP_Realloc_t)(void *ptr, size_t size, void *arg, const char *file, int line, const char *msg);
typedef void (*PyCCP_Free_t)(void *ptr, void *arg, const char *file, int line, const char *msg);
typedef size_t (*PyCCP_Msize_t)(void *ptr, void *arg);
typedef struct PyCCP_CustomAllocator_t
{
PyCCP_Malloc_t pMalloc;
PyCCP_Realloc_t pRealloc;
PyCCP_Free_t pFree;
PyCCP_Msize_t pMsize; /* can be NULL, or return -1 if no size info is avail. */
void *arg; /* opaque argument for the functions */
} PyCCP_CustomAllocator_t;
/* To set an allocator! use 0 for the regular allocator, 1 for the block allocator.
* pass a null pointer to reset to internal default
*/
PyAPI_FUNC(void) PyCCP_SetAllocator(int which, const PyCCP_CustomAllocator_t *);
For a module to install itself as a "hook" at runtime, this approach can be extended by querying the current allocator, so that such a hook can then delegate calls to the previous allocator.
The "block" allocator here is intended as the underlying allocator to be used by obmalloc.c. Depending on the platform, it can then allocate aligned virtual memory directly, which is more efficient than layering that on top of a malloc-like allocator.
There are areas in CPython that use malloc() directly. Those are actually not needed in all cases, but to cope with them we change them all to new RAW api calls (using preprocessor macros). For this reason, the custom allocators mentioned cannot be assumed to be called with the GIL held. However, it is easily possible to extend the system above so that there is a GIL and a non-GIL version of the 'regular' allocator.
I'll put details of the stuff we have done for EVE Online / Dust 514 on my blog. It is this, but much much more too. Hopefully we can arrive at a way to abstract memory allocation away from Python in a flexible and extensible manner :) |
Note that I'm definitely open to including extra settings to set up custom allocators as part of Py_CoreConfig in PEP-432 (http://www.python.org/dev/peps/pep-0432/#pre-initialization-phase). I don't really want to continue the tradition of additional PySet_* APIs with weird conditions on when they have to be called, though (trying to prevent more of that kind of organic growth in complexity is why I wrote PEP-432 in the first place) |
Absolutely. Although there is a very useful scenario where this could be considered a run-time setting:
# turboprofiler.py
# Load up the memory hooker which will supply us with all the info
import _turboprofiler
_turboprofiler.hookup()
Perhaps people interested in memory optimizations and profiling could hook up at PyCon? It is the most common regular query I get from people in my organization: How can I find out how Python is using/leaking/wasting memory? |
I don't understand the purpose of the filename and line number. Python does not have such information. Is it just to have the API expected by Unreal engine? What is the message? How is it filled?
--
I'm proposing a simpler prototype:
void* (*malloc) (size_t);
simply because Python does not have or use anything more than that. I'm not against adding an arbitrary void* argument: it should not hurt, and may be required by some other applications or libraries.
@kristjan.jonsson: Can you adapt your tool to fit the following API?
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t size, void *data),
void* (*realloc) (void* ptr, size_t size, void *data),
void (*free) (void* ptr, void *data)
);
--
My pytracemalloc project hooks allocation functions and then uses C Python functions to get the current filename and line number. There is no need to modify the C code to pass __FILE__ and __LINE__. It can produce a summary such as:
2013-02-28 23:40:18: Top 5 allocations per file
You can also configure it to display the line number. |
Hi. Of course, I'm not showing you the entire set of modifications that we have made to the memory allocation scheme. They include more extensive versions of the memory allocation tools, in order to more easily monitor memory allocations from within C. For your information, I'm uploading pymemory.h from our 2.7 patch. The extent of our modifications can be gleaned from there. Basically, we have layered the macros into outer and inner versions, in order to better support internal diagnostics. I'm happy with the api you provide, with a small addition:
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t size, void *data),
void* (*realloc) (void* ptr, size_t size, void *data),
void (*free) (void* ptr, void *data),
void *data
); The 'data' pointer is pointless unless you can provide it as part of the api. This sort of extra indirection is necessary for C callbacks to provide instance specific context to statically compiled and linked callback functions. |
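To illustrate why the extra `data` pointer matters, here is a minimal standalone sketch in which the same statically compiled callbacks serve two independent instances, each with its own counters. It does not use the proposed Python API; `allocator_t`, `alloc_stats_t`, `counting_malloc()` and the `malloc_fn`/`free_fn` member names are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-instance statistics, passed as the opaque data pointer. */
typedef struct {
    size_t calls;
    size_t bytes;
} alloc_stats_t;

/* A group of allocator callbacks plus the context handed back to them. */
typedef struct {
    void *(*malloc_fn)(size_t size, void *data);
    void (*free_fn)(void *ptr, void *data);
    void *data;
} allocator_t;

/* One static callback; the instance it works on arrives through data. */
static void *counting_malloc(size_t size, void *data)
{
    alloc_stats_t *stats = data;
    stats->calls++;
    stats->bytes += size;
    return malloc(size);
}

static void counting_free(void *ptr, void *data)
{
    (void)data;             /* no bookkeeping needed on free in this sketch */
    free(ptr);
}

/* Two allocators share the same functions but keep separate statistics.
   Returns a.bytes * 100 + b.bytes so both counters can be checked at once. */
size_t demo_two_instances(void)
{
    alloc_stats_t a = {0, 0}, b = {0, 0};
    allocator_t first = {counting_malloc, counting_free, &a};
    allocator_t second = {counting_malloc, counting_free, &b};

    void *p = first.malloc_fn(10, first.data);
    void *q = second.malloc_fn(20, second.data);
    void *r = second.malloc_fn(30, second.data);
    assert(a.calls == 1 && b.calls == 2);

    first.free_fn(p, first.data);
    second.free_fn(q, second.data);
    second.free_fn(r, second.data);
    return a.bytes * 100 + b.bytes;
}
```

Without the `data` argument, `counting_malloc()` would have to reach for a global, and the two instances could not be told apart.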
Also, our ccpmem.h, the interface to the ccpmem.cpp, internal flexible memory allocator framework. |
"""
I'm happy with the api you provide, with a small addition:
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t size, void *data),
void* (*realloc) (void* ptr, size_t size, void *data),
void (*free) (void* ptr, void *data),
void *data
);
""" Oops, I forgot "void *data". Yeah, each group of allocator functions (malloc, free and realloc) will get its own "data" pointer. |
New patch (version 2), more complete:
Main API: #define PY_ALLOC_MEM_API 'm' /* PyMem_Malloc() API */
#define PY_ALLOC_OBJECT_API 'o' /* PyObject_Malloc() API */
PyAPI_FUNC(int) Py_GetAllocators(
char api,
void* (**malloc_p) (size_t size, void *user_data),
void* (**realloc_p) (void *ptr, size_t size, void *user_data),
void (**free_p) (void *ptr, void *user_data),
void **user_data_p
);
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t size, void *user_data),
void* (*realloc) (void *ptr, size_t size, void *user_data),
void (*free) (void *ptr, void *user_data),
void *user_data
);
PyAPI_FUNC(void) Py_GetBlockAllocators(
void* (**malloc_p) (size_t size, void *user_data),
void (**free_p) (void *ptr, size_t size, void *user_data),
void **user_data_p
);
PyAPI_FUNC(int) Py_SetBlockAllocators(
void* (*malloc) (size_t size, void *user_data),
void (*free) (void *ptr, size_t size, void *user_data),
void *user_data
); I see the following use cases using allocators:
"Hook" means adding extra code before and/or after calling the original function. The final API should allow hooking the APIs multiple times and replacing allocators, so it should be possible to track memory leaks, detect buffer overflows and use your own allocators. This is not yet possible with patch 2, because _PyMem_DebugMalloc() calls malloc() directly. _PyMem_DebugMalloc is no longer used by PyObject_Malloc. This code should be rewritten to use the hook approach instead of replacing memory allocators.
Example tracing PyMem calls using the hook approach:
#include <stdio.h>
typedef struct {
    void* (*malloc) (size_t, void*);
    void* (*realloc) (void*, size_t, void*);
    void (*free) (void*, void*);
    void *data;
} allocators_t;

/* Previous allocators of each API, saved so the hooks can delegate to them */
allocators_t pymem, pyobject;

void* trace_malloc (size_t size, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("malloc(%zu)\n", size);
    return alloc->malloc(size, alloc->data);
}

void* trace_realloc (void* ptr, size_t size, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("realloc(%p, %zu)\n", ptr, size);
    return alloc->realloc(ptr, size, alloc->data);
}

void trace_free (void* ptr, void* data)
{
    allocators_t *alloc = (allocators_t *)data;
    printf("free(%p)\n", ptr);
    alloc->free(ptr, alloc->data);
}

void hook_pymem(void)
{
    /* Save the current allocators, then install the tracing hooks */
    Py_GetAllocators(PY_ALLOC_MEM_API, &pymem.malloc, &pymem.realloc, &pymem.free, &pymem.data);
    Py_SetAllocators(PY_ALLOC_MEM_API, trace_malloc, trace_realloc, trace_free, &pymem);
    Py_GetAllocators(PY_ALLOC_OBJECT_API, &pyobject.malloc, &pyobject.realloc, &pyobject.free, &pyobject.data);
    Py_SetAllocators(PY_ALLOC_OBJECT_API, trace_malloc, trace_realloc, trace_free, &pyobject);
} I didn't try the example :-p It is just to give you an idea of the API and how to use it. |
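The requirement that the APIs can be hooked multiple times can be tried out without Python. The sketch below is only a stand-in under stated assumptions: `current`, `get_allocators()` and `set_allocators()` are hypothetical replacements for the interpreter's allocator table and the Get/Set functions of the patch, and `hook_t` is a made-up hook that counts calls before delegating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    void *(*malloc_fn)(size_t size, void *data);
    void (*free_fn)(void *ptr, void *data);
    void *data;
} allocators_t;

/* Stand-in for the interpreter's current allocator table (hypothetical). */
static void *sys_malloc(size_t size, void *data) { (void)data; return malloc(size); }
static void sys_free(void *ptr, void *data) { (void)data; free(ptr); }
static allocators_t current = { sys_malloc, sys_free, NULL };

static void get_allocators(allocators_t *out) { *out = current; }
static void set_allocators(const allocators_t *in) { current = *in; }

/* A hook remembers the allocators it replaced and counts the calls it sees. */
typedef struct {
    allocators_t prev;
    size_t mallocs;
    size_t frees;
} hook_t;

static void *hook_malloc(size_t size, void *data)
{
    hook_t *h = data;
    h->mallocs++;
    return h->prev.malloc_fn(size, h->prev.data);   /* delegate downwards */
}

static void hook_free(void *ptr, void *data)
{
    hook_t *h = data;
    h->frees++;
    h->prev.free_fn(ptr, h->prev.data);
}

/* Save whatever is installed right now, then put this hook on top of it. */
static void install_hook(hook_t *h)
{
    allocators_t mine = { hook_malloc, hook_free, h };
    get_allocators(&h->prev);
    h->mallocs = 0;
    h->frees = 0;
    set_allocators(&mine);
}

/* Returns 1 when one allocation and one free pass through both hooks. */
int demo_stacked_hooks(void)
{
    static hook_t inner, outer;
    void *p;

    install_hook(&inner);
    install_hook(&outer);   /* outer delegates to inner, inner to the system */

    p = current.malloc_fn(64, current.data);
    assert(p != NULL);
    current.free_fn(p, current.data);

    return inner.mallocs == 1 && outer.mallocs == 1
        && inner.frees == 1 && outer.frees == 1;
}
```

Because each hook only talks to the allocators it saved at install time, the order of installation defines the call chain, and a hook never needs to know how many others are stacked below it.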
I'd like to add some argument for providing a "file" and "line number" to the allocation api. I know that currently this is not provided e.g. by the PyMem_Allocate() functions, but I think it would be wise to provide a "debug" version of these functions that pass in the call sites. An allocator api that then also allows for these values to be provided to the malloc/realloc/free routines is then future-proof in that respect. Case in point: We have a memory profiler running which uses an allocator hook system similar to what Victor is proposing. But in addition, it provides a "file" and "line" argument to every function. Now, the profiler is currently not using this code. Here is how the "malloc" function looks:
static void *
PyMalloc(size_t size, void *arg, const char *file, int line, const char *msg)
{
void *r = DustMalloc(size);
if (r) {
tmAllocEx(g_telemetryContext, file, line, r, size, "Python alloc: %s", msg);
ReportAllocInfo(AllocEvent, 0, r, size);
}
return r;
}
tmAllocEx calls the Telemetry memory profiler and passes in the allocation site (http://www.radgametools.com/telemetry.htm; see also my blog post about using it: http://cosmicpercolator.com/2012/05/25/optimizing-python-condition-variables-with-telemetry/). But our profiler, called with ReportAllocInfo, isn't using this. It relies solely on extracting the Python call stack. Today, I got this email (see attached file Capture.jpg). Basically, the profiler sees a lot of allocated memory with no Python call stack. Now it would be useful if we had the C call site information, to know where it came from. So: My suggestion is that the allocator api be
Even though the current Python memory API (e.g. PyMem_Malloc(), PyObject_Malloc()) does not support it, this would allow us to internally have _extended_ versions of these apis that do support it and macros that pass in that information. This can be added at a later stage. Having it in the allocator api function would make it more future proof. See also my "pymem.h" and "ccpmem.h" files attached to this defect for examples on how we have tweaked Python's internal memory apis to support this information. |
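The idea of "extended" entry points plus macros that pass in the call site can be sketched as follows. This is not the code from pymem.h or ccpmem.h: `my_malloc_ex()`, `my_malloc()`, `MY_MALLOC_TRACED` and the recording callback are hypothetical names, and using NULL and -1 for an unknown call site is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Allocator callback that also receives the C call site (hypothetical). */
typedef void *(*malloc_ex_t)(size_t size, void *arg, const char *file, int line);

static const char *last_file = NULL;   /* call site seen by the last call */
static int last_line = -1;

/* Example callback: remember where the request came from, then allocate. */
static void *recording_malloc(size_t size, void *arg, const char *file, int line)
{
    (void)arg;
    last_file = file;
    last_line = line;
    return malloc(size);
}

static malloc_ex_t active_malloc = recording_malloc;

/* Extended entry point: the caller supplies the call site explicitly. */
static void *my_malloc_ex(size_t size, const char *file, int line)
{
    return active_malloc(size, NULL, file, line);
}

/* Plain entry point: call site unknown, reported as NULL and -1. */
static void *my_malloc(size_t size)
{
    return my_malloc_ex(size, NULL, -1);
}

/* Macro that fills in the call site at compile time. */
#define MY_MALLOC_TRACED(size) my_malloc_ex((size), __FILE__, __LINE__)

/* Returns 1 when the traced call reports its site and the plain one does not. */
int demo_call_site(void)
{
    /* Both statements share one source line so the line numbers must match. */
    void *p = MY_MALLOC_TRACED(16); int expected_line = __LINE__;
    int traced_ok, plain_ok;
    void *q;

    assert(p != NULL);
    traced_ok = last_file != NULL
             && strcmp(last_file, __FILE__) == 0
             && last_line == expected_line;

    q = my_malloc(16);
    plain_ok = last_file == NULL && last_line == -1;

    free(p);
    free(q);
    return traced_ok && plain_ok;
}
```

Existing callers keep using the plain function unchanged, while code that opts in to the macro gives a profiler the C file and line even when no Python call stack is available.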
py_setallocators-filename.patch: Here is an attempt to define an API providing the filename and line number of the C code. The Py_SetAllocators() API is unchanged:
PyAPI_FUNC(int) Py_SetAllocators(
char api,
void* (*malloc) (size_t size, void *user_data),
void* (*realloc) (void *ptr, size_t size, void *user_data),
void (*free) (void *ptr, void *user_data),
void *user_data
); If Python is compiled with -DPYMEM_TRACE_MALLOC, user_data is not the last parameter passed to Py_SetAllocators() but a pointer to a _PyMem_Trace structure: typedef struct {
void *data;
/* NULL and -1 when unknown */
const char *filename;
int lineno;
} _PyMem_Trace; The problem is that the module using Py_SetAllocators() must be compiled differently depending on PYMEM_TRACE_MALLOC. Example from pytracemalloc, modified for this patch: _PyMem_Trace *ctrace;
trace_api_t *api;
void *call_data;
void *ptr;
#ifdef PYMEM_TRACE_MALLOC
ctrace = (_PyMem_Trace *)data;
api = (trace_api_t *)ctrace->data;
ctrace->data = api->data;
call_data = data;
#else
ctrace = NULL;
api = (trace_api_t *)data;
call_data = api->data;
#endif
ptr = api->malloc(size, call_data);
... I didn't like the "ctrace->data = api->data;" instruction: pytracemalloc modifies the input _PyMem_Trace structure. pytracemalloc code is a little bit more complex, but "it works". pytracemalloc can reuse the filename and line number of the C module, or of the Python module. It can be configured at runtime. Example of output for the C module: I also had to modify the following GC functions to get more accurate information:
For example, PyTuple_New() calls PyObject_GC_NewVar() to allocate its memory. With my patch, you get "Objects/tupleobject.c:104" instead of a generic "Modules/gcmodule.c:1717". |
New version of the patch, py_setallocators-3.patch:
This patch does not propose a simple API to reuse internal debug hooks when replacing system (PyMem) allocators. |
I prefer the new version without PYMEM_TRACE_MALLOC :-) Can we rename "API" and "api_id" to something more specific? maybe DOMAIN and domain_id? |
Amaury Forgeot d'Arc added the comment:
Well, py_setallocators-filename.patch is more a proof-of-concept
Something like: There are only two values, another option is to duplicate functions:
I prefer PyMem_SetAllocators() over PYOBJECT_DOMAIN. |
Benchmark of py_setallocators-3.patch:
If I understood correctly, the overhead is really really low (near zero). |
See attached output pybench.txt and benchmarks.txt. |
New version (4) of the patch:
Does the new API look better? py_setallocators-4.patch is ready for a final review. If nobody complains, I'm going to commit it. |
py_setallocators-4.patch:
I just saw that I forgot ".. versionadded:: 3.4" in the doc. |
Ok, this is now fixed with the new patch (version 5). Nick does not want a new environment variable, so I added instead a new function PyMem_SetupDebugHooks() which reinstalls hooks to detect bugs if allocator functions were replaced with PyMem_SetAllocators() or PyObject_SetAllocators(). The function does nothing if Python is not compiled in debug mode or if hooks are already installed (so the function can be called twice). I also added unit tests for PyMem_SetAllocators() and PyObject_SetAllocators()! And I added "versionadded:: 3.4" to the C API documentation. |
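As a rough illustration of what a bug-detecting hook can do, the sketch below wraps malloc() and free() with guard bytes after each block and checks them on free. It is not the implementation behind PyMem_SetupDebugHooks(); `debug_malloc()`, `debug_free()`, the header layout and the guard pattern are all assumptions chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_SIZE 8
#define GUARD_BYTE 0xFB     /* arbitrary pattern chosen for this sketch */

/* Header stored in front of each block; the union keeps user data aligned. */
typedef union {
    size_t size;
    long double align_ld;
    long long align_ll;
    void *align_p;
} header_t;

/* Allocate size bytes, followed by GUARD_SIZE guard bytes. */
static void *debug_malloc(size_t size)
{
    header_t *h;
    unsigned char *user;

    if (size > (size_t)-1 - sizeof(header_t) - GUARD_SIZE)
        return NULL;        /* the total size would overflow */
    h = malloc(sizeof(header_t) + size + GUARD_SIZE);
    if (h == NULL)
        return NULL;
    h->size = size;
    user = (unsigned char *)(h + 1);
    memset(user + size, GUARD_BYTE, GUARD_SIZE);
    return user;
}

/* Free the block; return 1 if the guard is intact, 0 if it was overwritten. */
static int debug_free(void *ptr)
{
    header_t *h;
    const unsigned char *guard;
    int intact = 1;
    size_t i;

    if (ptr == NULL)
        return 1;
    h = (header_t *)ptr - 1;
    guard = (const unsigned char *)ptr + h->size;
    for (i = 0; i < GUARD_SIZE; i++) {
        if (guard[i] != GUARD_BYTE)
            intact = 0;
    }
    free(h);
    return intact;
}

/* Returns 1 when the overrun is detected and the clean block passes. */
int demo_guard(void)
{
    char *good = debug_malloc(4);
    char *bad = debug_malloc(4);
    int good_ok, bad_ok;

    assert(good != NULL && bad != NULL);
    memcpy(good, "abc", 4);     /* exactly fills the 4 requested bytes */
    memcpy(bad, "abcd", 5);     /* one byte too many: hits the first guard byte */

    good_ok = debug_free(good);
    bad_ok = debug_free(bad);
    return good_ok == 1 && bad_ok == 0;
}
```

The overrun in the demo lands inside the guard area that the wrapper itself allocated, so the check can report it without the program touching memory it does not own.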
I created issue bpo-18203 for this point.
The arena allocator can be replaced or hooked with PyObject_SetArenaAllocators() from my latest patch. |
New changeset 6661a8154eb3 by Victor Stinner in branch 'default': |
New changeset b1455dd08000 by Victor Stinner in branch 'default': |
Convert changeset 6661a8154eb3 into a patch: py_setallocators-6.patch. |
Update the patch to follow the API described in the PEP-445 (2013-06-18 22:33:41 +0200). |
Update patch according to the last version of the PEP. |
Updated patch (version 9):
|
New changeset ca78c974e938 by Victor Stinner in branch 'default': |
It looks like the changeset ca78c974e938 broke the "x86 XP-4 3.x" buildbot: Traceback (most recent call last):
File "../lib/test/regrtest.py", line 1305, in runtest_inner
test_runner()
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\test_tools.py", line 459, in test_main
support.run_unittest(*[obj for obj in globals().values()
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1600, in run_unittest
_run_suite(suite)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\test\support.py", line 1566, in _run_suite
result = runner.run(suite)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 175, in run
result.printErrors()
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 109, in printErrors
self.printErrorList('ERROR', self.errors)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 117, in printErrorList
self.stream.writeln("%s" % err)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\unittest\runner.py", line 25, in writeln
self.write(arg)
MemoryError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "../lib/test/regrtest.py", line 1615, in <module>
main_in_temp_cwd()
File "../lib/test/regrtest.py", line 1590, in main_in_temp_cwd
main()
File "../lib/test/regrtest.py", line 796, in main
match_tests=match_tests)
File "../lib/test/regrtest.py", line 998, in runtest
debug, display_failure=False)
File "../lib/test/regrtest.py", line 1330, in runtest_inner
msg = traceback.format_exc()
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 254, in format_exc
return "".join(format_exception(*sys.exc_info(), limit=limit, chain=chain))
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 180, in format_exception
return list(_format_exception_iter(etype, value, tb, limit, chain))
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 152, in _format_exception_iter
yield from _format_list_iter(_extract_tb_iter(tb, limit=limit))
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 17, in _format_list_iter
for filename, lineno, name, line in extracted_list:
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\traceback.py", line 64, in _extract_tb_or_stack_iter
line = linecache.getline(filename, lineno, f.f_globals)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 15, in getline
lines = getlines(filename, module_globals)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 41, in getlines
return updatecache(filename, module_globals)
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\linecache.py", line 127, in updatecache
lines = fp.readlines()
File "D:\cygwin\home\db3l\buildarea\3.x.bolen-windows\build\lib\codecs.py", line 301, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError |
New changeset 51ed51d10e60 by Victor Stinner in branch 'default': |
Buildbots are happy, changeset 51ed51d10e60 fixed the memory leak on Windows XP. Let's close this issue, 5 years after its creation! |
Well done. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.