classification
Title: Replace direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc()
Type: enhancement Stage: needs patch
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aliles, amaury.forgeotdarc, christian.heimes, georg.brandl, haypo, kristjan.jonsson, ncoghlan, python-dev, serhiy.storchaka, trent
Priority: normal Keywords: patch

Created on 2013-06-13 10:52 by haypo, last changed 2013-07-27 00:39 by python-dev. This issue is now closed.

Files
File name Uploaded Description Edit
ccpmem_patch.h kristjan.jonsson, 2013-06-14 09:12 Patch file
pymem_debugcheckgil.patch haypo, 2013-06-15 00:07 review
py_finalize.patch haypo, 2013-06-15 00:10 review
pymem_debugcheckgil-2.patch haypo, 2013-06-15 00:13 review
malloc_init.patch haypo, 2013-06-15 00:44 review
malloc_modules.patch haypo, 2013-06-15 01:19 review
Messages (23)
msg191076 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-13 10:52
The issue #3329 proposes an API to replace memory allocator functions. But Python calls directly malloc(), realloc() and free() in some functions, so custom allocators would not be used there.

Examples of functions calling malloc/realloc/free directly: _PySequence_BytesToCharpArray(), block_new() (of pyarena.c), find_key() (of thread.c), PyInterpreterState_New(), win32_wchdir(), posix_getcwd(), Py_Main(), etc.

We have to be careful with the GIL: PyMem_*() functions can only be called when holding the GIL.
msg191078 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-06-13 11:14
Be aware about external code which allocate memory itself (i.e. expat).
msg191080 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-13 11:32
> Be aware about external code which allocate memory itself (i.e. expat).

Well, I know that it will hard to cover 100% of the stdlib. I just want to replace the most obvious calls.


Some libraries can be configured to use a custom memory allocators:

- zlib: zalloc and zfree, http://www.zlib.net/manual.html#Usage
- bz2
- lzma: LzmaEnc_Create parameter
- OpenSSL: CRYPTO_set_mem_functions
- etc.

We should probably uses these functions to reuse Python allocators (PyMem_Malloc()), or maybe only if PyMem_SetAllocators() has been called? (if Python does not use system allocators, aka malloc)

See also #18178 for libffi, it may be related.


The _decimal module configures libmpdec to use PyMem_Malloc:

#define _Py_DEC_MINALLOC 4

    mpd_mallocfunc = PyMem_Malloc;
    mpd_reallocfunc = PyMem_Realloc;
    mpd_callocfunc = mpd_callocfunc_em;
    mpd_free = PyMem_Free;
    mpd_setminalloc(_Py_DEC_MINALLOC);
msg191082 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2013-06-13 12:15
expat has a XML_Memory_Handling_Suite. You just have to replace XML_ParserCreate() and XML_ParserCreateNS with XML_ParserCreate_MM().
msg191086 - (view) Author: Georg Brandl (georg.brandl) * (Python committer) Date: 2013-06-13 17:27
> We have to be careful with the GIL: PyMem_*() functions can only be
> called when holding the GIL.

> Some libraries can be configured to use a custom memory allocators:
> [...]
> We should probably uses these functions to reuse Python allocators 
> (PyMem_Malloc())

I think there's a potential problem here :)
msg191093 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-13 22:07
>> We have to be careful with the GIL: PyMem_*() functions can only be
>> called when holding the GIL.

> (...)
> I think there's a potential problem here :)

I didn't understand the motivation to require the GIL held for PyMem_Malloc(). I searched in the source code history and on the Internet (archives of python-dev). In my opinion, the restiction is motivated by a bug: PyMem_Malloc() calls (indirectly) PyObject_Malloc() in debug mode, and PyObject_Malloc() is not thread-safe.

I opened a thread on python-dev to discuss this point.
msg191111 - (view) Author: Kristján Valur Jónsson (kristjan.jonsson) * (Python committer) Date: 2013-06-14 09:12
Keeping the GIL requirement is _very_ useful for PyMem_MALLOC et al.  It allows applications to hook in their own monitoring code, accessible from python itself, without having to worry about conflicts with python.
even if it were not for the GIL itself, PyMem_Malloc() may have all sorts of side effects.

Because of this, and to allow ourselves the flexibility to do all sorts of things inside PyMem_Malloc(), at CCP we added a parallel api, PyMem_MALLOC_RAW() etc.
This api is guaranteed to delegate directly to the external allocator (malloc by default, or an embedding application's supplied allocastor)

We have patched pythoncore in 2.7 in all places there were using malloc directly using the file attached to the defect.  Notice how it can patch "malloc" in two different ways, using either regular malloc (in non-sensitive areas) and using the raw malloc (in sensitive areas.)

e.g. thread.c contains the following lines in our branch:
#include "Python.h"

/* patch malloc/free with threadsafe python versions */
#define CCPMEM_PATCH_RAW
#include "ccpmem_patch.h"
msg191169 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-14 23:05
I commited my new API to customize memory allocators:

New changeset 6661a8154eb3 by Victor Stinner in branch 'default':
Issue #3329: Add new APIs to customize memory allocators
http://hg.python.org/cpython/rev/6661a8154eb3

I added PyMem_RawMalloc(), PyMem_RawRealloc() and PyMem_RawFree() in the same commit. These functions are wrappers to malloc/realloc/free which can be called without the GIL held. Using these new functions instead of malloc/realloc/free is interesting because the internal functions can be replaced with PyMem_SetRawAllocators() and many checks are added in debug mode (ex: check for buffer under- and overflow).
msg191174 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 00:07
pymem_debugcheckgil.patch: Patch adding check in debug mode to ensure that PyMem_Malloc() and PyObject_Malloc() are called with the GIL held.
msg191175 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 00:10
py_finalize.patch: modify Py_Finalize() to destroy the GIL after the last call to PyMem_Malloc() or PyObject_Malloc(). For example, 
PyCFunction_Fini() calls PyObject_GC_Del() which calls PyObject_FREE().
msg191176 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 00:13
pymem_debugcheckgil.patch: oops, a forgot a change in pgenmain.c. New fixed patch attached.
msg191179 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 00:36
See also issue #16742: "PyOS_Readline drops GIL and calls PyOS_StdioReadline, which isn't thread safe".
msg191180 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 00:44
malloc_init.patch: Patch for functions called at Python initialization.

This patch is not complete: to parse "-X" option (PySys_AddXOption) and "-W" (PySys_AddWarnOption), PyMem_Malloc() and PyObject_Malloc() are still called indirectly. Fixing this issue may need to reorganize completly how Python is initialized, because we need to have basic Python types (ex: Unicode) to be ready to be able to parse the command line. But we cannot initialize too much because the Python initialization also depends on options from the command line...
msg191182 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 01:19
malloc_modules.patch: Patch for different Python modules.
msg191183 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-06-15 01:35
Ok, initial patches are attached. Let describe them a little bit.

- pymem_debugcheckgil-2.patch: I don't think that this patch can be commited before the "bootstrap" issue is solved (Python doesn't start in debug mode with this patch when -W or -X command line option is used). I wrote it to check that other patches are correct (check if the GIL is held when PyMem_Malloc is called).

- py_finalize.patch: should be safe

- malloc_init.patch: change _Py_char2wchar() API, the result must now be freed by PyMem_RawFree() instead of PyMem_Free(). the change does not hurt in release mode (both functions are just wrapper to free()), but may break in debug mode (because python checks that PyMem_RawFree() is called on a buffer allocated by PyMem_RawMalloc()) in extension modules using _Py_char2wchar(). I would like to _Py_char2wchar() API (not to solve this issue, just because it is very useful for applications embedding Python), so changing its name would avoid a crash (applications would get a compilation or link error instead). Other change of the patch: replace free() and PyMem_Free() with PyMem_RawFree(), should be safe.

- malloc_modules.patch: I replaced many malloc() with PyMem_Malloc(), which is not safe.

All these patches must be reviewed carefully to check if the GIL is held or not, and tested with pymem_debugcheckgil-2.patch (on Windows too! some patched functions os posixmodule.c are only compiled on Windows).
msg191195 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2013-06-15 04:06
Note that CPython's main function accesses the C API before calling Py_Initialize(). This is insane, but fixing it is hard (and one of the main goals of PEP 432).

I suggest using Py_IsInitialized() to exclude those from your debug checks for now.
msg192559 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 14:26
New changeset 213d6d7f5979 by Victor Stinner in branch 'default':
Issue #18203: Fix Py_Finalize(): destroy the GIL after the last call to
http://hg.python.org/cpython/rev/213d6d7f5979

New changeset 18bb92b0c458 by Victor Stinner in branch 'default':
Issue #18203: Replace malloc() with PyMem_RawMalloc() at Python initialization
http://hg.python.org/cpython/rev/18bb92b0c458

New changeset 41ef797e6639 by Victor Stinner in branch 'default':
Issue #18203: Replace malloc() with PyMem_Malloc() in Python modules
http://hg.python.org/cpython/rev/41ef797e6639
msg192563 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 14:36
New changeset fc01f9497da7 by Victor Stinner in branch 'default':
Issue #18203: Fix decode_ascii_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
http://hg.python.org/cpython/rev/fc01f9497da7
msg192572 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 15:26
New changeset 638d43665356 by Victor Stinner in branch 'default':
Issue #18203: Replace malloc() with PyMem_Malloc() in _ssl for the password
http://hg.python.org/cpython/rev/638d43665356

New changeset 9af1905f20af by Victor Stinner in branch 'default':
Issue #18203: Replace malloc() with PyMem_RawMalloc() to allocate thread locks
http://hg.python.org/cpython/rev/9af1905f20af

New changeset fb7d346b45fa by Victor Stinner in branch 'default':
Issue #18203: Replace malloc() with PyMem_Malloc() to allocate arena objects
http://hg.python.org/cpython/rev/fb7d346b45fa

New changeset 10db0c67fc72 by Victor Stinner in branch 'default':
Issue #18203:  Replace malloc() with PyMem_Malloc() in _PySequence_BytesToCharpArray()
http://hg.python.org/cpython/rev/10db0c67fc72
msg192591 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 20:58
New changeset 31a635303e55 by Victor Stinner in branch 'default':
Issue #18203: Fix _Py_DecodeUTF8_surrogateescape(), use PyMem_RawMalloc() as _Py_char2wchar()
http://hg.python.org/cpython/rev/31a635303e55
msg192596 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-07 21:32
New changeset 65f2c92ed079 by Victor Stinner in branch 'default':
Issue #18203: Add _PyMem_RawStrdup() and _PyMem_Strdup()
http://hg.python.org/cpython/rev/65f2c92ed079
msg192597 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2013-07-07 21:33
Ok, with the last changeset, I consider the issue has fixed. There is no more direct call to malloc() in the Python source code.
msg193766 - (view) Author: Roundup Robot (python-dev) Date: 2013-07-27 00:39
New changeset 1cba6687993e by Victor Stinner in branch 'default':
Issue #18203: Replace PyMem_Malloc() with PyMem_RawMalloc() at Python initialization
http://hg.python.org/cpython/rev/1cba6687993e
History
Date User Action Args
2013-07-27 00:39:51python-devsetmessages: + msg193766
2013-07-07 21:33:47hayposetstatus: open -> closed
resolution: fixed
messages: + msg192597
2013-07-07 21:32:57python-devsetmessages: + msg192596
2013-07-07 20:58:06python-devsetmessages: + msg192591
2013-07-07 15:26:32python-devsetmessages: + msg192572
2013-07-07 14:36:07python-devsetmessages: + msg192563
2013-07-07 14:26:03python-devsetnosy: + python-dev
messages: + msg192559
2013-06-17 04:40:03trentsetnosy: + trent
2013-06-16 07:32:26alilessetnosy: + aliles
2013-06-15 23:28:38hayposettitle: Replace calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc() -> Replace direct calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc()
2013-06-15 23:28:04hayposettitle: Replace calls to malloc() with PyMem_Malloc() -> Replace calls to malloc() with PyMem_Malloc() or PyMem_RawMalloc()
2013-06-15 04:06:47ncoghlansetmessages: + msg191195
2013-06-15 01:35:21hayposetmessages: + msg191183
2013-06-15 01:19:49hayposetfiles: + malloc_modules.patch

messages: + msg191182
2013-06-15 00:44:07hayposetfiles: + malloc_init.patch

messages: + msg191180
2013-06-15 00:36:28hayposetmessages: + msg191179
2013-06-15 00:13:03hayposetfiles: + pymem_debugcheckgil-2.patch

messages: + msg191176
2013-06-15 00:10:11hayposetfiles: + py_finalize.patch

messages: + msg191175
2013-06-15 00:07:37hayposetfiles: + pymem_debugcheckgil.patch
keywords: + patch
messages: + msg191174
2013-06-14 23:05:45hayposetmessages: + msg191169
2013-06-14 09:12:30kristjan.jonssonsetfiles: + ccpmem_patch.h

messages: + msg191111
2013-06-13 22:07:03hayposetmessages: + msg191093
2013-06-13 17:27:30georg.brandlsetnosy: + georg.brandl
messages: + msg191086
2013-06-13 12:15:18christian.heimessetnosy: + christian.heimes
messages: + msg191082
2013-06-13 11:32:02hayposetmessages: + msg191080
2013-06-13 11:14:12serhiy.storchakasetnosy: + serhiy.storchaka

messages: + msg191078
stage: needs patch
2013-06-13 10:52:39haypocreate