This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

Title: feature: support pymalloc for subinterpreters. each subinterpreter has pymalloc_state
Type:
Stage: patch review
Components: Subinterpreters
Versions: Python 3.10
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To:
Nosy List: JunyiXie, methane, nascheme, vstinner
Priority: normal
Keywords: patch

Created on 2021-02-24 09:43 by JunyiXie, last changed 2022-04-11 14:59 by admin.

Pull Requests
URL       Status  Linked
PR 24857  open    JunyiXie, 2021-03-14 12:35
Messages (10)
msg387614 - (view) Author: junyixie (JunyiXie) * Date: 2021-02-24 09:43
Move the pymalloc state into obmalloc.h.
Add a pymalloc_state member to struct _is (the interpreter state).
Make the pymalloc_allocxx APIs use the subinterpreter's pymalloc_state.
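
The layout described above can be pictured with a small standalone sketch. It is not the code from the PR: `toy_pymalloc_state`, `toy_interp`, `toy_note_alloc` and `toy_note_free` are hypothetical names, and the state is reduced to two counters. It only illustrates the idea of embedding the allocator bookkeeping in the interpreter struct and passing the interpreter to the allocator entry points.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the allocator bookkeeping that
   would move out of file-level globals. */
struct toy_pymalloc_state {
    size_t blocks_in_use;   /* number of live small-object blocks */
    size_t arenas_in_use;   /* number of arenas currently allocated */
};

/* Hypothetical stand-in for struct _is (the interpreter state). */
struct toy_interp {
    int id;
    struct toy_pymalloc_state pymalloc;  /* one allocator state per interpreter */
};

/* The entry points take the interpreter explicitly instead of touching
   globals, so two interpreters never share counters. */
static void toy_note_alloc(struct toy_interp *interp)
{
    interp->pymalloc.blocks_in_use++;
}

static void toy_note_free(struct toy_interp *interp)
{
    if (interp->pymalloc.blocks_in_use > 0) {
        interp->pymalloc.blocks_in_use--;
    }
}
```

In this toy, an allocation recorded for one interpreter leaves every other interpreter's state untouched, which is the isolation property the change is after.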
msg388668 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-14 12:34
I made two changes:
1. Support pymalloc for subinterpreters: each subinterpreter has its own pymalloc_state.

2. _copy_raw_string now allocates and frees memory with PyMem_RawMalloc and PyMem_RawFree.

I extended _xxsubinterpretersmodule.c to support calling any function in a subinterpreter. The second change is needed when I return the result of such a call:

1. I need to create item->name in the shared item, which uses the PyMem_xxx API to manage memory. When the WITH_PYMALLOC macro is defined, that memory is allocated from, and bound to, the pymalloc state of the interpreter that created it (interp1).

2. After switching interpreter state, I am in interp2. I get the return value from the shared item and then need to free the shared item, but the item->name memory is managed by interp1's pymalloc state. To free it I would have to switch back to interp1, which is complicated: to implement it, we would need to save the interpreter ID in the shared item.

So I think _copy_raw_string, called from _sharednsitem_init, should allocate with the PyMem_Raw API, which is easier to manage.

static int
_sharednsitem_init(struct _sharednsitem *item, PyObject *key, PyObject *value)
{
    item->name = _copy_raw_string(key);
    // ...
}

_sharedns *result_shread = _sharedns_new(1);

    // Switch to interpreter.
    PyThreadState *new_tstate = PyInterpreterState_ThreadHead(interp);
    PyThreadState *save1 = PyEval_SaveThread();

    // Switch to interpreter.
    PyThreadState *save_tstate = NULL;
    if (interp != PyInterpreterState_Get()) {
        // XXX Using the "head" thread isn't strictly correct.
        PyThreadState *tstate = PyInterpreterState_ThreadHead(interp);
        // XXX Possible GILState issues?
        save_tstate = PyThreadState_Swap(tstate);
    }
    PyObject *module = PyImport_ImportModule(PyUnicode_AsUTF8(module_name));
    PyObject *function = PyObject_GetAttr(module, function_name);
    result = PyObject_Call(function, args, kwargs);

    if (result == NULL) {
        // exception handler
    }

    if (result && _sharednsitem_init(&result_shread->items[0], PyUnicode_FromString("result"), result) != 0) {
        PyErr_Format(RunFailedError, "interp_call_function result convert to shared failed");
        return NULL;
    }
    // Switch back.
    if (save_tstate != NULL) {
        // ...
    }

    if (result) {
        result = _PyCrossInterpreterData_NewObject(&result_shread->items[0].data);
    }
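
The reasoning for the raw allocator can be shown with a standalone sketch that does not depend on Python.h. Here malloc() and free() stand in for PyMem_RawMalloc() and PyMem_RawFree(), and the `toy_` names and the cut-down item struct are assumptions made for illustration, not the real _xxsubinterpretersmodule.c code. The point is that a block from the process-wide allocator has no owning interpreter, so whichever interpreter consumes the shared item may release it.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a NUL-terminated string with the process-wide allocator.
   malloc() stands in for PyMem_RawMalloc(): the block is not tied to
   any interpreter's pymalloc state. */
static char *toy_copy_raw_string(const char *src)
{
    size_t size = strlen(src) + 1;      /* include the trailing NUL */
    char *copy = malloc(size);
    if (copy == NULL) {
        return NULL;                    /* caller reports the memory error */
    }
    memcpy(copy, src, size);
    return copy;
}

/* Hypothetical, cut-down shared item: only the name is modelled. */
struct toy_shared_item {
    char *name;
};

/* Returns 0 on success, -1 if the name could not be copied. */
static int toy_shared_item_init(struct toy_shared_item *item, const char *key)
{
    item->name = toy_copy_raw_string(key);
    return item->name != NULL ? 0 : -1;
}

/* Safe to call from whichever interpreter ends up consuming the item,
   because free() (standing in for PyMem_RawFree()) has no owner. */
static void toy_shared_item_clear(struct toy_shared_item *item)
{
    free(item->name);
    item->name = NULL;
}
```

With this shape, the shared item does not need to record an interpreter ID just to be freed.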
msg388670 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-14 12:35
github pr
msg388671 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-14 12:35
msg388734 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-15 13:03
There is a problem if we bind the pymalloc state to an interpreter:
malloc pointer in interpreterA and free pointer is usual.

This causes a problem. When we call PyObject_Free:
1. We look up the address in the current interpreter's pymalloc pools.
2. If it is not found, the current code calls PyMem_RawFree(p) to free it. That crashes, because the address was returned by pymalloc_alloc in another interpreter.

I think there are two ways to solve this problem:
1. Always free memory in the interpreter that allocated it. Frequently switching interpreters hurts performance.
2. When freeing an address, look for it in every interpreter's pymalloc pools and free it there (but this requires adding a lock to pymalloc).
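
The second option can be sketched as a toy model. This is an illustration under loud assumptions, not pymalloc: each "interpreter" owns a single fixed arena with a bump pointer, the `toy_` names are hypothetical, and the lock that a real multi-threaded implementation would need is only marked in comments.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ARENA_SIZE 256

/* Hypothetical per-interpreter allocator state: one fixed arena with a
   bump pointer. Real pymalloc uses arenas, pools and size classes. */
struct toy_state {
    unsigned char arena[TOY_ARENA_SIZE];
    size_t used;
    size_t live_blocks;
    struct toy_state *next;   /* links every interpreter's state */
};

static void *toy_alloc(struct toy_state *st, size_t n)
{
    if (n == 0 || n > TOY_ARENA_SIZE - st->used) {
        return NULL;          /* a real allocator would fall back to malloc */
    }
    void *p = st->arena + st->used;
    st->used += n;
    st->live_blocks++;
    return p;
}

/* Does p point into this state's arena? Addresses are compared as
   integers because p may not belong to the arena array at all. */
static int toy_owns(const struct toy_state *st, const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)st->arena;
    return addr >= base && addr < base + TOY_ARENA_SIZE;
}

/* Walk all states to find the owner of p. With several threads, this
   walk and the update in toy_free_any() would have to hold a lock. */
static struct toy_state *toy_find_owner(struct toy_state *head, const void *p)
{
    for (struct toy_state *st = head; st != NULL; st = st->next) {
        if (toy_owns(st, p)) {
            return st;
        }
    }
    return NULL;   /* not an arena block: would go to the raw allocator */
}

/* Returns 0 on success, -1 if no state owns p. */
static int toy_free_any(struct toy_state *head, void *p)
{
    struct toy_state *owner = toy_find_owner(head, p);
    if (owner == NULL) {
        return -1;
    }
    owner->live_blocks--;   /* the toy only tracks counts, it never reuses memory */
    return 0;
}
```

The cost is visible even in the toy: every free that misses the local state turns into a walk over all interpreters, which is why the lock and the lookup speed matter.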
msg388735 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-15 13:04
> malloc pointer in interpreterA and free pointer is usual.

Allocating a pointer in interpreterA and freeing it in interpreterB is common.
msg388736 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-15 13:06
By the way, there is no operation in the CPython code to destroy the memory pool. Repeatedly creating pymalloc pools will leak memory.
msg388737 - (view) Author: junyixie (JunyiXie) * Date: 2021-03-15 13:09
> 2. when free memory address, find this address in all interpreter pymalloc pool. and free it.(but it need add lock to pymalloc)

In finalize_interp_delete, we need to keep the interpreter's pymalloc pools in a linked list. The list will be used when searching for memory in the pymalloc pools.
msg388743 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-15 14:37
I'm not sure that a "per interpreter" allocator is needed. The needed feature is to be able to call PyMem_Malloc() in parallel in different threads. If I understood correctly, the glibc malloc has a per-thread fast allocator (no locking) and then falls back to a slow allocator (locking) if the fast allocator fails. Maybe pymalloc could have per-thread memory arenas.
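
A minimal sketch of the per-thread idea, assuming a C11 compiler: each thread gets its own bump arena through `_Thread_local`, and requests that do not fit fall back to the libc allocator, which is thread-safe. The `toy_` names and sizes are hypothetical, and this is neither how glibc nor how pymalloc is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_THREAD_ARENA_SIZE 1024
#define TOY_ALIGN 16

/* One bump arena per thread: _Thread_local gives every thread its own
   copy, so the fast path below needs no lock. */
static _Thread_local _Alignas(TOY_ALIGN)
    unsigned char toy_arena[TOY_THREAD_ARENA_SIZE];
static _Thread_local size_t toy_arena_used = 0;

/* Returns 1 if p lies inside the calling thread's arena. */
static int toy_in_thread_arena(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = (uintptr_t)toy_arena;
    return addr >= base && addr < base + TOY_THREAD_ARENA_SIZE;
}

static void *toy_malloc(size_t n)
{
    /* Round up so consecutive blocks stay aligned. */
    size_t rounded = (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
    if (n != 0 && rounded >= n
        && rounded <= TOY_THREAD_ARENA_SIZE - toy_arena_used) {
        void *p = toy_arena + toy_arena_used;
        toy_arena_used += rounded;
        return p;                 /* fast path: thread-local, no locking */
    }
    return malloc(n ? n : 1);     /* slow path: shared libc allocator */
}

static void toy_free(void *p)
{
    if (p == NULL || toy_in_thread_arena(p)) {
        return;                   /* arena blocks are not recycled in this toy */
    }
    free(p);
}
```

The toy assumes a block is freed by the thread that allocated it. Freeing an arena block from another thread would hand it to free(), which is the same ownership problem raised earlier in this thread for per-interpreter state.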

When I implemented PEP 587, I spent a significant amount of time avoiding the use of pymalloc before Py_Initialize() is called: only PyMem_RawMalloc() is used before Py_Initialize().

But I'm not 100% sure that pymalloc is not used before Py_Initialize() nor *after* Py_Finalize(). For example, we should check whether a daemon thread can call PyMem_Malloc() after Py_Finalize(), even though daemon threads are supposed to exit as soon as they try to acquire the GIL, and even though the GIL must be held to use pymalloc (to use PyMem_Malloc and PyObject_Malloc):

See also bpo-37448:
"Add radix tree implementation for obmalloc address_in_range()"
msg388745 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-03-15 14:39
The current workaround is to disable pymalloc when Python is built with EXPERIMENTAL_ISOLATED_SUBINTERPRETERS:

_PyPreConfig_InitCompatConfig(PyPreConfig *config):

    /* bpo-40512: pymalloc is not compatible with subinterpreters,
       force usage of libc malloc() which is thread-safe. */
#ifdef Py_DEBUG
    config->allocator = PYMEM_ALLOCATOR_MALLOC_DEBUG;
#else
    config->allocator = PYMEM_ALLOCATOR_MALLOC;
#endif
Date                 User      Action  Args
2022-04-11 14:59:41  admin     set     github: 87479
2021-03-15 14:39:50  vstinner  set     messages: + msg388745
2021-03-15 14:38:30  vstinner  set     nosy: + nascheme, methane
2021-03-15 14:37:54  vstinner  set     messages: + msg388743
2021-03-15 13:09:24  JunyiXie  set     messages: + msg388737
2021-03-15 13:06:43  JunyiXie  set     messages: + msg388736
2021-03-15 13:04:20  JunyiXie  set     messages: + msg388735
2021-03-15 13:03:16  JunyiXie  set     messages: + msg388734
2021-03-14 12:35:40  JunyiXie  set     messages: + msg388671
2021-03-14 12:35:30  JunyiXie  set     keywords: + patch
                                       stage: patch review
                                       messages: + msg388670
                                       pull_requests: + pull_request23617
2021-03-14 12:34:49  JunyiXie  set     nosy: + vstinner
                                       messages: + msg388668
2021-02-24 09:43:34  JunyiXie  set     versions: + Python 3.10
2021-02-24 09:43:29  JunyiXie  create