classification
Title: Debug pymalloc crash when using os.fork() [regression]
Type: crash Stage:
Components: Interpreter Core Versions: Python 3.10, Python 3.9, Python 3.8
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: CendioOssman, stestagg
Priority: normal Keywords:

Created on 2020-12-02 14:22 by CendioOssman, last changed 2020-12-06 23:01 by stestagg.

Messages (4)
msg382314 - (view) Author: Pierre Ossman (CendioOssman) Date: 2020-12-02 14:22
A Python equivalent of the classic daemon() call started throwing an error in 3.8 when the pymalloc debug hooks are active. If tracing is also active, it segfaults.

This simple example triggers it:


```python
import os

def daemon():
    pid = os.fork()
    if pid != 0:
        os._exit(0)

daemon()
```


The error given is:

```
Debug memory block at address p=0xf013d0: API '1'
    0 bytes originally requested
    The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd):
        at p-7: 0x00 *** OUCH
        at p-6: 0x00 *** OUCH
        at p-5: 0x00 *** OUCH
        at p-4: 0x00 *** OUCH
        at p-3: 0x00 *** OUCH
        at p-2: 0x00 *** OUCH
        at p-1: 0x00 *** OUCH
    Because memory is corrupted at the start, the count of bytes requested
       may be bogus, and checking the trailing pad bytes may segfault.
    The 8 pad bytes at tail=0xf013d0 are not all FORBIDDENBYTE (0xfd):
        at tail+0: 0x01 *** OUCH
        at tail+1: 0x00 *** OUCH
        at tail+2: 0x00 *** OUCH
        at tail+3: 0x00 *** OUCH
        at tail+4: 0x00 *** OUCH
        at tail+5: 0x00 *** OUCH
        at tail+6: 0x00 *** OUCH
        at tail+7: 0x00 *** OUCH

Enable tracemalloc to get the memory block allocation traceback

Fatal Python error: bad ID: Allocated using API '1', verified using API 'r'
Python runtime state: finalizing (tstate=0xf023b0)
```



Tested on Fedora, Ubuntu and RHEL with the same behaviour everywhere. Everything before 3.8 works fine; 3.8 and 3.9 both exhibit the issue.
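For completeness, a wrapper that reproduces this without touching the shell environment — a sketch of my own (not from the report), assuming a POSIX system: it runs the daemon() snippet in a child interpreter with PYTHONMALLOC=pymalloc_debug, so the debug hooks are active even in a release build. On affected versions the forked child prints the corruption report to the inherited stderr; on fixed versions stderr stays empty:

```python
import os
import subprocess
import sys
import textwrap

# The daemon() snippet from the report, run in a child interpreter with the
# pymalloc debug hooks forced on via PYTHONMALLOC.
script = textwrap.dedent("""
    import os

    def daemon():
        pid = os.fork()
        if pid != 0:
            os._exit(0)

    daemon()
""")

result = subprocess.run(
    [sys.executable, "-c", script],
    env={**os.environ, "PYTHONMALLOC": "pymalloc_debug"},
    capture_output=True,
    text=True,
    timeout=60,
)

# The parent always exits 0 via os._exit(0); on affected versions the
# forked child's "Debug memory block ..." report appears on stderr.
print("parent exit code:", result.returncode)
print("child stderr:", result.stderr or "<empty>")
```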

Since this is a very standard way of starting a daemon, it should affect quite a few users. At the very least it makes it annoying to use development mode to catch other issues.
msg382612 - (view) Author: Steve Stagg (stestagg) Date: 2020-12-06 22:10
Reproducible on master, but doesn't occur with a debug build.

My configure:

> ./configure --prefix=$PREFIX --exec-prefix=$EPREFIX --cache-file=../config.cache --without-ensurepip > /dev/null
> PYTHONMALLOC=pymalloc_debug $EPREFIX/bin/python3 ../test.py

The crash comes from the child process; I modified the example to attach lldb to the child:

```
import os
import time
import subprocess

def daemon():
    pid = os.fork()
    if pid != 0:
        subprocess.run(["lldb", '-p', str(pid)])
        time.sleep(10)
        os._exit(0)
    time.sleep(1)

daemon()
```


Attaching to the child gives this backtrace:

```
(lldb) bt
* thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
    frame #0: 0x0000564a5a325f9c python3.10`_PyObject_DebugDumpAddress(p=0x0000564a5aa2c630) at obmalloc.c:2457:12
    frame #1: 0x0000564a5a326571 python3.10`_PyMem_DebugCheckAddress(func="_PyMem_DebugRawFree", api=<unavailable>, p=<unavailable>) at obmalloc.c:2375:9
  * frame #2: 0x0000564a5a326d29 python3.10`_PyMem_DebugRawFree at obmalloc.c:2208:5
    frame #3: 0x0000564a5a326d17 python3.10`_PyMem_DebugRawFree(ctx=0x0000564a5a5d37e0, p=0x0000564a5aa2c630) at obmalloc.c:2197
    frame #4: 0x0000564a5a3d4aa3 python3.10`PyInterpreterState_Delete(interp=0x0000564a5aa4e6b0) at pystate.c:396:9
    frame #5: 0x0000564a5a3d1921 python3.10`Py_FinalizeEx [inlined] finalize_interp_delete(tstate=<unavailable>) at pylifecycle.c:1633:5
    frame #6: 0x0000564a5a3d190b python3.10`Py_FinalizeEx at pylifecycle.c:1782
    frame #7: 0x0000564a5a2dab80 python3.10`Py_BytesMain at main.c:679:9
    frame #8: 0x0000564a5a2dab69 python3.10`Py_BytesMain at main.c:707
    frame #9: 0x0000564a5a2dab4f python3.10`Py_BytesMain(argc=<unavailable>, argv=<unavailable>) at main.c:731
    frame #10: 0x00007ff2e6251152 libc.so.6`__libc_start_main + 242
    frame #11: 0x0000564a5a2d978e python3.10`_start + 46
```


The call stack is trying to free an interpreter mutex:
    PyThread_free_lock(interp->id_mutex);
msg382613 - (view) Author: Steve Stagg (stestagg) Date: 2020-12-06 22:34
Something seems to be breaking the alignment, changing which allocator is used, or trashing that memory.

In my case, the address of the mutex is 0x5603a3666630 ((*1) below), and the surrounding memory is:

0x5603a3666610: n\0\0\03\0\0\0\0\0\0\0(*2)\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd1\0\0\0\0\0\0\0
0x5603a3666630: (*1)\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd
0x5603a3666650: \xdd\xdd\xdd\xdd\xdd\xdd\xdd\xddQ\0\0\0\0\0\0\0\x90ds\xa3\x03V\0\0\xf0\xc4\xfbq\a\x7f\0\0

The fun thing is that the nbytes field lines up at (*2), which is full of DEADBYTEs (0xdd), so the code tries to find the tail of the allocation at p + 15987178197214944733 (which happily segfaults).
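As a sanity check on that arithmetic (my own verification, not from the report): the debug allocator reads the requested size from the bytes just before the pad bytes, and eight DEADBYTEs interpreted as a little-endian 64-bit size give exactly the offset quoted above:

```python
# Eight 0xdd DEADBYTEs read as a little-endian 64-bit "requested size".
bogus_size = int.from_bytes(b"\xdd" * 8, "little")
print(hex(bogus_size))   # 0xdddddddddddddddd
print(bogus_size)        # 15987178197214944733
```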
msg382614 - (view) Author: Steve Stagg (stestagg) Date: 2020-12-06 23:01
So, I'm not an allocator/init/teardown expert, but it looks like this: when you fork, _PyRuntimeState_ReInitThreads re-creates the runtime mutexes, explicitly using the default allocator (without the debug hooks active):


```c
#ifdef HAVE_FORK
/* This function is called from PyOS_AfterFork_Child to ensure that
   newly created child processes do not share locks with the parent. */
PyStatus
_PyRuntimeState_ReInitThreads(_PyRuntimeState *runtime)
{
    // This was initially set in _PyRuntimeState_Init().
    runtime->main_thread = PyThread_get_thread_ident();

    /* Force default allocator, since _PyRuntimeState_Fini() must
       use the same allocator than this function. */
    PyMemAllocatorEx old_alloc;
    _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc);

    int reinit_interp = _PyThread_at_fork_reinit(&runtime->interpreters.mutex);
    int reinit_main_id = _PyThread_at_fork_reinit(&runtime->interpreters.main->id_mutex);
    int reinit_xidregistry = _PyThread_at_fork_reinit(&runtime->xidregistry.mutex);

    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc);

    if (reinit_interp < 0
        || reinit_main_id < 0
        || reinit_xidregistry < 0)
    {
        return _PyStatus_ERR("Failed to reinitialize runtime locks");
    }
    return _PyStatus_OK();
}
#endif
```


But the PyInterpreterState_Delete function does not do this:

```c
if (interp->id_mutex != NULL) {
    PyThread_free_lock(interp->id_mutex);
}
```


Which causes it to try to free the mutex through the debug allocator, and hence crash.
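To summarize the mechanism with a toy model (purely illustrative Python, not CPython code; the names here are made up): the mutex gets allocated under one allocator "API", but teardown frees it through the debug allocator, which checks a tag the block never received:

```python
# Toy model of the debug allocator's API check. Real CPython tags each debug
# allocation with an API byte and verifies it on free; a block allocated
# while the default allocator was forced carries no such tag.
tags = {}

def raw_malloc():
    # What _PyRuntimeState_ReInitThreads effectively does after fork:
    # allocate with the forced default allocator, so no debug tag is written.
    block = object()              # stand-in for the C pointer
    tags[id(block)] = "default"
    return block

def debug_free(block):
    # What PyInterpreterState_Delete does at shutdown: the debug hook
    # verifies the block's tag and aborts on a mismatch.
    if tags.get(id(block)) != "debug":
        raise RuntimeError("bad ID: allocated with one API, freed with another")

error = None
mutex = raw_malloc()
try:
    debug_free(mutex)
except RuntimeError as exc:
    error = str(exc)

print(error)
```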
History
Date                 User          Action  Args
2020-12-06 23:01:11  stestagg      set     messages: + msg382614
2020-12-06 22:34:49  stestagg      set     messages: + msg382613
2020-12-06 22:11:00  stestagg      set     nosy: + stestagg; messages: + msg382612
2020-12-02 14:22:16  CendioOssman  create