Issue 42540: Debug pymalloc crash when using os.fork() [regression]

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/86706

classification

Title:	Debug pymalloc crash when using os.fork() [regression]
Type:	crash	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.10, Python 3.9, Python 3.8

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:		Nosy List:	CendioOssman, colesbury, lukasz.langa, miss-islington, stestagg, vstinner
Priority:	normal	Keywords:	patch

Created on 2020-12-02 14:22 by CendioOssman, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL	Status	Linked	Edit
PR 29564	merged	colesbury, 2021-11-15 20:33
PR 29599	merged	miss-islington, 2021-11-17 20:51
PR 29600	merged	colesbury, 2021-11-17 21:04

Messages (9)
msg382314 - (view)	Author: Pierre Ossman (CendioOssman)	Date: 2020-12-02 14:22
A python equivalent of the classical daemon() call started throwing an error from 3.8 when the debug hooks are active for pymalloc. If the tracing is also active it segfaults. This simple example triggers it: import os def daemon(): pid = os.fork() if pid != 0: os._exit(0) daemon() The error given is: Debug memory block at address p=0xf013d0: API '1' 0 bytes originally requested The 7 pad bytes at p-7 are not all FORBIDDENBYTE (0xfd): at p-7: 0x00 * OUCH at p-6: 0x00 * OUCH at p-5: 0x00 * OUCH at p-4: 0x00 * OUCH at p-3: 0x00 * OUCH at p-2: 0x00 * OUCH at p-1: 0x00 * OUCH Because memory is corrupted at the start, the count of bytes requested may be bogus, and checking the trailing pad bytes may segfault. The 8 pad bytes at tail=0xf013d0 are not all FORBIDDENBYTE (0xfd): at tail+0: 0x01 * OUCH at tail+1: 0x00 * OUCH at tail+2: 0x00 * OUCH at tail+3: 0x00 * OUCH at tail+4: 0x00 * OUCH at tail+5: 0x00 * OUCH at tail+6: 0x00 * OUCH at tail+7: 0x00 *** OUCH Enable tracemalloc to get the memory block allocation traceback Fatal Python error: bad ID: Allocated using API '1', verified using API 'r' Python runtime state: finalizing (tstate=0xf023b0) Tested on Fedora, Ubuntu and RHEL with the same behaviour everwhere. Everything up to 3.8 works fine. 3.8 and 3.9 both exhibit the issue. Since this is a very standard way of starting a daemon it should affect quite a few users. At the very least it makes it annoying to use development mode to catch other issues.
msg382612 - (view)	Author: Steve Stagg (stestagg)	Date: 2020-12-06 22:10
Reproducible on master, but doesn't occur with a debug build. My configure: > ./configure --prefix=$PREFIX --exec-prefix=$EPREFIX --cache-file=../config.cache --without-ensurepip > /dev/null > PYTHONMALLOC=pymalloc_debug $EPREFIX/bin/python3 ../test.py Crash comes out of the child process, I modified example to invoke lldb for child: ``` import os import time import subprocess def daemon(): pid = os.fork() if pid != 0: subprocess.run(["lldb", '-p', str(pid)]) time.sleep(10) os._exit(0) time.sleep(1) daemon() ``` Attaching to the child gives this backtrace: (lldb) bt * thread #1, name = 'python3', stop reason = signal SIGSEGV: invalid address (fault address: 0x0) frame #0: 0x0000564a5a325f9c python3.10`_PyObject_DebugDumpAddress(p=0x0000564a5aa2c630) at obmalloc.c:2457:12 frame #1: 0x0000564a5a326571 python3.10`_PyMem_DebugCheckAddress(func="_PyMem_DebugRawFree", api=<unavailable>, p=<unavailable>) at obmalloc.c:2375:9 * frame #2: 0x0000564a5a326d29 python3.10`_PyMem_DebugRawFree at obmalloc.c:2208:5 frame #3: 0x0000564a5a326d17 python3.10`_PyMem_DebugRawFree(ctx=0x0000564a5a5d37e0, p=0x0000564a5aa2c630) at obmalloc.c:2197 frame #4: 0x0000564a5a3d4aa3 python3.10`PyInterpreterState_Delete(interp=0x0000564a5aa4e6b0) at pystate.c:396:9 frame #5: 0x0000564a5a3d1921 python3.10`Py_FinalizeEx [inlined] finalize_interp_delete(tstate=<unavailable>) at pylifecycle.c:1633:5 frame #6: 0x0000564a5a3d190b python3.10`Py_FinalizeEx at pylifecycle.c:1782 frame #7: 0x0000564a5a2dab80 python3.10`Py_BytesMain at main.c:679:9 frame #8: 0x0000564a5a2dab69 python3.10`Py_BytesMain at main.c:707 frame #9: 0x0000564a5a2dab4f python3.10`Py_BytesMain(argc=<unavailable>, argv=<unavailable>) at main.c:731 frame #10: 0x00007ff2e6251152 libc.so.6`__libc_start_main + 242 frame #11: 0x0000564a5a2d978e python3.10`_start + 46 The call stack is trying to free an interpreter mutex: PyThread_free_lock(interp->id_mutex);
msg382613 - (view)	Author: Steve Stagg (stestagg)	Date: 2020-12-06 22:34
Something seems to be: breaking alignment, changing the allocator used, or trashing that memory. In my case, the address of the mutex is: 0x5603a3666630 (1) below and the surrounding memory: 0x5603a3666610: n\0\0\03\0\0\0\0\0\0\0(2)\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd1\0\0\0\0\0\0\0 0x5603a3666630: (1)\x01\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd\xdd 0x5603a3666650: \xdd\xdd\xdd\xdd\xdd\xdd\xdd\xddQ\0\0\0\0\0\0\0\x90ds\xa3\x03V\0\0\xf0\xc4\xfbq\a\x7f\0\0 The fun thing is that the nbytes field lines up at (2) which is full of DEADBYTEs, and thus tries to find the tail of the allocation at p + 15987178197214944733 (which happily segfaults)
msg382614 - (view)	Author: Steve Stagg (stestagg)	Date: 2020-12-06 23:01
So, I'm not an allocator/init/teardown expert, but it looks like: When you fork, PyRuntimeState creates a new mutex, explicitly using the default allocator (without the debug allocator active).. #ifdef HAVE_FORK /* This function is called from PyOS_AfterFork_Child to ensure that newly created child processes do not share locks with the parent. / PyStatus _PyRuntimeState_ReInitThreads(_PyRuntimeState runtime) { // This was initially set in _PyRuntimeState_Init(). runtime->main_thread = PyThread_get_thread_ident(); /* Force default allocator, since _PyRuntimeState_Fini() must use the same allocator than this function. */ PyMemAllocatorEx old_alloc; _PyMem_SetDefaultAllocator(PYMEM_DOMAIN_RAW, &old_alloc); int reinit_interp = _PyThread_at_fork_reinit(&runtime->interpreters.mutex); int reinit_main_id = _PyThread_at_fork_reinit(&runtime->interpreters.main->id_mutex); int reinit_xidregistry = _PyThread_at_fork_reinit(&runtime->xidregistry.mutex); PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &old_alloc); if (reinit_interp < 0 \|\| reinit_main_id < 0 \|\| reinit_xidregistry < 0) { return _PyStatus_ERR("Failed to reinitialize runtime locks"); } return _PyStatus_OK(); } #endif But the PyInterpreterState_Delete function does not do this: if (interp->id_mutex != NULL) { PyThread_free_lock(interp->id_mutex); } Which causes it to try to use the debug allocator to free, and hence crash..
msg406492 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-11-17 20:51
New changeset 736684b1bb67369a2e95a9f621752deead44e7ef by Sam Gross in branch 'main': bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) https://github.com/python/cpython/commit/736684b1bb67369a2e95a9f621752deead44e7ef
msg406493 - (view)	Author: miss-islington (miss-islington)	Date: 2021-11-17 21:16
New changeset 1079b3e3cb3eba7062e174ecc6c0ab20c2d0722e by Miss Islington (bot) in branch '3.10': bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) https://github.com/python/cpython/commit/1079b3e3cb3eba7062e174ecc6c0ab20c2d0722e
msg406503 - (view)	Author: Łukasz Langa (lukasz.langa) *	Date: 2021-11-17 22:54
New changeset 87787c8774221c81602b31b0e0dc0678ad494e91 by Sam Gross in branch '3.9': [3.9] bpo-42540: reallocation of id_mutex should not force default allocator (GH-29564) (GH-29600) https://github.com/python/cpython/commit/87787c8774221c81602b31b0e0dc0678ad494e91
msg406505 - (view)	Author: Łukasz Langa (lukasz.langa) *	Date: 2021-11-17 23:01
Thanks, Sam! ✨ 🍰 ✨
msg406533 - (view)	Author: STINNER Victor (vstinner) *	Date: 2021-11-18 12:42
Do you think that it would be worth it to dump the memory allocation when a Fatal Python error related to a memory error is triggered?

History
Date	User	Action	Args
2022-04-11 14:59:38	admin	set	github: 86706
2021-11-18 12:42:27	vstinner	set	messages: + msg406533
2021-11-17 23:01:52	lukasz.langa	set	status: open -> closed resolution: fixed messages: + msg406505 stage: patch review -> resolved
2021-11-17 22:54:13	lukasz.langa	set	nosy: + lukasz.langa messages: + msg406503
2021-11-17 21:16:10	miss-islington	set	messages: + msg406493
2021-11-17 21:04:54	colesbury	set	pull_requests: + pull_request27843
2021-11-17 20:51:46	vstinner	set	nosy: + vstinner messages: + msg406492
2021-11-17 20:51:16	miss-islington	set	nosy: + miss-islington pull_requests: + pull_request27842
2021-11-15 20:33:05	colesbury	set	keywords: + patch nosy: + colesbury pull_requests: + pull_request27812 stage: patch review
2020-12-06 23:01:11	stestagg	set	messages: + msg382614
2020-12-06 22:34:49	stestagg	set	messages: + msg382613
2020-12-06 22:11:00	stestagg	set	nosy: + stestagg messages: + msg382612
2020-12-02 14:22:16	CendioOssman	create