This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Function locals and evaluation stack should be stored in a contiguous, per-thread stack
Type: performance Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Mark.Shannon Nosy List: Mark.Shannon, erlendaasland, kj, pablogsal, vstinner
Priority: normal Keywords: patch

Created on 2021-05-04 14:54 by Mark.Shannon, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
fix.diff erlendaasland, 2021-05-21 11:35
Pull Requests
URL Status Linked Edit
PR 26076 merged Mark.Shannon, 2021-05-12 16:58
PR 26285 merged Mark.Shannon, 2021-05-21 16:16
PR 26291 merged pablogsal, 2021-05-21 17:30
PR 26771 merged Mark.Shannon, 2021-06-17 15:42
Messages (10)
msg392906 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-04 14:54
Currently, the local variables (inc. cell and free vars) for functions and the evaluation stack are kept in per-activation chunks in the frame object. This is not a good design for modern hardware.
The local variables and stack are amongst the hottest memory in the VM and should be stored in a contiguous stack in memory.

Allocating a per-thread stack would improve memory locality considerably, and pave the way to allocating heap objects for function activations lazily, only when needed for debugging and introspection.

Generators would still need heap allocated locals and stack, but that would be no worse than currently.
msg394104 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-21 11:35
Using Pablo's (or Victor's) reproducer from bpo-44184, I'm getting a crash, apparently because _PyThreadState_PushLocals() is called after PyThreadState_Clear().

In Python/pystate.c interpreter_clear(), we're first calling PyThreadState_Clear() (line 295), and a few lines later (line 326) we're running _PyGC_CollectNoFail(). Clearing tstate _after_ the last GC collect resolves the issue (see attached patch), but it might introduce other problems; I'm not at all familiar with this part of the codebase.


FYI, git HEAD is at b11a951f16f0603d98de24fee5c023df83ea552c on main.


=================================================================
==22475==ERROR: AddressSanitizer: heap-use-after-free on address 0x629000000220 at pc 0x00010c985aa8 bp 0x7ffee3bab620 sp 0x7ffee3bab618
WRITE of size 8 at 0x629000000220 thread T0
    #0 0x10c985aa7 in _PyThreadState_PushLocals pystate.c:2030
    #1 0x10c7fc8b6 in _PyEval_Vector ceval.c:5164
    #2 0x10c397ebe in _PyFunction_Vectorcall call.c:342
    #3 0x10c5591b2 in _PyObject_VectorcallTstate abstract.h:114
    #4 0x10c558f6b in PyObject_CallOneArg abstract.h:184
    #5 0x10c5586c9 in call_unbound_noarg typeobject.c:1625
    #6 0x10c57d578 in slot_tp_finalize typeobject.c:7762
    #7 0x10ca2ea0b in finalize_garbage gcmodule.c:982
    #8 0x10ca2694d in gc_collect_main gcmodule.c:1287
    #9 0x10ca25f9c in _PyGC_CollectNoFail gcmodule.c:2123
    #10 0x10c97abed in interpreter_clear pystate.c:326
    #11 0x10c97ae81 in _PyInterpreterState_Clear pystate.c:358
    #12 0x10c9697d9 in finalize_interp_clear pylifecycle.c:1634
    #13 0x10c96925b in Py_FinalizeEx pylifecycle.c:1812
    #14 0x10ca1f143 in Py_RunMain main.c:668
    #15 0x10ca203e4 in pymain_main main.c:696
    #16 0x10ca20694 in Py_BytesMain main.c:720
    #17 0x10c055a91 in main python.c:15
    #18 0x7fff20582f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

0x629000000220 is located 32 bytes inside of 16384-byte region [0x629000000200,0x629000004200)
freed by thread T0 here:
    #0 0x10d518609 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x48609)
    #1 0x10c50a7fc in _PyObject_ArenaFree obmalloc.c:174
    #2 0x10c5098e7 in _PyObject_VirtualFree obmalloc.c:564
    #3 0x10c9808cf in PyThreadState_Clear pystate.c:904
    #4 0x10c97a178 in interpreter_clear pystate.c:295
    #5 0x10c97ae81 in _PyInterpreterState_Clear pystate.c:358
    #6 0x10c9697d9 in finalize_interp_clear pylifecycle.c:1634
    #7 0x10c96925b in Py_FinalizeEx pylifecycle.c:1812
    #8 0x10ca1f143 in Py_RunMain main.c:668
    #9 0x10ca203e4 in pymain_main main.c:696
    #10 0x10ca20694 in Py_BytesMain main.c:720
    #11 0x10c055a91 in main python.c:15
    #12 0x7fff20582f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

previously allocated by thread T0 here:
    #0 0x10d5184c0 in wrap_malloc+0xa0 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x484c0)
    #1 0x10c50a7d8 in _PyObject_ArenaMalloc obmalloc.c:168
    #2 0x10c5098af in _PyObject_VirtualAlloc obmalloc.c:558
    #3 0x10c985bb7 in allocate_chunk pystate.c:617
    #4 0x10c97d6ee in new_threadstate pystate.c:678
    #5 0x10c97cb49 in PyThreadState_New pystate.c:706
    #6 0x10c96e26b in pycore_create_interpreter pylifecycle.c:614
    #7 0x10c96ca7c in pyinit_config pylifecycle.c:834
    #8 0x10c967c3a in pyinit_core pylifecycle.c:1003
    #9 0x10c967429 in Py_InitializeFromConfig pylifecycle.c:1188
    #10 0x10ca2409f in pymain_init main.c:66
    #11 0x10ca201ee in pymain_main main.c:687
    #12 0x10ca20694 in Py_BytesMain main.c:720
    #13 0x10c055a91 in main python.c:15
    #14 0x7fff20582f3c in start+0x0 (libdyld.dylib:x86_64+0x15f3c)

SUMMARY: AddressSanitizer: heap-use-after-free pystate.c:2030 in _PyThreadState_PushLocals
Shadow bytes around the buggy address:
  0x1c51fffffff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c5200000000: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c5200000010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c5200000020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c5200000030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c5200000040: fd fd fd fd[fd]fd fd fd fd fd fd fd fd fd fd fd
  0x1c5200000050: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c5200000060: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c5200000070: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c5200000080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c5200000090: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==22475==ABORTING
msg394105 - (view) Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-05-21 11:36
cc. Pablo, Victor
msg394133 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-21 17:34
Mark, can you check the failure? Otherwise is going to start failing on the buildbots when we add the test case for the issue Erlend mentions.
msg394140 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-05-21 18:49
What's the test case, exactly?

ref.py for the other issue doesn't crash if I change "func.py" to "ref.py"
otherwise it just complains that "func.py" doesn't exist.
msg394141 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-05-21 18:55
Is the test added here:

https://github.com/python/cpython/pull/26274

import ast
import builtins
import sys
import os

tree = ast.parse("pass")
x = [tree]
x.append(x)

# Put the cycle somewhere to survive until the *last* GC collection
def callback(*args):
    pass
callback.a = x
os.register_at_fork(after_in_child=callback)


----------------


If this doesn't work maybe Erlend may help you reproducing this.
msg394306 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2021-05-25 09:56
bpo-44184 has been fixed, it was unrelated. Mark fixed a bug in commit af5d497f72ceaf3f207a8aded028607c4c46a993. It seems like all issues are solved, I close the issue.

Mark: please set the version field to 3.11 for work in 3.11.


Commits:

commit b11a951f16f0603d98de24fee5c023df83ea552c
Author: Mark Shannon <mark@hotpy.org>
Date:   Fri May 21 10:57:35 2021 +0100

    bpo-44032: Move data stack to thread from FrameObject. (GH-26076)
    
    * Remove 'zombie' frames. We won't need them once we are allocating fixed-size frames.
    
    * Add co_nlocalplus field to code object to avoid recomputing size of locals + frees + cells.
    
    * Move locals, cells and freevars out of frame object into separate memory buffer.
    
    * Use per-threadstate allocated memory chunks for local variables.
    
    * Move globals and builtins from frame object to per-thread stack.
    
    * Move (slow) locals frame object to per-thread stack.
    
    * Move internal frame functions to internal header.

commit 498383c019c1209f6fecf8f64ce44fbf437191da
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Fri May 21 19:15:39 2021 +0100

    bpo-44032: Fix downcast conversion in frameobject.c (GH-26291)

commit af5d497f72ceaf3f207a8aded028607c4c46a993
Author: Mark Shannon <mark@hotpy.org>
Date:   Mon May 24 16:22:02 2021 +0100

    bpo-44032: Delay deletion of stack chunks until thread state is deleted. (GH-26285)
msg396037 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2021-06-18 10:00
New changeset 0982ded179f280176868c1c4eccf77bf70687816 by Mark Shannon in branch 'main':
bpo-44032: Move pointer to code object from frame-object to frame specials array. (GH-26771)
https://github.com/python/cpython/commit/0982ded179f280176868c1c4eccf77bf70687816
msg413721 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-02-22 17:33
Removing the PyFrameObject.f_code member (commit b11a951f16f0603d98de24fee5c023df83ea552c) broke by gevent project:

* https://github.com/gevent/gevent/issues/1867
* https://bugs.python.org/issue40421#msg413719
msg413799 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2022-02-23 15:14
I created bpo-46836: "[C API] Move PyFrameObject to the internal C API".
History
Date User Action Args
2022-04-11 14:59:45adminsetgithub: 88198
2022-02-23 15:14:24vstinnersetmessages: + msg413799
2022-02-22 17:33:39vstinnersetmessages: + msg413721
2021-06-18 10:00:49Mark.Shannonsetmessages: + msg396037
2021-06-17 15:42:20Mark.Shannonsetpull_requests: + pull_request25357
2021-05-25 09:56:55vstinnersetstatus: open -> closed
versions: + Python 3.11
messages: + msg394306

resolution: fixed
stage: patch review -> resolved
2021-05-21 18:55:49pablogsalsetmessages: + msg394141
2021-05-21 18:49:27Mark.Shannonsetmessages: + msg394140
2021-05-21 17:34:02pablogsalsetmessages: + msg394133
2021-05-21 17:30:50pablogsalsetpull_requests: + pull_request24895
2021-05-21 16:16:59Mark.Shannonsetpull_requests: + pull_request24891
2021-05-21 11:36:19erlendaaslandsetnosy: + vstinner, pablogsal
messages: + msg394105
2021-05-21 11:35:25erlendaaslandsetfiles: + fix.diff
nosy: + erlendaasland
messages: + msg394104

2021-05-12 16:58:20Mark.Shannonsetkeywords: + patch
stage: patch review
pull_requests: + pull_request24716
2021-05-04 15:00:30kjsetnosy: + kj
2021-05-04 14:54:54Mark.Shannonsettitle: Store locals and evaluation stack should be stored in a contiguous, per-thread stack -> Function locals and evaluation stack should be stored in a contiguous, per-thread stack
2021-05-04 14:54:31Mark.Shannoncreate