Title: _PyUnicode_Fini should invalidate ucnhash_capi capsule pointer
Type: crash Stage: patch review
Components: Interpreter Core Versions: Python 3.11, Python 3.10
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: christian.heimes, lukasz.langa, miss-islington, neonene, pablogsal, vstinner
Priority: high Keywords: patch

Created on 2022-03-31 13:34 by christian.heimes, last changed 2022-04-11 14:59 by admin.

File name Uploaded Description
ucnbug.c christian.heimes, 2022-03-31 13:34
Pull Requests
URL Status Linked
PR 32212 merged christian.heimes, 2022-03-31 13:40
PR 32216 merged christian.heimes, 2022-03-31 15:32
PR 32217 closed christian.heimes, 2022-03-31 15:32
PR 32313 open neonene, 2022-04-04 19:17
Messages (3)
msg416432 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2022-03-31 13:34
unicodeobject.c has a static pointer to a unicode name CAPI capsule:

   static _PyUnicode_Name_CAPI *ucnhash_capi = NULL;

The capsule is initialized on demand when the parser encounters a named Unicode escape such as "\N{digit nine}". Once the capsule pointer ucnhash_capi has been initialized, it is never reset, not even by a full interpreter shutdown.
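
The lifecycle described here can be modeled in a few lines of standalone C. This is a sketch under stated assumptions, not the CPython source: the names name_capi, interp_init, interp_fini, and decode_named_escape are hypothetical, and an "alive" flag stands in for memory that a real shutdown would release, so the model can report a stale pointer instead of crashing. Passing reset_cache = 1 models what the issue title asks for, clearing the cached pointer during finalization.

```c
#include <stddef.h>

#define MAX_RUNS 8

/* Hypothetical stand-in for _PyUnicode_Name_CAPI. */
typedef struct {
    int alive;  /* 1 while the interpreter run that owns this table is active */
} name_capi;

static name_capi tables[MAX_RUNS];     /* one table per simulated interpreter run */
static int run_count = 0;
static name_capi *live_table = NULL;   /* table owned by the current run */
static name_capi *cached_capi = NULL;  /* models the static ucnhash_capi pointer */

/* Models interpreter start-up: each run gets a fresh table.
 * Returns 0 on success, -1 if the model has run out of slots. */
int interp_init(void)
{
    if (run_count >= MAX_RUNS) {
        return -1;
    }
    live_table = &tables[run_count++];
    live_table->alive = 1;
    return 0;
}

/* Models interpreter shutdown: the current table becomes invalid.
 * reset_cache = 0 leaves the cached pointer dangling (the reported bug);
 * reset_cache = 1 clears it so the next lookup re-initializes it. */
void interp_fini(int reset_cache)
{
    if (live_table != NULL) {
        live_table->alive = 0;
        live_table = NULL;
    }
    if (reset_cache) {
        cached_capi = NULL;
    }
}

/* Models decoding a named escape: the cached pointer is filled in on first
 * use only. Returns 1 if the cached table is usable, 0 if it is missing or
 * stale (the case that segfaults in the real interpreter). */
int decode_named_escape(void)
{
    if (cached_capi == NULL) {
        cached_capi = live_table;  /* on-demand initialization */
    }
    if (cached_capi == NULL) {
        return 0;  /* no interpreter is running */
    }
    return cached_capi->alive ? 1 : 0;
}
```

Calling interp_init(), decode_named_escape(), interp_fini(0), interp_init(), decode_named_escape() in that order makes the second decode return 0, because the cached pointer still refers to the first run's table. With interp_fini(1) in place of interp_fini(0), the second decode re-initializes the pointer and returns 1.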

A shutdown of the main interpreter with Py_Finalize() renders the pointer invalid. If the interpreter is then re-initialized, the stale pointer is used again and causes a segfault. The problem was first discovered by Trey Hunner in

python.js:219 Uncaught RuntimeError: null function or function signature mismatch
    at _PyUnicode_DecodeUnicodeEscapeInternal (unicodeobject.c:6493:25)
    at decode_unicode_with_escapes (string_parser.c:121:13)
    at _PyPegen_parsestr (string_parser.c:273:1)
    at strings_rule (action_helpers.c:901:20)
    at atom_rule (parser.c:14293:27)
    at primary_rule (parser.c:13916:17)
    at await_primary_rule (parser.c:13666:17)
    at factor_rule (parser.c:13542:29)
    at term_rule (parser.c:13330:17)
    at sum_rule (parser.c:13044:17)

I can reproduce the issue with pure C code:

$ gcc -Xlinker -export-dynamic -g -IInclude/ -I. -o ucnbug ucnbug.c libpython3.11.a -lm -ldl
$ gdb ucnbug
(gdb) run


Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00000000005729a8 in _PyUnicode_DecodeUnicodeEscapeInternal (s=<optimized out>, s@entry=0x7fffea53b6d0 "\\N{digit nine}", size=<optimized out>, errors=errors@entry=0x0, 
    consumed=consumed@entry=0x0, first_invalid_escape=first_invalid_escape@entry=0x7fffffffc748) at Objects/unicodeobject.c:6490
#2  0x0000000000644fe3 in decode_unicode_with_escapes (parser=parser@entry=0x7fffea5e45d0, s=0x7fffea53b6d0 "\\N{digit nine}", s@entry=0x7fffea6af1d1 "\\N{digit nine}'", len=<optimized out>, 
    len@entry=14, t=t@entry=0x7fffea606910) at Parser/string_parser.c:118
#3  0x0000000000645675 in _PyPegen_parsestr (p=p@entry=0x7fffea5e45d0, bytesmode=bytesmode@entry=0x7fffffffc838, rawmode=rawmode@entry=0x7fffffffc83c, result=result@entry=0x7fffffffc848, 
    fstr=fstr@entry=0x7fffffffc850, fstrlen=fstrlen@entry=0x7fffffffc858, t=0x7fffea606910) at Parser/string_parser.c:269
#4  0x0000000000644163 in _PyPegen_concatenate_strings (p=p@entry=0x7fffea5e45d0, strings=strings@entry=0x94e310) at Parser/action_helpers.c:896
#5  0x00000000004791e6 in strings_rule (p=p@entry=0x7fffea5e45d0) at Parser/parser.c:15463
#6  0x000000000047c498 in atom_rule (p=p@entry=0x7fffea5e45d0) at Parser/parser.c:14274
#7  0x000000000047e159 in primary_raw (p=0x7fffea5e45d0) at Parser/parser.c:13908
#8  primary_rule (p=p@entry=0x7fffea5e45d0) at Parser/parser.c:13706
msg416440 - Author: miss-islington (miss-islington) Date: 2022-03-31 15:15
New changeset 44e915028d75f7cef141aa1aada962465a5907d6 by Christian Heimes in branch 'main':
bpo-47182: Fix crash by named unicode characters after interpreter reinitialization (GH-32212)
msg416474 - Author: Christian Heimes (christian.heimes) (Python committer) Date: 2022-04-01 08:45
New changeset 55d5c96c57738766eb6f3b5ccfa6599d5f094c18 by Christian Heimes in branch '3.10':
[3.10] bpo-47182: Fix crash by named unicode characters after interpreter reinitialization (GH-32212) (GH-32216)
Date User Action Args
2022-04-11 14:59:57 admin set github: 91338
2022-04-04 19:17:34 neonene set nosy: + neonene; pull_requests: + pull_request30375
2022-04-01 08:45:05 christian.heimes set messages: + msg416474
2022-03-31 15:40:29 christian.heimes set versions: - Python 3.7, Python 3.8, Python 3.9
2022-03-31 15:32:14 christian.heimes set pull_requests: + pull_request30293
2022-03-31 15:32:03 christian.heimes set pull_requests: + pull_request30292
2022-03-31 15:15:01 miss-islington set nosy: + miss-islington; messages: + msg416440
2022-03-31 13:40:06 christian.heimes set keywords: + patch; stage: patch review; pull_requests: + pull_request30288
2022-03-31 13:34:15 christian.heimes create