This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Possible PyUnicode_InternInPlace() edge-case bug
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: larry, methane, xiang.zhang
Priority: low Keywords:

Created on 2016-10-10 15:41 by larry, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (4)
msg278421 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-10-10 15:41
A visual inspection of PyUnicode_InternInPlace() suggests there might be a rare edge-case bug lurking there.

Specifically, the bug has to do with "interned mortal" strings.  Interned mortal strings are stored in the static "interned" dictionary, with their key and value set to the same string.  (If "x" is interned, then in the interned dictionary "x" is set to "x".)  Under normal circumstances, these two references would keep the string alive forever.  However, in order for the string to be *mortal*, the unicode object then DECREFs the string twice so the references in the "interned" dict no longer "count".  When the refcnt reaches 0, and the string is being destroyed, unicode_dealloc() temporarily resurrects the object, bumping its reference count up to 3.  It then removes the string from the interned dict and destroys it as normal.

The actual intern substitution happens in a function called PyUnicode_InternInPlace().  This function looks the string up in the "interned" dictionary, and if the string is there it substitutes in the interned version from the dictionary.  There's a curious comment in that function:

    /* It might be that the GetItem call fails even
       though the key is present in the dictionary,
       namely when this happens during a stack overflow. */
    Py_ALLOW_RECURSION
    t = PyDict_GetItem(interned, s);
    Py_END_ALLOW_RECURSION

If t is NULL, then PyUnicode_InternInPlace() goes on to set t in the interned dictionary, reduces the refcnt by 2, etc etc.

The problem: if t is in the dictionary, but PyDict_GetItem() fails (during a stack overflow?), then the subsequent PyDict_SetItem() will *overwrite* the existing interned string.  Which means the "interned" dict will drop its two references, which means two REFCNTs.  If there were only 1 or 2 extant references to this string, it means the string will be immediately dealloc'd, *even though the string was still in use*.

I suppose this is already such a speculative condition that it's probably not worrying over.  If PyDict_GetItem fails due to stack overflow, perhaps the Python process is doomed to fail soon anyway.  Perhaps PyDict_SetItem has no chance of working.  But it *seems* like this code is intended to function correctly even when PyDict_GetItem fails due to stack overflow, and I suspect it won't.
msg278424 - (view) Author: Xiang Zhang (xiang.zhang) * (Python committer) Date: 2016-10-10 16:00
The code has already been changed in #27454.
msg278426 - (view) Author: Larry Hastings (larry) * (Python committer) Date: 2016-10-10 16:09
Ah, indeed.  My mistake--I'm working in the Gilectomy branch, which is hilariously out of date.  (It's stuck in February 2016.)

The new version using PyDict_SetDefault() won't have *this* specific problem.  Perhaps later I'll read the new code and see if I can spot a similar problem.  But for now it only makes sense to close this issue.
msg278427 - (view) Author: Inada Naoki (methane) * (Python committer) Date: 2016-10-10 16:28
I think the comment described "Why Py_ALLOW_RECURSION is required", not "t may be NULL even if s is in the intern dict".

> If PyDict_GetItem fails due to stack overflow, perhaps the Python process is doomed to fail soon anyway.

When I changed the code to use PyDict_SetDefault(), I found "USE_STACKCHECK" macro.
See https://docs.python.org/3/c-api/sys.html#c.PyOS_CheckStack

But I don't know this can be really happend. Interned dict only contains exact unicode objects. There is no chance for
calling user defined __hash__() or __eq__().
History
Date User Action Args
2022-04-11 14:58:38adminsetgithub: 72592
2016-10-10 16:28:24methanesetnosy: + methane
messages: + msg278427
2016-10-10 16:09:39larrysetstatus: open -> closed
resolution: fixed
messages: + msg278426

stage: test needed -> resolved
2016-10-10 16:00:05xiang.zhangsetnosy: + xiang.zhang
messages: + msg278424
2016-10-10 15:41:22larrycreate