Title: Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId
Type: Stage: patch review
Components: Extension Modules, Interpreter Core Versions: Python 3.10
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: hongweipeng, serhiy.storchaka, vstinner
Priority: normal Keywords: patch

Created on 2020-10-11 14:36 by serhiy.storchaka, last changed 2020-11-24 07:31 by hongweipeng.

Pull Requests
URL Status Linked Edit
PR 22648 merged serhiy.storchaka, 2020-10-11 14:43
PR 23487 closed hongweipeng, 2020-11-24 07:31
Messages (6)
msg378437 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-11 14:36
There are design flaws in PyDict_GetItem(), PyDict_GetItemString() and _PyDict_GetItemId().

1. They suppress all exceptions raised internally. Most common of are MemoryError, KeyboardInterrupt and RecursionError whose raising is out of control of the programmer. When an exception is suppressed the function returns incorrect result. For example PyDict_GetItem() can return NULL for existing key if the user press Ctrl-C.

This situation seems impossible if the key and all keys in the dict are exact strings, but we cannot be sure.

2. PyDict_GetItem() preserves the current exception, and therefore can be used in handling exception but PyDict_GetItemString() and _PyDict_GetItemId() clear it if the creation of string object is failed (due to MemoryError or, less likely, UnicodeDecodeError). It can leads to very unexpected result.

Most of uses of PyDict_GetItem() etc were fixed in issue35459 and other issues. But some occurrences were left, in cases when there is no any error handling, either errors are not expected or all exceptions are suppressed in any case. It is mainly in the compiler and symtable (where we look up in just created dicts), in structseq.c and in the sys module.

The proposed PR (it can be split on several smaller PRs if needed, see also issue41993 and issue42002) removes all calls of PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId, replacing them with calls of PyDict_GetItemWithError, _PyDict_GetItemStringWithError and _PyDict_GetItemIdWithError or other functions if appropriate.

_PyDict_GetItemId is no longer used and can be removed.
msg378460 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-11 22:17
See also old issues about PyDict_GetItemWithError():

* bpo-20615
* bpo-30475
* bpo-35459
msg378462 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-11 22:24
PyDict_GetItem() behavior is an old wart. Would it make sense to plan a migration plan to slowly move towards PyDict_GetItem() behaving as PyDict_GetItemWithError()?

CPython is an old code base full of corner cases. But outside CPython, is it common to rely on the current PyDict_GetItem() behavior (ignore errors, can be called with an exception raised)?

(A) For example, would it make sense to start to emit a DeprecationWarning when it's called with an exception raised? Currently, the function calls _PyErr_Fetch(): checking if exc_type is non-NULL requires to add an if which would add a cost to all PyDict_GetItem() calls. But IMO it would be interesting to fix this function in the long term.

(B) Also, would it make sense to start to no longer ignore base exceptions like SystemExit, KeyboardInterrupt or MemoryError in PyDict_GetItem()?

How can we take a decision on that? Analyze the C code base of a bunch of popular C extension? Make the change and await to see how many people complain?

In the long term, I would prefer that PyDict_GetItem() behaves as other C functions: it should not be called with an exception raised (the caller has to store/restore the current exception), and raises an exception on error. I expect better performance for such more regular and simpler behavior.
msg378463 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-10-11 22:26
I don't want to hold fixing the code right now. If we decide to break the backward compatibility of the C API, a transition plan will likely take a few years anyway, whereas this issue should be first as soon as possible.
msg379643 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-26 06:43
New changeset fb5db7ec58624cab0797b4050735be865d380823 by Serhiy Storchaka in branch 'master':
bpo-42006: Stop using PyDict_GetItem, PyDict_GetItemString and _PyDict_GetItemId. (GH-22648)
msg379645 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2020-10-26 07:15
My plan is to deprecate PyDict_GetItem() and functions with same design flaw and finally remove them (in 4.0?).

We cannot start to ignore exceptions in PyDict_GetItem(). It would not fix the original flaw when the user code does not check an exception after using PyDict_GetItem(), and only make it worse. It can cause crashes in the code which asserts that the exception is not set (for example when function returns value and sets exception). And in worst case it will lead to incorrect results when PyErr_Occurred() is used after calling other C API function.

Emitting a DeprecationWarning only when PyDict_GetItem() is called with an exception raised does not make much sense, because the problem with PyDict_GetItem() is not only in this case. Also, there is a trick in using warnings here, you should use PyErr_WriteUnraisable(), so warnings always will be printed and cannot be turned into exceptions.
Date User Action Args
2020-11-24 07:31:32hongweipengsetnosy: + hongweipeng
pull_requests: + pull_request22374
2020-10-26 07:15:19serhiy.storchakasetmessages: + msg379645
2020-10-26 06:43:48serhiy.storchakasetmessages: + msg379643
2020-10-11 22:26:15vstinnersetmessages: + msg378463
2020-10-11 22:24:24vstinnersetmessages: + msg378462
2020-10-11 22:17:56vstinnersetmessages: + msg378460
2020-10-11 14:43:57serhiy.storchakasetkeywords: + patch
stage: patch review
pull_requests: + pull_request21622
2020-10-11 14:36:40serhiy.storchakacreate