classification
Title: Add PyDict_GetItemStringWithError
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.8
process
Status: closed Resolution: not a bug
Dependencies: 35461 Superseder:
Assigned To: Nosy List: rhettinger, ronaldoussoren, scoder, serhiy.storchaka
Priority: normal Keywords:

Created on 2018-12-11 11:00 by ronaldoussoren, last changed 2019-02-14 16:42 by serhiy.storchaka. This issue is now closed.

Messages (8)
msg331607 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2018-12-11 11:00
PyDict_GetItemWithError is a variant of PyDict_GetItem that doesn't swallow unrelated exceptions.  While converting a project to use this API I noticed that there is similar variant of PyDict_GetItemString.

It would be nice to have PyDict_GetItemStringWithError as a public API to make it easier to convert existing code to the better API.
msg331623 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-12-11 13:48
PyDict_GetItemString() is an old API from times when dicts could contain only string keys. It is not necessary part of the C API and can be replaced with PyDict_GetItem() in new code. It was kept only as a convenient function.

In issue35459 many uses of PyDict_GetItemString() were replaced with PyDict_GetItemWithError() and private _PyDict_GetItemIdWithError(). Only 4 occurrences were replaced with newly added private _PyDict_GetItemStringWithError(). And they could use PyDict_GetItem(). So there are not much use cases for PyDict_GetItemStringWithError().

Before adding PyDict_GetItemStringWithError() we could consider alternatives.

*WithError() functions require calling PyErr_Occurred() to distinguish the error case from the "not found" case. This adds an overhead which can be not small in performance critical code. It would be better if the API function returned a three-state value: "found", "not found" and "error". See for example _PyObject_LookupAttr(). I am not sure this is the best design.
msg335153 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-02-10 13:05
Adding PyDict_GetItemStringWithError makes is possible to migrate existing code to a design that does not ignore errors.  I have a significant number of calls to PyDict_GetItemString that cannot be switch toed PyDict_GetItem without increasing the number of lines of C code I have to write (or in other words, if PyDict_GetItemString wouldn't exist I'd have invented this myself for use in extensions I write).

Another reason for adding the function is consistency in the API. 

BTW. I've added a function with the same signature as the proposed PyDict_GetItemStringWithError to my exension's code base, which is something I'd have to do anyway for backward compatibility with older CPython versions.

W.r.t API design: I prefer the interface of _PyObject_LookupAttr to that of PyDict_GetItemWithError, even if the additional cost of a call to PyErr_Occurred() is insignificant in most of my code (esp. when guarded with a test for a NULL return value from PyDict_GetItemWithError).
msg335157 - (view) Author: Stefan Behnel (scoder) * (Python committer) Date: 2019-02-10 14:42
The overhead of calling PyErr_Occurred() is definitely negligible in something as involved as PyDict_GetItemStringWithError(), where a mere key lookup first has to fire up the string decoder on a C character buffer to create a new string object and then calculate its hash value, just to throw away all that right after the tiny time interval that it takes to look up the key in the dict. It is not something I would encourage anyone to do in code that has only the slightest excuse for being implemented in C. :)

Rather, I would propose to open up the ID-String API that CPython uses internally, so that user code can benefit from fast lookups of interned strings with pre-initialised hash values, without having to care about creating and cleaning up string constants themselves all the time.

(FWIW, Cython also generates interned string constants automatically, but does not currently use the ID-API, also because it does it for *all* its strings, not just those that resemble identifiers etc.)
msg335159 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-02-10 15:51
There are more reasons to implement in C than just speed ;-). In my code I have two usecases for using PyDict_GetItemString, both not in performance critical code

1) Look up a hardcoded name in a dictionary. These could be switched to the Id API if that were public (and I might implement something like the Id API myself). 

2) Look up a name that is passed in as a "char*" from C code that's outside of my control.  This is a real use case for a PyDict_GetItemString API and not easilty converted to another API.

In PyObjC the majority of calls to PyDict_GetItemString are in the first category, while most of the latter would probably be better of using a PyDict_GetItemBytes API.  Although I definitely to not propose to add such an API to CPython.
msg335190 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-02-11 02:46
-1 for expanding this API.  As Serhiy pointed-out, PyDict_GetItemString() is an old API kept just for backward compatibility.  For your use case, it is easy to call PyUnicode_FromString(key) and then follow-up with PyDict_GetItemWithError().  The latter way is more flexible in that it allows you to cache the unicode object for future use (something you're going to want to do if you care about performance).  The latter way also lets you intern the string as well.

FWIW, if it is only your own code, it is trivially easy to write your own helper function if that is what you needed for a single porting project.  IMO, unnecessarily adding to many variants of the same function just makes the API harder to learn (see all the ObjectCall variants for example) and makes the code harder for us to maintain.  ISTM most of the concrete APIs are deliberately sparse, so adding this variant would be a change in the philosophy of the C-API.  Also, we mostly want people to use the abstract API unless they know for sure that a target dictionary is an exact dict (this has been an outstanding problem for OrderedDicts for example).
msg335202 - (view) Author: Ronald Oussoren (ronaldoussoren) * (Python committer) Date: 2019-02-11 08:55
I can live without this API, I mostly filed this issue because there's an obvious hole in the API when you look at API description: SetItem, SetItemString, DelItem, DelItemString, GetItem, GetItemString, GetItemWithError, <>. 

Thanks for the explanations.
msg335542 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-02-14 16:42
See also issue35459.
History
Date User Action Args
2019-02-14 16:42:48serhiy.storchakasetmessages: + msg335542
2019-02-11 08:55:19ronaldoussorensetstatus: open -> closed
resolution: not a bug
messages: + msg335202

stage: resolved
2019-02-11 02:46:34rhettingersetnosy: + rhettinger
messages: + msg335190
2019-02-10 15:51:44ronaldoussorensetmessages: + msg335159
2019-02-10 14:42:50scodersetnosy: + scoder
messages: + msg335157
2019-02-10 13:05:00ronaldoussorensetmessages: + msg335153
2018-12-11 13:48:48serhiy.storchakasetnosy: + serhiy.storchaka
dependencies: + Document C API functions which swallow exceptions
messages: + msg331623
2018-12-11 11:00:24ronaldoussorencreate