This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Merge duplicated _Py_IDENTIFIER identifiers in C code
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.9
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: brett.cannon, eric.snow, ncoghlan, shihai1991, vstinner
Priority: normal Keywords: patch

Created on 2020-01-29 14:52 by shihai1991, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 18254 merged shihai1991, 2020-01-29 14:56
Messages (9)
msg360966 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020-01-29 14:52
As stinner said in issue19514

those _Py_IDENTIFIER should be merged:

./Modules/_ctypes/_ctypes.c:1054:    _Py_IDENTIFIER(_type_);
./Modules/_ctypes/_ctypes.c:1132:    _Py_IDENTIFIER(_type_);
./Modules/_ctypes/_ctypes.c:1494:    _Py_IDENTIFIER(_type_);
./Modules/_ctypes/_ctypes.c:2071:    _Py_IDENTIFIER(_type_);

./Modules/_ctypes/_ctypes.c:1692:    _Py_IDENTIFIER(_as_parameter_);
./Modules/_ctypes/_ctypes.c:1759:    _Py_IDENTIFIER(_as_parameter_);
./Modules/_ctypes/_ctypes.c:1826:    _Py_IDENTIFIER(_as_parameter_);
./Modules/_ctypes/_ctypes.c:2256:    _Py_IDENTIFIER(_as_parameter_);

./Modules/_ctypes/_ctypes.c:2474:    _Py_IDENTIFIER(_check_retval_);
./Modules/_ctypes/_ctypes.c:3280:    _Py_IDENTIFIER(_check_retval_);

./Modules/_pickle.c:3560:    _Py_IDENTIFIER(__name__);
./Modules/_pickle.c:3979:        _Py_IDENTIFIER(__name__);

./Modules/_pickle.c:4042:            _Py_IDENTIFIER(__new__);
./Modules/_pickle.c:5771:        _Py_IDENTIFIER(__new__);

./Python/ceval.c:5058:    _Py_IDENTIFIER(__name__);
./Python/ceval.c:5134:    _Py_IDENTIFIER(__name__);

./Python/import.c:386:    _Py_IDENTIFIER(__spec__);
./Python/import.c:1569:    _Py_IDENTIFIER(__spec__);

./Python/import.c:1571:    _Py_IDENTIFIER(__path__);
./Python/import.c:1933:        _Py_IDENTIFIER(__path__);

./Python/_warnings.c:487:    _Py_IDENTIFIER(__name__);
./Python/_warnings.c:821:    _Py_IDENTIFIER(__name__);
./Python/_warnings.c:972:    _Py_IDENTIFIER(__name__);

./Python/errors.c:1012:    _Py_IDENTIFIER(__module__);
./Python/errors.c:1238:    _Py_IDENTIFIER(__module__);

./Objects/bytesobject.c:546:    _Py_IDENTIFIER(__bytes__);
./Objects/bytesobject.c:2488:    _Py_IDENTIFIER(__bytes__);

./Objects/moduleobject.c:61:    _Py_IDENTIFIER(__name__);
./Objects/moduleobject.c:488:    _Py_IDENTIFIER(__name__);
./Objects/moduleobject.c:741:        _Py_IDENTIFIER(__name__);

./Objects/moduleobject.c:62:    _Py_IDENTIFIER(__doc__);
./Objects/moduleobject.c:461:    _Py_IDENTIFIER(__doc__);

./Objects/moduleobject.c:65:    _Py_IDENTIFIER(__spec__);
./Objects/moduleobject.c:744:            _Py_IDENTIFIER(__spec__);

./Objects/iterobject.c:107:    _Py_IDENTIFIER(iter);
./Objects/iterobject.c:247:    _Py_IDENTIFIER(iter);

./Objects/rangeobject.c:760:    _Py_IDENTIFIER(iter);
./Objects/rangeobject.c:918:    _Py_IDENTIFIER(iter);

./Objects/descrobject.c:574:    _Py_IDENTIFIER(getattr);
./Objects/descrobject.c:1243:    _Py_IDENTIFIER(getattr);

./Objects/odictobject.c:899:    _Py_IDENTIFIER(items);
./Objects/odictobject.c:1378:    _Py_IDENTIFIER(items);
./Objects/odictobject.c:2198:    _Py_IDENTIFIER(items);

./Objects/fileobject.c:35:    _Py_IDENTIFIER(open);
./Objects/fileobject.c:550:    _Py_IDENTIFIER(open);


./Objects/typeobject.c:312:        _Py_IDENTIFIER(mro);
./Objects/typeobject.c:1893:        _Py_IDENTIFIER(mro);
msg360972 - (view) Author: Eric Snow (eric.snow) * (Python committer) Date: 2020-01-29 16:23
FTR:

As Martin noted in #19514, there isn't any performance difference for statics, whether local or global.  For static locals the compiler (at least on linux) generates symbols named as "<name>.<#>" and they are treated as global.

One key difference (again, at least on linux; seen using "nm") is that symbols for static globals preserve identifying information, like the originating source file.  For static locals the filename is not preserved and, when there are duplicates, you do not know from which function a particular symbol comes.  So compiled symbols for static globals are *much* easier to identify in the source than symbols for static locals.

Hence static locals complicate something I'm working on: tooling to identify global variables in our source code.  So I'm a fan of efforts to remove duplicates (and move static locals to the explicit global namespace).  Thank you! :)
msg360990 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-01-29 22:46
We can't make this change, as it means the statics get initialised before the Python interpreter has been initialised, and won't be reinitialised if the interpreter is destroyed and recreated.
msg360991 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-01-29 22:53
My apologies, my comment above was based on an outdated understanding of how the identifier structs get initialised (it's the usage that initialises them, not the declaration).

That means this is a useful refactoring to help identify blockers to full subinterpreter support.
msg360992 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2020-01-29 22:58
In the subinterpreter context: perhaps it would make sense to move *all* Py_IDENTIFIER declarations to file scope?

That would make it much clearer which of our extension modules actually have hidden state for caching purposes.

If we did that though, we'd also want to update the usage instructions in object.h
msg360993 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2020-01-29 23:08
> That means this is a useful refactoring to help identify blockers to full subinterpreter support.

I don't think that subinterpreter support should block this issue, since currently, _Py_IDENTIFIER() does *not* support subinterpreters.

> In the subinterpreter context: perhaps it would make sense to move *all* Py_IDENTIFIER declarations to file scope?

What do you mean? _Py_IDENTIFIER() macro uses "static", so no identifier is visible outside the current C file. The scope may be the whole file, or the current function, depending where it's declared.

--

Once I discussed with Eric: _Py_IDENTIFIER() should use an "interpreter local storage" for identifiers values. _Py_IDENTIFIER() would only be a "key" and _PyUnicode_FromId() would store the value somewhere in a hash table stored in PyInterpreterState. Something similar to the TSS API:

* PyThread_create_key()
* PyThread_delete_key_value()
* PyThread_set_key_value()
* PyThread_get_key_value()

But per interpreter, rather than being per thread.

The key can be simply the variable address in memory. It only has to be unique in the process.
msg361030 - (view) Author: Hai Shi (shihai1991) * (Python triager) Date: 2020-01-30 10:06
If i understand clearly, msg360993 is an solution of issue39465.
msg361082 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-01-30 23:20
New changeset 46874c26ee1fc752e2e6930efa1d223b2351edb8 by Hai Shi in branch 'master':
bpo-39487: Merge duplicated _Py_IDENTIFIER identifiers in C code (GH-18254)
https://github.com/python/cpython/commit/46874c26ee1fc752e2e6930efa1d223b2351edb8
msg361083 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2020-01-30 23:21
I just merged Hai Shi's PR, so I'm going to close assuming that took care of all the instances.
History
Date User Action Args
2022-04-11 14:59:26adminsetgithub: 83668
2020-01-30 23:21:51brett.cannonsetstatus: open -> closed
resolution: fixed
messages: + msg361083

stage: patch review -> resolved
2020-01-30 23:20:55brett.cannonsetnosy: + brett.cannon
messages: + msg361082
2020-01-30 10:06:47shihai1991setmessages: + msg361030
2020-01-29 23:08:00vstinnersetmessages: + msg360993
2020-01-29 22:58:29ncoghlansetmessages: + msg360992
2020-01-29 22:53:37ncoghlansetstatus: closed -> open
resolution: rejected -> (no value)
messages: + msg360991

stage: resolved -> patch review
2020-01-29 22:46:21ncoghlansetstatus: open -> closed

nosy: + ncoghlan
messages: + msg360990

resolution: rejected
stage: patch review -> resolved
2020-01-29 16:23:20eric.snowsetnosy: + eric.snow
messages: + msg360972
2020-01-29 15:00:16shihai1991setnosy: + vstinner
2020-01-29 14:56:14shihai1991setkeywords: + patch
stage: patch review
pull_requests: + pull_request17631
2020-01-29 14:52:08shihai1991create