This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: Share duplicated _Py_IDENTIFIER identifiers in C code
Type: Stage:
Components: Versions: Python 3.4
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: elixir, loewis, pitrou, python-dev, vstinner
Priority: normal Keywords: patch

Created on 2013-11-07 01:08 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
remove_duplicated_pyidentifiers.patch elixir, 2013-11-12 17:02 review
Messages (21)
msg202301 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 01:08
I started to share some common identifiers in Python/pythonrun.c, extract:

/* Common identifiers */
_Py_Identifier _PyId_argv = _Py_static_string_init("argv");
_Py_Identifier _PyId_builtins = _Py_static_string_init("builtins");
...

Do you think it would be interesting to continue to share such identifier somewhere? Maybe in a new file?

We might do the same for some common strings like empty string, single character (like "\n"), etc.


See also issue #19514.


Duplicated identifiers in the io module:

_Py_IDENTIFIER(_dealloc_warn): Modules/_io/bufferedio.c:17
_Py_IDENTIFIER(_dealloc_warn): Modules/_io/textio.c:15

_Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:183
_Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:62

_Py_IDENTIFIER(read1): Modules/_io/bufferedio.c:24
_Py_IDENTIFIER(read1): Modules/_io/textio.c:25

_Py_IDENTIFIER(readable): Modules/_io/bufferedio.c:25
_Py_IDENTIFIER(readable): Modules/_io/textio.c:26

_Py_IDENTIFIER(readall): Modules/_io/fileio.c:590
_Py_IDENTIFIER(readall): Modules/_io/iobase.c:802

_Py_IDENTIFIER(seek): Modules/_io/iobase.c:101
_Py_IDENTIFIER(seek): Modules/_io/textio.c:29

_Py_IDENTIFIER(writable): Modules/_io/bufferedio.c:27
_Py_IDENTIFIER(writable): Modules/_io/textio.c:33


Duplicated identifiers in other files:

_Py_IDENTIFIER(append): Modules/_elementtree.c:2373
_Py_IDENTIFIER(append): Modules/_pickle.c:5056

_Py_IDENTIFIER(__bases__): Objects/abstract.c:2417
_Py_IDENTIFIER(__bases__): Objects/typeobject.c:2813

_Py_IDENTIFIER(builtins): pythonrun.c
_Py_IDENTIFIER(builtins): Python/sysmodule.c:169

_Py_IDENTIFIER(__bytes__): Objects/bytesobject.c:2458
_Py_IDENTIFIER(__bytes__): Objects/object.c:563

_Py_IDENTIFIER(__class__): Objects/abstract.c:2492
_Py_IDENTIFIER(__class__): Objects/typeobject.c:42
_Py_IDENTIFIER(__class__): Python/codecs.c:470
_Py_IDENTIFIER(__class__): Python/compile.c:553

_Py_IDENTIFIER(close): Modules/_io/bufferedio.c:16
_Py_IDENTIFIER(close): Modules/_io/fileio.c:129
_Py_IDENTIFIER(close): Modules/_io/textio.c:14
_Py_IDENTIFIER(close): Modules/mmapmodule.c:707
_Py_IDENTIFIER(close): Modules/ossaudiodev.c:540
_Py_IDENTIFIER(close): Modules/selectmodule.c:1513
_Py_IDENTIFIER(close): Objects/genobject.c:173
_Py_IDENTIFIER(close): Python/traceback.c:235

_Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5132
_Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5183

_Py_IDENTIFIER(__dict__): Modules/arraymodule.c:2040
_Py_IDENTIFIER(__dict__): Modules/_collectionsmodule.c:894
_Py_IDENTIFIER(__dict__): Modules/_pickle.c:5204
_Py_IDENTIFIER(__dict__): Objects/bytearrayobject.c:2704
_Py_IDENTIFIER(__dict__): Objects/moduleobject.c:479
_Py_IDENTIFIER(__dict__): Objects/setobject.c:1946
_Py_IDENTIFIER(__dict__): Objects/typeobject.c:43
_Py_IDENTIFIER(__dict__): Parser/asdl_c.py:702
_Py_IDENTIFIER(__dict__): Python/bltinmodule.c:1942
_Py_IDENTIFIER(__dict__): Python/ceval.c:4660
_Py_IDENTIFIER(__dict__): Python/Python-ast.c:544

_Py_IDENTIFIER(__doc__): Objects/descrobject.c:1443
_Py_IDENTIFIER(__doc__): Objects/typeobject.c:44

_Py_IDENTIFIER(enable): Modules/faulthandler.c:1050
_Py_IDENTIFIER(enable): Modules/_posixsubprocess.c:50

_Py_IDENTIFIER(encoding): Python/bltinmodule.c:1716
_Py_IDENTIFIER(encoding): Python/pythonrun.c:1359
_Py_IDENTIFIER(encoding): Python/sysmodule.c:107

_Py_IDENTIFIER(filename): Python/errors.c:932
_Py_IDENTIFIER(filename): Python/pythonrun.c:1634

_Py_IDENTIFIER(fileno): Modules/faulthandler.c:133
_Py_IDENTIFIER(fileno): Modules/_io/_iomodule.c:238
_Py_IDENTIFIER(fileno): Modules/_io/textio.c:17
_Py_IDENTIFIER(fileno): Objects/fileobject.c:200
_Py_IDENTIFIER(fileno): Python/bltinmodule.c:35

_Py_IDENTIFIER(flush): Modules/faulthandler.c:134
_Py_IDENTIFIER(flush): Modules/_io/bufferedio.c:18
_Py_IDENTIFIER(flush): Modules/_io/textio.c:18
_Py_IDENTIFIER(flush): Python/bltinmodule.c:1550
_Py_IDENTIFIER(flush): Python/bltinmodule.c:36
_Py_IDENTIFIER(flush): Python/pythonrun.c:2120
_Py_IDENTIFIER(flush): Python/pythonrun.c:519

_Py_IDENTIFIER(__getinitargs__): Modules/_datetimemodule.c:3075
_Py_IDENTIFIER(__getinitargs__): Modules/_pickle.c:4501

_Py_IDENTIFIER(TextIOWrapper): Python/pythonrun.c:1005
_Py_IDENTIFIER(TextIOWrapper): Python/traceback.c:237

_Py_IDENTIFIER(__getstate__): Modules/_datetimemodule.c:3076
_Py_IDENTIFIER(__getstate__): Objects/typeobject.c:3448

_Py_IDENTIFIER(__import__): Python/ceval.c:2428
_Py_IDENTIFIER(__import__): Python/import.c:1230

_Py_IDENTIFIER(__module__): Objects/typeobject.c:48
_Py_IDENTIFIER(__module__): Python/errors.c:840
_Py_IDENTIFIER(__module__): Python/pythonrun.c:1916

_Py_IDENTIFIER(__name__): Objects/classobject.c:17
_Py_IDENTIFIER(__name__): Objects/typeobject.c:49
_Py_IDENTIFIER(__name__): Objects/weakrefobject.c:159
_Py_IDENTIFIER(__name__): Python/_warnings.c:260
_Py_IDENTIFIER(__name__): Python/codecs.c:471
_Py_IDENTIFIER(__name__): Python/import.c:1234

_Py_IDENTIFIER(__new__): Modules/_ctypes/callproc.c:1643
_Py_IDENTIFIER(__new__): Modules/_pickle.c:4511
_Py_IDENTIFIER(__new__): Objects/typeobject.c:50
_Py_IDENTIFIER(__new__): Objects/typeobject.c:5637

_Py_IDENTIFIER(__qualname__): Objects/descrobject.c:367
_Py_IDENTIFIER(__qualname__): Objects/methodobject.c:191
_Py_IDENTIFIER(__qualname__): Objects/typeobject.c:2032

_Py_IDENTIFIER(__setitem__): Modules/_collectionsmodule.c:1767
_Py_IDENTIFIER(__setitem__): Objects/typeobject.c:5133
_Py_IDENTIFIER(__setitem__): Objects/typeobject.c:5184

_Py_IDENTIFIER(__setstate__): Modules/_ctypes/callproc.c:1644
_Py_IDENTIFIER(__setstate__): Modules/_pickle.c:5147

_Py_IDENTIFIER(__trunc__): Modules/mathmodule.c:1464
_Py_IDENTIFIER(__trunc__): Objects/abstract.c:1273

_Py_IDENTIFIER(get): Modules/_collectionsmodule.c:1766
_Py_IDENTIFIER(get): Objects/descrobject.c:793

_Py_IDENTIFIER(isatty): Modules/_io/_iomodule.c:237
_Py_IDENTIFIER(isatty): Modules/_io/bufferedio.c:19
_Py_IDENTIFIER(isatty): Modules/_io/textio.c:20
_Py_IDENTIFIER(isatty): Python/pythonrun.c:1004

_Py_IDENTIFIER(items): Modules/_collectionsmodule.c:1566
_Py_IDENTIFIER(items): Modules/_pickle.c:2569
_Py_IDENTIFIER(items): Objects/abstract.c:2039
_Py_IDENTIFIER(items): Objects/descrobject.c:817
_Py_IDENTIFIER(items): Objects/typeobject.c:3537
_Py_IDENTIFIER(items): Python/Python-ast.c:108

_Py_IDENTIFIER(keys): Objects/abstract.c:2022
_Py_IDENTIFIER(keys): Objects/descrobject.c:803
_Py_IDENTIFIER(keys): Objects/dictobject.c:1792
_Py_IDENTIFIER(keys): Python/Python-ast.c:201

_Py_IDENTIFIER(lineno): Python/Python-ast.c:29
_Py_IDENTIFIER(lineno): Python/errors.c:933
_Py_IDENTIFIER(lineno): Python/pythonrun.c:1635

_Py_IDENTIFIER(mode): Modules/_io/_iomodule.c:239
_Py_IDENTIFIER(mode): Modules/_io/bufferedio.c:20
_Py_IDENTIFIER(mode): Modules/_io/textio.c:21
_Py_IDENTIFIER(mode): Python/pythonrun.c:1007

_Py_IDENTIFIER(msg): Python/Python-ast.c:130
_Py_IDENTIFIER(msg): Python/errors.c:934
_Py_IDENTIFIER(msg): Python/pythonrun.c:1633

_Py_IDENTIFIER(name): Modules/_io/bufferedio.c:21
_Py_IDENTIFIER(name): Modules/_io/fileio.c:62
_Py_IDENTIFIER(name): Modules/_io/textio.c:22
_Py_IDENTIFIER(name): Python/Python-ast.c:37
_Py_IDENTIFIER(name): Python/pythonrun.c:1006
_Py_IDENTIFIER(name): Python/pythonrun.c:222

_Py_IDENTIFIER(offset): Python/errors.c:935
_Py_IDENTIFIER(offset): Python/pythonrun.c:1636

_Py_IDENTIFIER(open): Objects/fileobject.c:33
_Py_IDENTIFIER(open): Parser/tokenizer.c:476
_Py_IDENTIFIER(open): Python/pythonrun.c:1003
_Py_IDENTIFIER(open): Python/traceback.c:155
_Py_IDENTIFIER(open): Python/traceback.c:236

_Py_IDENTIFIER(peek): Modules/_io/bufferedio.c:22
_Py_IDENTIFIER(peek): Modules/_io/iobase.c:453
_Py_IDENTIFIER(peek): Modules/_pickle.c:1181

_Py_IDENTIFIER(print_file_and_line): Python/errors.c:936
_Py_IDENTIFIER(print_file_and_line): Python/pythonrun.c:1865

_Py_IDENTIFIER(raw): Modules/_io/textio.c:23
_Py_IDENTIFIER(raw): Python/pythonrun.c:1029

_Py_IDENTIFIER(read): Modules/_cursesmodule.c:2336
_Py_IDENTIFIER(read): Modules/_io/bufferedio.c:23
_Py_IDENTIFIER(read): Modules/_io/bufferedio.c:55
_Py_IDENTIFIER(read): Modules/_io/iobase.c:452
_Py_IDENTIFIER(read): Modules/_io/iobase.c:846
_Py_IDENTIFIER(read): Modules/_io/textio.c:24
_Py_IDENTIFIER(read): Modules/_pickle.c:1182
_Py_IDENTIFIER(read): Modules/arraymodule.c:1267
_Py_IDENTIFIER(read): Modules/pyexpat.c:898
_Py_IDENTIFIER(read): Python/marshal.c:1595

_Py_IDENTIFIER(readinto): Modules/_io/bufferedio.c:26
_Py_IDENTIFIER(readinto): Python/marshal.c:621

_Py_IDENTIFIER(readline): Modules/_pickle.c:1183
_Py_IDENTIFIER(readline): Objects/fileobject.c:62
_Py_IDENTIFIER(readline): Parser/tokenizer.c:477

_Py_IDENTIFIER(replace): Modules/_datetimemodule.c:1071
_Py_IDENTIFIER(replace): Modules/_io/textio.c:27
_Py_IDENTIFIER(replace): Modules/zipimport.c:563
_Py_IDENTIFIER(replace): Modules/zipimport.c:70

_Py_IDENTIFIER(text): Modules/_elementtree.c:2357
_Py_IDENTIFIER(text): Python/errors.c:937
_Py_IDENTIFIER(text): Python/pythonrun.c:1637

_Py_IDENTIFIER(time): Modules/_datetimemodule.c:1309
_Py_IDENTIFIER(time): Modules/gcmodule.c:890

_Py_IDENTIFIER(upper): Modules/_sqlite/connection.c:1500
_Py_IDENTIFIER(upper): Modules/_sqlite/cursor.c:144
_Py_IDENTIFIER(upper): Modules/_sqlite/module.c:190
_Py_IDENTIFIER(upper): Python/Python-ast.c:329

_Py_IDENTIFIER(values): Objects/abstract.c:2056
_Py_IDENTIFIER(values): Objects/descrobject.c:810
_Py_IDENTIFIER(values): Python/Python-ast.c:170

_Py_IDENTIFIER(write): Modules/_csv.c:1374
_Py_IDENTIFIER(write): Modules/_cursesmodule.c:1792
_Py_IDENTIFIER(write): Modules/_io/bufferedio.c:28
_Py_IDENTIFIER(write): Modules/_pickle.c:831
_Py_IDENTIFIER(write): Modules/arraymodule.c:1340
_Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1572
_Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1642
_Py_IDENTIFIER(write): Objects/fileobject.c:131
_Py_IDENTIFIER(write): Python/marshal.c:1566
_Py_IDENTIFIER(write): Python/sysmodule.c:129
_Py_IDENTIFIER(write): Python/sysmodule.c:2007
msg202309 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 07:48
> _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5132
> _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5183

I moved this one to #19514.
msg202355 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 16:01
If most identifiers are stored in the same place, it would become possible to have a "cleanup" function to clear all identifiers. Such function could be called at Python shutdown to release as much memory as possible.
msg202360 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-07 16:25
What are you trying to achieve exactly? I don't think "sharing" identifier structs will gain anything significant. Please don't make the source code less readable in search for some mythical "efficiency".
msg202372 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-11-07 17:28
Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source.
msg202373 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 17:33
> Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source.

Oh, great! I never noticed _PyUnicode_ClearStaticStrings().

Call trace: Py_Finalize() -> _PyUnicode_ClearStaticStrings() ->
_PyUnicode_Fini().
msg202377 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-11-07 17:50
Well, that was one of the motivations of introducing this Py_IDENTIFIER machinery: to be able to cleanup at the end (unlike the static variables that were used before, which couldn't be cleaned up).
msg202381 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 20:54
Antoine, Martin: So, what do you think? Is it worth to move most common identifiers to a single place to not duplicate them?

If identifiers are already cleared at exit, the advantage would be to initialize duplicated identifiers more quickly, and don't initialize duplicated identifiers multiple times (once per copy).

If we choose to not share common identifiers, _PyId_xxx identifiers from pythonrun.c must be removed. There are also some identifiers duplicated in the same file which can be moved at the top to remove at least duplicates in a single file, as it was done for other identifiers in issue #19514.
msg202382 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-07 21:00
> Antoine, Martin: So, what do you think? Is it worth to move most common identifiers
> to a single place to not duplicate them?

Well, worth what? :)
If you don't tell us what it brings (numbers?), I'm against it.
msg202383 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 21:02
> If you don't tell us what it brings (numbers?), I'm against it.

For performances, it's probably very close to zero speed up. For the memory, it's a few bytes per duplicated identifier.
msg202385 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2013-11-07 21:17
>> If you don't tell us what it brings (numbers?), I'm against it.
>
> For performances, it's probably very close to zero speed up. For
> the memory, it's a few bytes per duplicated identifier.

Well, then IMHO it's not worth it.
msg202386 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 21:18
> Well, then IMHO it's not worth it.

Ok, you are probably right :-)

@Andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file?
msg202388 - (view) Author: -- (elixir) * Date: 2013-11-07 21:47
> @Andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file?

Yes, I will provide a patch in a day or two.
msg202389 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-07 22:08
New changeset 01c4a0af73cf by Victor Stinner in branch 'default':
Issue #19512, #19515: remove shared identifiers, move identifiers where they
http://hg.python.org/cpython/rev/01c4a0af73cf
msg202391 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-07 22:09
> New changeset 01c4a0af73cf by Victor Stinner in branch 'default':
> Issue #19512, #19515: remove shared identifiers, move identifiers where they
> http://hg.python.org/cpython/rev/01c4a0af73cf

This changeset removes some identifiers duplicated in the same file.
msg202418 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2013-11-08 13:45
I'd be +0 on extracting common identifiers. I see a slight increase of readability, and expect a slight reduction of memory usage, and a tiny reduction in runtime. It's not really worth the effort, but I fail to see that it causes harm.

I see no point in reverting cases where this approach is already taken. I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention.
msg202419 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-08 14:13
2013/11/8 Martin v. Löwis <report@bugs.python.org>:
> I'd be +0 on extracting common identifiers. I (...) expect a slight reduction of memory usage, and a tiny reduction in runtime.

Only duplicated Py_IDENTIFIER structures would be removed in memory,
but these structures are very small (3 pointers or something like
that). The identifier strings are already interned, so not duplicated
in memory.

> It's not really worth the effort, but I fail to see that it causes harm.

Initializing an identifier has to decode the literal byte string from
UTF-8, but Python UTF-8 decoder is really fast. I'm not sure that it's
possible to see a difference on the startup time.

> I see no point in reverting cases where this approach is already taken.

I only reverted shared identifiers added a few days ago in issue
#19512. I agree to leave the old code unchanged.

> I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention.

I tried to explain my motivation on using more identifiers in the issue #19512.
msg202696 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2013-11-12 15:40
@Andrei: Ping?
msg202706 - (view) Author: -- (elixir) * Date: 2013-11-12 17:02
Removed _Py_IDENTIFIERs duplicated in the same file, except:

    _Py_IDENTIFIER(replace): Modules/zipimport.c:563

which was enclosed in some #ifdefs.
msg202719 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-12 20:42
New changeset 518e3b174120 by Victor Stinner in branch 'default':
Issue #19515: Remove identifiers duplicated in the same file.
http://hg.python.org/cpython/rev/518e3b174120
msg202720 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2013-11-12 20:44
New changeset d2209b9f8971 by Victor Stinner in branch 'default':
Issue #19515: Remove duplicated identifiers in zipimport.c
http://hg.python.org/cpython/rev/d2209b9f8971
History
Date User Action Args
2022-04-11 14:57:53adminsetgithub: 63714
2013-11-12 20:44:58python-devsetmessages: + msg202720
2013-11-12 20:42:56vstinnersetstatus: open -> closed
resolution: fixed
2013-11-12 20:42:39python-devsetmessages: + msg202719
2013-11-12 17:02:57elixirsetfiles: + remove_duplicated_pyidentifiers.patch
keywords: + patch
messages: + msg202706
2013-11-12 15:40:56vstinnersetmessages: + msg202696
2013-11-08 14:13:33vstinnersetmessages: + msg202419
2013-11-08 13:45:05loewissetmessages: + msg202418
2013-11-07 22:09:58vstinnersetmessages: + msg202391
2013-11-07 22:08:07python-devsetnosy: + python-dev
messages: + msg202389
2013-11-07 21:47:35elixirsetmessages: + msg202388
2013-11-07 21:18:37vstinnersetmessages: + msg202386
2013-11-07 21:17:10pitrousetmessages: + msg202385
2013-11-07 21:02:51vstinnersetmessages: + msg202383
2013-11-07 21:00:30pitrousetmessages: + msg202382
2013-11-07 20:54:41vstinnersetmessages: + msg202381
2013-11-07 17:50:45loewissetmessages: + msg202377
2013-11-07 17:33:17vstinnersetmessages: + msg202373
2013-11-07 17:28:38loewissetmessages: + msg202372
2013-11-07 16:25:10pitrousetnosy: + pitrou
messages: + msg202360
2013-11-07 16:01:38vstinnersetmessages: + msg202355
2013-11-07 15:59:28vstinnersetnosy: + loewis, elixir
2013-11-07 07:48:14vstinnersetmessages: + msg202309
2013-11-07 01:08:17vstinnercreate