Issue19515
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2013-11-07 01:08 by vstinner, last changed 2022-04-11 14:57 by admin. This issue is now closed.
Files | ||||
---|---|---|---|---|
File name | Uploaded | Description | Edit | |
remove_duplicated_pyidentifiers.patch | elixir, 2013-11-12 17:02 | review |
Messages (21) | |||
---|---|---|---|
msg202301 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 01:08 | |
I started to share some common identifiers in Python/pythonrun.c, extract: /* Common identifiers */ _Py_Identifier _PyId_argv = _Py_static_string_init("argv"); _Py_Identifier _PyId_builtins = _Py_static_string_init("builtins"); ... Do you think it would be interesting to continue to share such identifier somewhere? Maybe in a new file? We might do the same for some common strings like empty string, single character (like "\n"), etc. See also issue #19514. Duplicated identifiers in the io module: _Py_IDENTIFIER(_dealloc_warn): Modules/_io/bufferedio.c:17 _Py_IDENTIFIER(_dealloc_warn): Modules/_io/textio.c:15 _Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:183 _Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:62 _Py_IDENTIFIER(read1): Modules/_io/bufferedio.c:24 _Py_IDENTIFIER(read1): Modules/_io/textio.c:25 _Py_IDENTIFIER(readable): Modules/_io/bufferedio.c:25 _Py_IDENTIFIER(readable): Modules/_io/textio.c:26 _Py_IDENTIFIER(readall): Modules/_io/fileio.c:590 _Py_IDENTIFIER(readall): Modules/_io/iobase.c:802 _Py_IDENTIFIER(seek): Modules/_io/iobase.c:101 _Py_IDENTIFIER(seek): Modules/_io/textio.c:29 _Py_IDENTIFIER(writable): Modules/_io/bufferedio.c:27 _Py_IDENTIFIER(writable): Modules/_io/textio.c:33 Duplicated identifiers in other files: _Py_IDENTIFIER(append): Modules/_elementtree.c:2373 _Py_IDENTIFIER(append): Modules/_pickle.c:5056 _Py_IDENTIFIER(__bases__): Objects/abstract.c:2417 _Py_IDENTIFIER(__bases__): Objects/typeobject.c:2813 _Py_IDENTIFIER(builtins): pythonrun.c _Py_IDENTIFIER(builtins): Python/sysmodule.c:169 _Py_IDENTIFIER(__bytes__): Objects/bytesobject.c:2458 _Py_IDENTIFIER(__bytes__): Objects/object.c:563 _Py_IDENTIFIER(__class__): Objects/abstract.c:2492 _Py_IDENTIFIER(__class__): Objects/typeobject.c:42 _Py_IDENTIFIER(__class__): Python/codecs.c:470 _Py_IDENTIFIER(__class__): Python/compile.c:553 _Py_IDENTIFIER(close): Modules/_io/bufferedio.c:16 _Py_IDENTIFIER(close): Modules/_io/fileio.c:129 _Py_IDENTIFIER(close): Modules/_io/textio.c:14 _Py_IDENTIFIER(close): Modules/mmapmodule.c:707 _Py_IDENTIFIER(close): Modules/ossaudiodev.c:540 _Py_IDENTIFIER(close): Modules/selectmodule.c:1513 _Py_IDENTIFIER(close): Objects/genobject.c:173 _Py_IDENTIFIER(close): Python/traceback.c:235 _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5132 _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5183 _Py_IDENTIFIER(__dict__): Modules/arraymodule.c:2040 _Py_IDENTIFIER(__dict__): Modules/_collectionsmodule.c:894 _Py_IDENTIFIER(__dict__): Modules/_pickle.c:5204 _Py_IDENTIFIER(__dict__): Objects/bytearrayobject.c:2704 _Py_IDENTIFIER(__dict__): Objects/moduleobject.c:479 _Py_IDENTIFIER(__dict__): Objects/setobject.c:1946 _Py_IDENTIFIER(__dict__): Objects/typeobject.c:43 _Py_IDENTIFIER(__dict__): Parser/asdl_c.py:702 _Py_IDENTIFIER(__dict__): Python/bltinmodule.c:1942 _Py_IDENTIFIER(__dict__): Python/ceval.c:4660 _Py_IDENTIFIER(__dict__): Python/Python-ast.c:544 _Py_IDENTIFIER(__doc__): Objects/descrobject.c:1443 _Py_IDENTIFIER(__doc__): Objects/typeobject.c:44 _Py_IDENTIFIER(enable): Modules/faulthandler.c:1050 _Py_IDENTIFIER(enable): Modules/_posixsubprocess.c:50 _Py_IDENTIFIER(encoding): Python/bltinmodule.c:1716 _Py_IDENTIFIER(encoding): Python/pythonrun.c:1359 _Py_IDENTIFIER(encoding): Python/sysmodule.c:107 _Py_IDENTIFIER(filename): Python/errors.c:932 _Py_IDENTIFIER(filename): Python/pythonrun.c:1634 _Py_IDENTIFIER(fileno): Modules/faulthandler.c:133 _Py_IDENTIFIER(fileno): Modules/_io/_iomodule.c:238 _Py_IDENTIFIER(fileno): Modules/_io/textio.c:17 _Py_IDENTIFIER(fileno): Objects/fileobject.c:200 _Py_IDENTIFIER(fileno): Python/bltinmodule.c:35 _Py_IDENTIFIER(flush): Modules/faulthandler.c:134 _Py_IDENTIFIER(flush): Modules/_io/bufferedio.c:18 _Py_IDENTIFIER(flush): Modules/_io/textio.c:18 _Py_IDENTIFIER(flush): Python/bltinmodule.c:1550 _Py_IDENTIFIER(flush): Python/bltinmodule.c:36 _Py_IDENTIFIER(flush): Python/pythonrun.c:2120 _Py_IDENTIFIER(flush): Python/pythonrun.c:519 _Py_IDENTIFIER(__getinitargs__): Modules/_datetimemodule.c:3075 _Py_IDENTIFIER(__getinitargs__): Modules/_pickle.c:4501 _Py_IDENTIFIER(TextIOWrapper): Python/pythonrun.c:1005 _Py_IDENTIFIER(TextIOWrapper): Python/traceback.c:237 _Py_IDENTIFIER(__getstate__): Modules/_datetimemodule.c:3076 _Py_IDENTIFIER(__getstate__): Objects/typeobject.c:3448 _Py_IDENTIFIER(__import__): Python/ceval.c:2428 _Py_IDENTIFIER(__import__): Python/import.c:1230 _Py_IDENTIFIER(__module__): Objects/typeobject.c:48 _Py_IDENTIFIER(__module__): Python/errors.c:840 _Py_IDENTIFIER(__module__): Python/pythonrun.c:1916 _Py_IDENTIFIER(__name__): Objects/classobject.c:17 _Py_IDENTIFIER(__name__): Objects/typeobject.c:49 _Py_IDENTIFIER(__name__): Objects/weakrefobject.c:159 _Py_IDENTIFIER(__name__): Python/_warnings.c:260 _Py_IDENTIFIER(__name__): Python/codecs.c:471 _Py_IDENTIFIER(__name__): Python/import.c:1234 _Py_IDENTIFIER(__new__): Modules/_ctypes/callproc.c:1643 _Py_IDENTIFIER(__new__): Modules/_pickle.c:4511 _Py_IDENTIFIER(__new__): Objects/typeobject.c:50 _Py_IDENTIFIER(__new__): Objects/typeobject.c:5637 _Py_IDENTIFIER(__qualname__): Objects/descrobject.c:367 _Py_IDENTIFIER(__qualname__): Objects/methodobject.c:191 _Py_IDENTIFIER(__qualname__): Objects/typeobject.c:2032 _Py_IDENTIFIER(__setitem__): Modules/_collectionsmodule.c:1767 _Py_IDENTIFIER(__setitem__): Objects/typeobject.c:5133 _Py_IDENTIFIER(__setitem__): Objects/typeobject.c:5184 _Py_IDENTIFIER(__setstate__): Modules/_ctypes/callproc.c:1644 _Py_IDENTIFIER(__setstate__): Modules/_pickle.c:5147 _Py_IDENTIFIER(__trunc__): Modules/mathmodule.c:1464 _Py_IDENTIFIER(__trunc__): Objects/abstract.c:1273 _Py_IDENTIFIER(get): Modules/_collectionsmodule.c:1766 _Py_IDENTIFIER(get): Objects/descrobject.c:793 _Py_IDENTIFIER(isatty): Modules/_io/_iomodule.c:237 _Py_IDENTIFIER(isatty): Modules/_io/bufferedio.c:19 _Py_IDENTIFIER(isatty): Modules/_io/textio.c:20 _Py_IDENTIFIER(isatty): Python/pythonrun.c:1004 _Py_IDENTIFIER(items): Modules/_collectionsmodule.c:1566 _Py_IDENTIFIER(items): Modules/_pickle.c:2569 _Py_IDENTIFIER(items): Objects/abstract.c:2039 _Py_IDENTIFIER(items): Objects/descrobject.c:817 _Py_IDENTIFIER(items): Objects/typeobject.c:3537 _Py_IDENTIFIER(items): Python/Python-ast.c:108 _Py_IDENTIFIER(keys): Objects/abstract.c:2022 _Py_IDENTIFIER(keys): Objects/descrobject.c:803 _Py_IDENTIFIER(keys): Objects/dictobject.c:1792 _Py_IDENTIFIER(keys): Python/Python-ast.c:201 _Py_IDENTIFIER(lineno): Python/Python-ast.c:29 _Py_IDENTIFIER(lineno): Python/errors.c:933 _Py_IDENTIFIER(lineno): Python/pythonrun.c:1635 _Py_IDENTIFIER(mode): Modules/_io/_iomodule.c:239 _Py_IDENTIFIER(mode): Modules/_io/bufferedio.c:20 _Py_IDENTIFIER(mode): Modules/_io/textio.c:21 _Py_IDENTIFIER(mode): Python/pythonrun.c:1007 _Py_IDENTIFIER(msg): Python/Python-ast.c:130 _Py_IDENTIFIER(msg): Python/errors.c:934 _Py_IDENTIFIER(msg): Python/pythonrun.c:1633 _Py_IDENTIFIER(name): Modules/_io/bufferedio.c:21 _Py_IDENTIFIER(name): Modules/_io/fileio.c:62 _Py_IDENTIFIER(name): Modules/_io/textio.c:22 _Py_IDENTIFIER(name): Python/Python-ast.c:37 _Py_IDENTIFIER(name): Python/pythonrun.c:1006 _Py_IDENTIFIER(name): Python/pythonrun.c:222 _Py_IDENTIFIER(offset): Python/errors.c:935 _Py_IDENTIFIER(offset): Python/pythonrun.c:1636 _Py_IDENTIFIER(open): Objects/fileobject.c:33 _Py_IDENTIFIER(open): Parser/tokenizer.c:476 _Py_IDENTIFIER(open): Python/pythonrun.c:1003 _Py_IDENTIFIER(open): Python/traceback.c:155 _Py_IDENTIFIER(open): Python/traceback.c:236 _Py_IDENTIFIER(peek): Modules/_io/bufferedio.c:22 _Py_IDENTIFIER(peek): Modules/_io/iobase.c:453 _Py_IDENTIFIER(peek): Modules/_pickle.c:1181 _Py_IDENTIFIER(print_file_and_line): Python/errors.c:936 _Py_IDENTIFIER(print_file_and_line): Python/pythonrun.c:1865 _Py_IDENTIFIER(raw): Modules/_io/textio.c:23 _Py_IDENTIFIER(raw): Python/pythonrun.c:1029 _Py_IDENTIFIER(read): Modules/_cursesmodule.c:2336 _Py_IDENTIFIER(read): Modules/_io/bufferedio.c:23 _Py_IDENTIFIER(read): Modules/_io/bufferedio.c:55 _Py_IDENTIFIER(read): Modules/_io/iobase.c:452 _Py_IDENTIFIER(read): Modules/_io/iobase.c:846 _Py_IDENTIFIER(read): Modules/_io/textio.c:24 _Py_IDENTIFIER(read): Modules/_pickle.c:1182 _Py_IDENTIFIER(read): Modules/arraymodule.c:1267 _Py_IDENTIFIER(read): Modules/pyexpat.c:898 _Py_IDENTIFIER(read): Python/marshal.c:1595 _Py_IDENTIFIER(readinto): Modules/_io/bufferedio.c:26 _Py_IDENTIFIER(readinto): Python/marshal.c:621 _Py_IDENTIFIER(readline): Modules/_pickle.c:1183 _Py_IDENTIFIER(readline): Objects/fileobject.c:62 _Py_IDENTIFIER(readline): Parser/tokenizer.c:477 _Py_IDENTIFIER(replace): Modules/_datetimemodule.c:1071 _Py_IDENTIFIER(replace): Modules/_io/textio.c:27 _Py_IDENTIFIER(replace): Modules/zipimport.c:563 _Py_IDENTIFIER(replace): Modules/zipimport.c:70 _Py_IDENTIFIER(text): Modules/_elementtree.c:2357 _Py_IDENTIFIER(text): Python/errors.c:937 _Py_IDENTIFIER(text): Python/pythonrun.c:1637 _Py_IDENTIFIER(time): Modules/_datetimemodule.c:1309 _Py_IDENTIFIER(time): Modules/gcmodule.c:890 _Py_IDENTIFIER(upper): Modules/_sqlite/connection.c:1500 _Py_IDENTIFIER(upper): Modules/_sqlite/cursor.c:144 _Py_IDENTIFIER(upper): Modules/_sqlite/module.c:190 _Py_IDENTIFIER(upper): Python/Python-ast.c:329 _Py_IDENTIFIER(values): Objects/abstract.c:2056 _Py_IDENTIFIER(values): Objects/descrobject.c:810 _Py_IDENTIFIER(values): Python/Python-ast.c:170 _Py_IDENTIFIER(write): Modules/_csv.c:1374 _Py_IDENTIFIER(write): Modules/_cursesmodule.c:1792 _Py_IDENTIFIER(write): Modules/_io/bufferedio.c:28 _Py_IDENTIFIER(write): Modules/_pickle.c:831 _Py_IDENTIFIER(write): Modules/arraymodule.c:1340 _Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1572 _Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1642 _Py_IDENTIFIER(write): Objects/fileobject.c:131 _Py_IDENTIFIER(write): Python/marshal.c:1566 _Py_IDENTIFIER(write): Python/sysmodule.c:129 _Py_IDENTIFIER(write): Python/sysmodule.c:2007 |
|||
msg202309 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 07:48 | |
> _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5132 > _Py_IDENTIFIER(__delitem__): Objects/typeobject.c:5183 I moved this one to #19514. |
|||
msg202355 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 16:01 | |
If most identifiers are stored in the same place, it would become possible to have a "cleanup" function to clear all identifiers. Such function could be called at Python shutdown to release as much memory as possible. |
|||
msg202360 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-11-07 16:25 | |
What are you trying to achieve exactly? I don't think "sharing" identifier structs will gain anything significant. Please don't make the source code less readable in search for some mythical "efficiency". |
|||
msg202372 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-11-07 17:28 | |
Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source. |
|||
msg202373 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 17:33 | |
> Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source. Oh, great! I never noticed _PyUnicode_ClearStaticStrings(). Call trace: Py_Finalize() -> _PyUnicode_ClearStaticStrings() -> _PyUnicode_Fini(). |
|||
msg202377 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-11-07 17:50 | |
Well, that was one of the motivations of introducing this Py_IDENTIFIER machinery: to be able to cleanup at the end (unlike the static variables that were used before, which couldn't be cleaned up). |
|||
msg202381 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 20:54 | |
Antoine, Martin: So, what do you think? Is it worth to move most common identifiers to a single place to not duplicate them? If identifiers are already cleared at exit, the advantage would be to initialize duplicated identifiers more quickly, and don't initialize duplicated identifiers multiple times (once per copy). If we choose to not share common identifiers, _PyId_xxx identifiers from pythonrun.c must be removed. There are also some identifiers duplicated in the same file which can be moved at the top to remove at least duplicates in a single file, as it was done for other identifiers in issue #19514. |
|||
msg202382 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-11-07 21:00 | |
> Antoine, Martin: So, what do you think? Is it worth to move most common identifiers > to a single place to not duplicate them? Well, worth what? :) If you don't tell us what it brings (numbers?), I'm against it. |
|||
msg202383 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 21:02 | |
> If you don't tell us what it brings (numbers?), I'm against it. For performances, it's probably very close to zero speed up. For the memory, it's a few bytes per duplicated identifier. |
|||
msg202385 - (view) | Author: Antoine Pitrou (pitrou) * | Date: 2013-11-07 21:17 | |
>> If you don't tell us what it brings (numbers?), I'm against it. > > For performances, it's probably very close to zero speed up. For > the memory, it's a few bytes per duplicated identifier. Well, then IMHO it's not worth it. |
|||
msg202386 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 21:18 | |
> Well, then IMHO it's not worth it. Ok, you are probably right :-) @Andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file? |
|||
msg202388 - (view) | Author: -- (elixir) * | Date: 2013-11-07 21:47 | |
> @Andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file? Yes, I will provide a patch in a day or two. |
|||
msg202389 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-11-07 22:08 | |
New changeset 01c4a0af73cf by Victor Stinner in branch 'default': Issue #19512, #19515: remove shared identifiers, move identifiers where they http://hg.python.org/cpython/rev/01c4a0af73cf |
|||
msg202391 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-07 22:09 | |
> New changeset 01c4a0af73cf by Victor Stinner in branch 'default': > Issue #19512, #19515: remove shared identifiers, move identifiers where they > http://hg.python.org/cpython/rev/01c4a0af73cf This changeset removes some identifiers duplicated in the same file. |
|||
msg202418 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2013-11-08 13:45 | |
I'd be +0 on extracting common identifiers. I see a slight increase of readability, and expect a slight reduction of memory usage, and a tiny reduction in runtime. It's not really worth the effort, but I fail to see that it causes harm. I see no point in reverting cases where this approach is already taken. I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention. |
|||
msg202419 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-08 14:13 | |
2013/11/8 Martin v. Löwis <report@bugs.python.org>: > I'd be +0 on extracting common identifiers. I (...) expect a slight reduction of memory usage, and a tiny reduction in runtime. Only duplicated Py_IDENTIFIER structures would be removed in memory, but these structures are very small (3 pointers or something like that). The identifier strings are already interned, so not duplicated in memory. > It's not really worth the effort, but I fail to see that it causes harm. Initializing an identifier has to decode the literal byte string from UTF-8, but Python UTF-8 decoder is really fast. I'm not sure that it's possible to see a difference on the startup time. > I see no point in reverting cases where this approach is already taken. I only reverted shared identifiers added a few days ago in issue #19512. I agree to leave the old code unchanged. > I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention. I tried to explain my motivation on using more identifiers in the issue #19512. |
|||
msg202696 - (view) | Author: STINNER Victor (vstinner) * | Date: 2013-11-12 15:40 | |
@Andrei: Ping? |
|||
msg202706 - (view) | Author: -- (elixir) * | Date: 2013-11-12 17:02 | |
Removed _Py_IDENTIFIERs duplicated in the same file, except: _Py_IDENTIFIER(replace): Modules/zipimport.c:563 which was enclosed in some #ifdefs. |
|||
msg202719 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-11-12 20:42 | |
New changeset 518e3b174120 by Victor Stinner in branch 'default': Issue #19515: Remove identifiers duplicated in the same file. http://hg.python.org/cpython/rev/518e3b174120 |
|||
msg202720 - (view) | Author: Roundup Robot (python-dev) | Date: 2013-11-12 20:44 | |
New changeset d2209b9f8971 by Victor Stinner in branch 'default': Issue #19515: Remove duplicated identifiers in zipimport.c http://hg.python.org/cpython/rev/d2209b9f8971 |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:57:53 | admin | set | github: 63714 |
2013-11-12 20:44:58 | python-dev | set | messages: + msg202720 |
2013-11-12 20:42:56 | vstinner | set | status: open -> closed resolution: fixed |
2013-11-12 20:42:39 | python-dev | set | messages: + msg202719 |
2013-11-12 17:02:57 | elixir | set | files:
+ remove_duplicated_pyidentifiers.patch keywords: + patch messages: + msg202706 |
2013-11-12 15:40:56 | vstinner | set | messages: + msg202696 |
2013-11-08 14:13:33 | vstinner | set | messages: + msg202419 |
2013-11-08 13:45:05 | loewis | set | messages: + msg202418 |
2013-11-07 22:09:58 | vstinner | set | messages: + msg202391 |
2013-11-07 22:08:07 | python-dev | set | nosy:
+ python-dev messages: + msg202389 |
2013-11-07 21:47:35 | elixir | set | messages: + msg202388 |
2013-11-07 21:18:37 | vstinner | set | messages: + msg202386 |
2013-11-07 21:17:10 | pitrou | set | messages: + msg202385 |
2013-11-07 21:02:51 | vstinner | set | messages: + msg202383 |
2013-11-07 21:00:30 | pitrou | set | messages: + msg202382 |
2013-11-07 20:54:41 | vstinner | set | messages: + msg202381 |
2013-11-07 17:50:45 | loewis | set | messages: + msg202377 |
2013-11-07 17:33:17 | vstinner | set | messages: + msg202373 |
2013-11-07 17:28:38 | loewis | set | messages: + msg202372 |
2013-11-07 16:25:10 | pitrou | set | nosy:
+ pitrou messages: + msg202360 |
2013-11-07 16:01:38 | vstinner | set | messages: + msg202355 |
2013-11-07 15:59:28 | vstinner | set | nosy:
+ loewis, elixir |
2013-11-07 07:48:14 | vstinner | set | messages: + msg202309 |
2013-11-07 01:08:17 | vstinner | create |