New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Share duplicated _Py_IDENTIFIER identifiers in C code #63714
Comments
I started to share some common identifiers in Python/pythonrun.c, extract: /* Common identifiers */ Do you think it would be interesting to continue to share such identifier somewhere? Maybe in a new file? We might do the same for some common strings like empty string, single character (like "\n"), etc. See also issue bpo-19514. Duplicated identifiers in the io module: _Py_IDENTIFIER(_dealloc_warn): Modules/_io/bufferedio.c:17 _Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:183 _Py_IDENTIFIER(read1): Modules/_io/bufferedio.c:24 _Py_IDENTIFIER(readable): Modules/_io/bufferedio.c:25 _Py_IDENTIFIER(readall): Modules/_io/fileio.c:590 _Py_IDENTIFIER(seek): Modules/_io/iobase.c:101 _Py_IDENTIFIER(writable): Modules/_io/bufferedio.c:27 Duplicated identifiers in other files: _Py_IDENTIFIER(append): Modules/_elementtree.c:2373 _Py_IDENTIFIER(bases): Objects/abstract.c:2417 _Py_IDENTIFIER(builtins): pythonrun.c _Py_IDENTIFIER(bytes): Objects/bytesobject.c:2458 _Py_IDENTIFIER(class): Objects/abstract.c:2492 _Py_IDENTIFIER(close): Modules/_io/bufferedio.c:16 _Py_IDENTIFIER(delitem): Objects/typeobject.c:5132 _Py_IDENTIFIER(dict): Modules/arraymodule.c:2040 _Py_IDENTIFIER(doc): Objects/descrobject.c:1443 _Py_IDENTIFIER(enable): Modules/faulthandler.c:1050 _Py_IDENTIFIER(encoding): Python/bltinmodule.c:1716 _Py_IDENTIFIER(filename): Python/errors.c:932 _Py_IDENTIFIER(fileno): Modules/faulthandler.c:133 _Py_IDENTIFIER(flush): Modules/faulthandler.c:134 _Py_IDENTIFIER(getinitargs): Modules/_datetimemodule.c:3075 _Py_IDENTIFIER(TextIOWrapper): Python/pythonrun.c:1005 _Py_IDENTIFIER(getstate): Modules/_datetimemodule.c:3076 _Py_IDENTIFIER(import): Python/ceval.c:2428 _Py_IDENTIFIER(module): Objects/typeobject.c:48 _Py_IDENTIFIER(name): Objects/classobject.c:17 _Py_IDENTIFIER(new): Modules/_ctypes/callproc.c:1643 _Py_IDENTIFIER(qualname): Objects/descrobject.c:367 _Py_IDENTIFIER(setitem): Modules/_collectionsmodule.c:1767 _Py_IDENTIFIER(setstate): Modules/_ctypes/callproc.c:1644 _Py_IDENTIFIER(trunc): Modules/mathmodule.c:1464 _Py_IDENTIFIER(get): Modules/_collectionsmodule.c:1766 _Py_IDENTIFIER(isatty): Modules/_io/_iomodule.c:237 _Py_IDENTIFIER(items): Modules/_collectionsmodule.c:1566 _Py_IDENTIFIER(keys): Objects/abstract.c:2022 _Py_IDENTIFIER(lineno): Python/Python-ast.c:29 _Py_IDENTIFIER(mode): Modules/_io/_iomodule.c:239 _Py_IDENTIFIER(msg): Python/Python-ast.c:130 _Py_IDENTIFIER(name): Modules/_io/bufferedio.c:21 _Py_IDENTIFIER(offset): Python/errors.c:935 _Py_IDENTIFIER(open): Objects/fileobject.c:33 _Py_IDENTIFIER(peek): Modules/_io/bufferedio.c:22 _Py_IDENTIFIER(print_file_and_line): Python/errors.c:936 _Py_IDENTIFIER(raw): Modules/_io/textio.c:23 _Py_IDENTIFIER(read): Modules/_cursesmodule.c:2336 _Py_IDENTIFIER(readinto): Modules/_io/bufferedio.c:26 _Py_IDENTIFIER(readline): Modules/_pickle.c:1183 _Py_IDENTIFIER(replace): Modules/_datetimemodule.c:1071 _Py_IDENTIFIER(text): Modules/_elementtree.c:2357 _Py_IDENTIFIER(time): Modules/_datetimemodule.c:1309 _Py_IDENTIFIER(upper): Modules/_sqlite/connection.c:1500 _Py_IDENTIFIER(values): Objects/abstract.c:2056 _Py_IDENTIFIER(write): Modules/_csv.c:1374 |
I moved this one to bpo-19514. |
If most identifiers are stored in the same place, it would become possible to have a "cleanup" function to clear all identifiers. Such function could be called at Python shutdown to release as much memory as possible. |
What are you trying to achieve exactly? I don't think "sharing" identifier structs will gain anything significant. Please don't make the source code less readable in search for some mythical "efficiency". |
Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source. |
Oh, great! I never noticed _PyUnicode_ClearStaticStrings(). Call trace: Py_Finalize() -> _PyUnicode_ClearStaticStrings() -> |
Well, that was one of the motivations of introducing this Py_IDENTIFIER machinery: to be able to cleanup at the end (unlike the static variables that were used before, which couldn't be cleaned up). |
Antoine, Martin: So, what do you think? Is it worth to move most common identifiers to a single place to not duplicate them? If identifiers are already cleared at exit, the advantage would be to initialize duplicated identifiers more quickly, and don't initialize duplicated identifiers multiple times (once per copy). If we choose to not share common identifiers, _PyId_xxx identifiers from pythonrun.c must be removed. There are also some identifiers duplicated in the same file which can be moved at the top to remove at least duplicates in a single file, as it was done for other identifiers in issue bpo-19514. |
Well, worth what? :) |
For performances, it's probably very close to zero speed up. For the memory, it's a few bytes per duplicated identifier. |
Well, then IMHO it's not worth it. |
Ok, you are probably right :-) @andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file? |
Yes, I will provide a patch in a day or two. |
New changeset 01c4a0af73cf by Victor Stinner in branch 'default': |
This changeset removes some identifiers duplicated in the same file. |
I'd be +0 on extracting common identifiers. I see a slight increase of readability, and expect a slight reduction of memory usage, and a tiny reduction in runtime. It's not really worth the effort, but I fail to see that it causes harm. I see no point in reverting cases where this approach is already taken. I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention. |
2013/11/8 Martin v. Löwis <report@bugs.python.org>:
Only duplicated Py_IDENTIFIER structures would be removed in memory,
Initializing an identifier has to decode the literal byte string from
I only reverted shared identifiers added a few days ago in issue
I tried to explain my motivation on using more identifiers in the issue bpo-19512. |
@andrei: Ping? |
Removed _Py_IDENTIFIERs duplicated in the same file, except:
which was enclosed in some #ifdefs. |
New changeset 518e3b174120 by Victor Stinner in branch 'default': |
New changeset d2209b9f8971 by Victor Stinner in branch 'default': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: