Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share duplicated _Py_IDENTIFIER identifiers in C code #63714

Closed
vstinner opened this issue Nov 7, 2013 · 21 comments
Closed

Share duplicated _Py_IDENTIFIER identifiers in C code #63714

vstinner opened this issue Nov 7, 2013 · 21 comments

Comments

@vstinner
Copy link
Member

vstinner commented Nov 7, 2013

BPO 19515
Nosy @loewis, @pitrou, @vstinner
Files
  • remove_duplicated_pyidentifiers.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2013-11-12.20:42:56.125>
    created_at = <Date 2013-11-07.01:08:17.094>
    labels = []
    title = 'Share duplicated _Py_IDENTIFIER identifiers in C code'
    updated_at = <Date 2013-11-12.20:44:58.657>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2013-11-12.20:44:58.657>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2013-11-12.20:42:56.125>
    closer = 'vstinner'
    components = []
    creation = <Date 2013-11-07.01:08:17.094>
    creator = 'vstinner'
    dependencies = []
    files = ['32587']
    hgrepos = []
    issue_num = 19515
    keywords = ['patch']
    message_count = 21.0
    messages = ['202301', '202309', '202355', '202360', '202372', '202373', '202377', '202381', '202382', '202383', '202385', '202386', '202388', '202389', '202391', '202418', '202419', '202696', '202706', '202719', '202720']
    nosy_count = 5.0
    nosy_names = ['loewis', 'pitrou', 'vstinner', 'python-dev', 'elixir']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = None
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue19515'
    versions = ['Python 3.4']

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    I started to share some common identifiers in Python/pythonrun.c, extract:

    /* Common identifiers */
    _Py_Identifier _PyId_argv = _Py_static_string_init("argv");
    _Py_Identifier _PyId_builtins = _Py_static_string_init("builtins");
    ...

    Do you think it would be interesting to continue to share such identifier somewhere? Maybe in a new file?

    We might do the same for some common strings like empty string, single character (like "\n"), etc.

    See also issue bpo-19514.

    Duplicated identifiers in the io module:

    _Py_IDENTIFIER(_dealloc_warn): Modules/_io/bufferedio.c:17
    _Py_IDENTIFIER(_dealloc_warn): Modules/_io/textio.c:15

    _Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:183
    _Py_IDENTIFIER(__IOBase_closed): Modules/_io/iobase.c:62

    _Py_IDENTIFIER(read1): Modules/_io/bufferedio.c:24
    _Py_IDENTIFIER(read1): Modules/_io/textio.c:25

    _Py_IDENTIFIER(readable): Modules/_io/bufferedio.c:25
    _Py_IDENTIFIER(readable): Modules/_io/textio.c:26

    _Py_IDENTIFIER(readall): Modules/_io/fileio.c:590
    _Py_IDENTIFIER(readall): Modules/_io/iobase.c:802

    _Py_IDENTIFIER(seek): Modules/_io/iobase.c:101
    _Py_IDENTIFIER(seek): Modules/_io/textio.c:29

    _Py_IDENTIFIER(writable): Modules/_io/bufferedio.c:27
    _Py_IDENTIFIER(writable): Modules/_io/textio.c:33

    Duplicated identifiers in other files:

    _Py_IDENTIFIER(append): Modules/_elementtree.c:2373
    _Py_IDENTIFIER(append): Modules/_pickle.c:5056

    _Py_IDENTIFIER(bases): Objects/abstract.c:2417
    _Py_IDENTIFIER(bases): Objects/typeobject.c:2813

    _Py_IDENTIFIER(builtins): pythonrun.c
    _Py_IDENTIFIER(builtins): Python/sysmodule.c:169

    _Py_IDENTIFIER(bytes): Objects/bytesobject.c:2458
    _Py_IDENTIFIER(bytes): Objects/object.c:563

    _Py_IDENTIFIER(class): Objects/abstract.c:2492
    _Py_IDENTIFIER(class): Objects/typeobject.c:42
    _Py_IDENTIFIER(class): Python/codecs.c:470
    _Py_IDENTIFIER(class): Python/compile.c:553

    _Py_IDENTIFIER(close): Modules/_io/bufferedio.c:16
    _Py_IDENTIFIER(close): Modules/_io/fileio.c:129
    _Py_IDENTIFIER(close): Modules/_io/textio.c:14
    _Py_IDENTIFIER(close): Modules/mmapmodule.c:707
    _Py_IDENTIFIER(close): Modules/ossaudiodev.c:540
    _Py_IDENTIFIER(close): Modules/selectmodule.c:1513
    _Py_IDENTIFIER(close): Objects/genobject.c:173
    _Py_IDENTIFIER(close): Python/traceback.c:235

    _Py_IDENTIFIER(delitem): Objects/typeobject.c:5132
    _Py_IDENTIFIER(delitem): Objects/typeobject.c:5183

    _Py_IDENTIFIER(dict): Modules/arraymodule.c:2040
    _Py_IDENTIFIER(dict): Modules/_collectionsmodule.c:894
    _Py_IDENTIFIER(dict): Modules/_pickle.c:5204
    _Py_IDENTIFIER(dict): Objects/bytearrayobject.c:2704
    _Py_IDENTIFIER(dict): Objects/moduleobject.c:479
    _Py_IDENTIFIER(dict): Objects/setobject.c:1946
    _Py_IDENTIFIER(dict): Objects/typeobject.c:43
    _Py_IDENTIFIER(dict): Parser/asdl_c.py:702
    _Py_IDENTIFIER(dict): Python/bltinmodule.c:1942
    _Py_IDENTIFIER(dict): Python/ceval.c:4660
    _Py_IDENTIFIER(dict): Python/Python-ast.c:544

    _Py_IDENTIFIER(doc): Objects/descrobject.c:1443
    _Py_IDENTIFIER(doc): Objects/typeobject.c:44

    _Py_IDENTIFIER(enable): Modules/faulthandler.c:1050
    _Py_IDENTIFIER(enable): Modules/_posixsubprocess.c:50

    _Py_IDENTIFIER(encoding): Python/bltinmodule.c:1716
    _Py_IDENTIFIER(encoding): Python/pythonrun.c:1359
    _Py_IDENTIFIER(encoding): Python/sysmodule.c:107

    _Py_IDENTIFIER(filename): Python/errors.c:932
    _Py_IDENTIFIER(filename): Python/pythonrun.c:1634

    _Py_IDENTIFIER(fileno): Modules/faulthandler.c:133
    _Py_IDENTIFIER(fileno): Modules/_io/_iomodule.c:238
    _Py_IDENTIFIER(fileno): Modules/_io/textio.c:17
    _Py_IDENTIFIER(fileno): Objects/fileobject.c:200
    _Py_IDENTIFIER(fileno): Python/bltinmodule.c:35

    _Py_IDENTIFIER(flush): Modules/faulthandler.c:134
    _Py_IDENTIFIER(flush): Modules/_io/bufferedio.c:18
    _Py_IDENTIFIER(flush): Modules/_io/textio.c:18
    _Py_IDENTIFIER(flush): Python/bltinmodule.c:1550
    _Py_IDENTIFIER(flush): Python/bltinmodule.c:36
    _Py_IDENTIFIER(flush): Python/pythonrun.c:2120
    _Py_IDENTIFIER(flush): Python/pythonrun.c:519

    _Py_IDENTIFIER(getinitargs): Modules/_datetimemodule.c:3075
    _Py_IDENTIFIER(getinitargs): Modules/_pickle.c:4501

    _Py_IDENTIFIER(TextIOWrapper): Python/pythonrun.c:1005
    _Py_IDENTIFIER(TextIOWrapper): Python/traceback.c:237

    _Py_IDENTIFIER(getstate): Modules/_datetimemodule.c:3076
    _Py_IDENTIFIER(getstate): Objects/typeobject.c:3448

    _Py_IDENTIFIER(import): Python/ceval.c:2428
    _Py_IDENTIFIER(import): Python/import.c:1230

    _Py_IDENTIFIER(module): Objects/typeobject.c:48
    _Py_IDENTIFIER(module): Python/errors.c:840
    _Py_IDENTIFIER(module): Python/pythonrun.c:1916

    _Py_IDENTIFIER(name): Objects/classobject.c:17
    _Py_IDENTIFIER(name): Objects/typeobject.c:49
    _Py_IDENTIFIER(name): Objects/weakrefobject.c:159
    _Py_IDENTIFIER(name): Python/_warnings.c:260
    _Py_IDENTIFIER(name): Python/codecs.c:471
    _Py_IDENTIFIER(name): Python/import.c:1234

    _Py_IDENTIFIER(new): Modules/_ctypes/callproc.c:1643
    _Py_IDENTIFIER(new): Modules/_pickle.c:4511
    _Py_IDENTIFIER(new): Objects/typeobject.c:50
    _Py_IDENTIFIER(new): Objects/typeobject.c:5637

    _Py_IDENTIFIER(qualname): Objects/descrobject.c:367
    _Py_IDENTIFIER(qualname): Objects/methodobject.c:191
    _Py_IDENTIFIER(qualname): Objects/typeobject.c:2032

    _Py_IDENTIFIER(setitem): Modules/_collectionsmodule.c:1767
    _Py_IDENTIFIER(setitem): Objects/typeobject.c:5133
    _Py_IDENTIFIER(setitem): Objects/typeobject.c:5184

    _Py_IDENTIFIER(setstate): Modules/_ctypes/callproc.c:1644
    _Py_IDENTIFIER(setstate): Modules/_pickle.c:5147

    _Py_IDENTIFIER(trunc): Modules/mathmodule.c:1464
    _Py_IDENTIFIER(trunc): Objects/abstract.c:1273

    _Py_IDENTIFIER(get): Modules/_collectionsmodule.c:1766
    _Py_IDENTIFIER(get): Objects/descrobject.c:793

    _Py_IDENTIFIER(isatty): Modules/_io/_iomodule.c:237
    _Py_IDENTIFIER(isatty): Modules/_io/bufferedio.c:19
    _Py_IDENTIFIER(isatty): Modules/_io/textio.c:20
    _Py_IDENTIFIER(isatty): Python/pythonrun.c:1004

    _Py_IDENTIFIER(items): Modules/_collectionsmodule.c:1566
    _Py_IDENTIFIER(items): Modules/_pickle.c:2569
    _Py_IDENTIFIER(items): Objects/abstract.c:2039
    _Py_IDENTIFIER(items): Objects/descrobject.c:817
    _Py_IDENTIFIER(items): Objects/typeobject.c:3537
    _Py_IDENTIFIER(items): Python/Python-ast.c:108

    _Py_IDENTIFIER(keys): Objects/abstract.c:2022
    _Py_IDENTIFIER(keys): Objects/descrobject.c:803
    _Py_IDENTIFIER(keys): Objects/dictobject.c:1792
    _Py_IDENTIFIER(keys): Python/Python-ast.c:201

    _Py_IDENTIFIER(lineno): Python/Python-ast.c:29
    _Py_IDENTIFIER(lineno): Python/errors.c:933
    _Py_IDENTIFIER(lineno): Python/pythonrun.c:1635

    _Py_IDENTIFIER(mode): Modules/_io/_iomodule.c:239
    _Py_IDENTIFIER(mode): Modules/_io/bufferedio.c:20
    _Py_IDENTIFIER(mode): Modules/_io/textio.c:21
    _Py_IDENTIFIER(mode): Python/pythonrun.c:1007

    _Py_IDENTIFIER(msg): Python/Python-ast.c:130
    _Py_IDENTIFIER(msg): Python/errors.c:934
    _Py_IDENTIFIER(msg): Python/pythonrun.c:1633

    _Py_IDENTIFIER(name): Modules/_io/bufferedio.c:21
    _Py_IDENTIFIER(name): Modules/_io/fileio.c:62
    _Py_IDENTIFIER(name): Modules/_io/textio.c:22
    _Py_IDENTIFIER(name): Python/Python-ast.c:37
    _Py_IDENTIFIER(name): Python/pythonrun.c:1006
    _Py_IDENTIFIER(name): Python/pythonrun.c:222

    _Py_IDENTIFIER(offset): Python/errors.c:935
    _Py_IDENTIFIER(offset): Python/pythonrun.c:1636

    _Py_IDENTIFIER(open): Objects/fileobject.c:33
    _Py_IDENTIFIER(open): Parser/tokenizer.c:476
    _Py_IDENTIFIER(open): Python/pythonrun.c:1003
    _Py_IDENTIFIER(open): Python/traceback.c:155
    _Py_IDENTIFIER(open): Python/traceback.c:236

    _Py_IDENTIFIER(peek): Modules/_io/bufferedio.c:22
    _Py_IDENTIFIER(peek): Modules/_io/iobase.c:453
    _Py_IDENTIFIER(peek): Modules/_pickle.c:1181

    _Py_IDENTIFIER(print_file_and_line): Python/errors.c:936
    _Py_IDENTIFIER(print_file_and_line): Python/pythonrun.c:1865

    _Py_IDENTIFIER(raw): Modules/_io/textio.c:23
    _Py_IDENTIFIER(raw): Python/pythonrun.c:1029

    _Py_IDENTIFIER(read): Modules/_cursesmodule.c:2336
    _Py_IDENTIFIER(read): Modules/_io/bufferedio.c:23
    _Py_IDENTIFIER(read): Modules/_io/bufferedio.c:55
    _Py_IDENTIFIER(read): Modules/_io/iobase.c:452
    _Py_IDENTIFIER(read): Modules/_io/iobase.c:846
    _Py_IDENTIFIER(read): Modules/_io/textio.c:24
    _Py_IDENTIFIER(read): Modules/_pickle.c:1182
    _Py_IDENTIFIER(read): Modules/arraymodule.c:1267
    _Py_IDENTIFIER(read): Modules/pyexpat.c:898
    _Py_IDENTIFIER(read): Python/marshal.c:1595

    _Py_IDENTIFIER(readinto): Modules/_io/bufferedio.c:26
    _Py_IDENTIFIER(readinto): Python/marshal.c:621

    _Py_IDENTIFIER(readline): Modules/_pickle.c:1183
    _Py_IDENTIFIER(readline): Objects/fileobject.c:62
    _Py_IDENTIFIER(readline): Parser/tokenizer.c:477

    _Py_IDENTIFIER(replace): Modules/_datetimemodule.c:1071
    _Py_IDENTIFIER(replace): Modules/_io/textio.c:27
    _Py_IDENTIFIER(replace): Modules/zipimport.c:563
    _Py_IDENTIFIER(replace): Modules/zipimport.c:70

    _Py_IDENTIFIER(text): Modules/_elementtree.c:2357
    _Py_IDENTIFIER(text): Python/errors.c:937
    _Py_IDENTIFIER(text): Python/pythonrun.c:1637

    _Py_IDENTIFIER(time): Modules/_datetimemodule.c:1309
    _Py_IDENTIFIER(time): Modules/gcmodule.c:890

    _Py_IDENTIFIER(upper): Modules/_sqlite/connection.c:1500
    _Py_IDENTIFIER(upper): Modules/_sqlite/cursor.c:144
    _Py_IDENTIFIER(upper): Modules/_sqlite/module.c:190
    _Py_IDENTIFIER(upper): Python/Python-ast.c:329

    _Py_IDENTIFIER(values): Objects/abstract.c:2056
    _Py_IDENTIFIER(values): Objects/descrobject.c:810
    _Py_IDENTIFIER(values): Python/Python-ast.c:170

    _Py_IDENTIFIER(write): Modules/_csv.c:1374
    _Py_IDENTIFIER(write): Modules/_cursesmodule.c:1792
    _Py_IDENTIFIER(write): Modules/_io/bufferedio.c:28
    _Py_IDENTIFIER(write): Modules/_pickle.c:831
    _Py_IDENTIFIER(write): Modules/arraymodule.c:1340
    _Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1572
    _Py_IDENTIFIER(write): Modules/cjkcodecs/multibytecodec.c:1642
    _Py_IDENTIFIER(write): Objects/fileobject.c:131
    _Py_IDENTIFIER(write): Python/marshal.c:1566
    _Py_IDENTIFIER(write): Python/sysmodule.c:129
    _Py_IDENTIFIER(write): Python/sysmodule.c:2007

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    _Py_IDENTIFIER(delitem): Objects/typeobject.c:5132
    _Py_IDENTIFIER(delitem): Objects/typeobject.c:5183

    I moved this one to bpo-19514.

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    If most identifiers are stored in the same place, it would become possible to have a "cleanup" function to clear all identifiers. Such function could be called at Python shutdown to release as much memory as possible.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2013

    What are you trying to achieve exactly? I don't think "sharing" identifier structs will gain anything significant. Please don't make the source code less readable in search for some mythical "efficiency".

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 7, 2013

    Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source.

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    Victor: There already *is* a cleanup function that clears all allocated identifier memory at interpreter shutdown. Please read the source.

    Oh, great! I never noticed _PyUnicode_ClearStaticStrings().

    Call trace: Py_Finalize() -> _PyUnicode_ClearStaticStrings() ->
    _PyUnicode_Fini().

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 7, 2013

    Well, that was one of the motivations of introducing this Py_IDENTIFIER machinery: to be able to cleanup at the end (unlike the static variables that were used before, which couldn't be cleaned up).

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    Antoine, Martin: So, what do you think? Is it worth to move most common identifiers to a single place to not duplicate them?

    If identifiers are already cleared at exit, the advantage would be to initialize duplicated identifiers more quickly, and don't initialize duplicated identifiers multiple times (once per copy).

    If we choose to not share common identifiers, _PyId_xxx identifiers from pythonrun.c must be removed. There are also some identifiers duplicated in the same file which can be moved at the top to remove at least duplicates in a single file, as it was done for other identifiers in issue bpo-19514.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2013

    Antoine, Martin: So, what do you think? Is it worth to move most common identifiers
    to a single place to not duplicate them?

    Well, worth what? :)
    If you don't tell us what it brings (numbers?), I'm against it.

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    If you don't tell us what it brings (numbers?), I'm against it.

    For performances, it's probably very close to zero speed up. For the memory, it's a few bytes per duplicated identifier.

    @pitrou
    Copy link
    Member

    pitrou commented Nov 7, 2013

    > If you don't tell us what it brings (numbers?), I'm against it.

    For performances, it's probably very close to zero speed up. For
    the memory, it's a few bytes per duplicated identifier.

    Well, then IMHO it's not worth it.

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    Well, then IMHO it's not worth it.

    Ok, you are probably right :-)

    @andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file?

    @elixir
    Copy link
    Mannequin

    elixir mannequin commented Nov 7, 2013

    @andrei: Are you interested to work on a patch to remove identifiers duplicated in the same file?

    Yes, I will provide a patch in a day or two.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 7, 2013

    New changeset 01c4a0af73cf by Victor Stinner in branch 'default':
    Issue bpo-19512, bpo-19515: remove shared identifiers, move identifiers where they
    http://hg.python.org/cpython/rev/01c4a0af73cf

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 7, 2013

    New changeset 01c4a0af73cf by Victor Stinner in branch 'default':
    Issue bpo-19512, bpo-19515: remove shared identifiers, move identifiers where they
    http://hg.python.org/cpython/rev/01c4a0af73cf

    This changeset removes some identifiers duplicated in the same file.

    @loewis
    Copy link
    Mannequin

    loewis mannequin commented Nov 8, 2013

    I'd be +0 on extracting common identifiers. I see a slight increase of readability, and expect a slight reduction of memory usage, and a tiny reduction in runtime. It's not really worth the effort, but I fail to see that it causes harm.

    I see no point in reverting cases where this approach is already taken. I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention.

    @vstinner
    Copy link
    Member Author

    vstinner commented Nov 8, 2013

    2013/11/8 Martin v. Löwis <report@bugs.python.org>:

    I'd be +0 on extracting common identifiers. I (...) expect a slight reduction of memory usage, and a tiny reduction in runtime.

    Only duplicated Py_IDENTIFIER structures would be removed in memory,
    but these structures are very small (3 pointers or something like
    that). The identifier strings are already interned, so not duplicated
    in memory.

    It's not really worth the effort, but I fail to see that it causes harm.

    Initializing an identifier has to decode the literal byte string from
    UTF-8, but Python UTF-8 decoder is really fast. I'm not sure that it's
    possible to see a difference on the startup time.

    I see no point in reverting cases where this approach is already taken.

    I only reverted shared identifiers added a few days ago in issue
    bpo-19512. I agree to leave the old code unchanged.

    I don't quite understand Victor's interest in this, either, as there are hundreds of open real bugs that could use his attention.

    I tried to explain my motivation on using more identifiers in the issue bpo-19512.

    @vstinner
    Copy link
    Member Author

    @andrei: Ping?

    @elixir
    Copy link
    Mannequin

    elixir mannequin commented Nov 12, 2013

    Removed _Py_IDENTIFIERs duplicated in the same file, except:

    _Py_IDENTIFIER(replace): [Modules/zipimport.c:563](https://github.com/python/cpython/blob/main/Modules/zipimport.c#L563)
    

    which was enclosed in some #ifdefs.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 12, 2013

    New changeset 518e3b174120 by Victor Stinner in branch 'default':
    Issue bpo-19515: Remove identifiers duplicated in the same file.
    http://hg.python.org/cpython/rev/518e3b174120

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 12, 2013

    New changeset d2209b9f8971 by Victor Stinner in branch 'default':
    Issue bpo-19515: Remove duplicated identifiers in zipimport.c
    http://hg.python.org/cpython/rev/d2209b9f8971

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants