Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The __module__ attribute of non-heap classes is not interned #70043

Closed
serhiy-storchaka opened this issue Dec 13, 2015 · 15 comments
Closed

The __module__ attribute of non-heap classes is not interned #70043

serhiy-storchaka opened this issue Dec 13, 2015 · 15 comments
Assignees
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@serhiy-storchaka
Copy link
Member

BPO 25856
Nosy @rhettinger, @pitrou, @avassalotti, @benjaminp, @serhiy-storchaka
Files
  • intern___module__.patch
  • intern_and_cache___module__.patch
  • intern_and_cache___module__2.patch
  • intern_and_cache___module__3.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2016-09-09.21:58:55.242>
    created_at = <Date 2015-12-13.16:17:44.233>
    labels = ['extension-modules', 'type-feature']
    title = 'The __module__ attribute of non-heap classes is not interned'
    updated_at = <Date 2016-09-09.21:58:55.241>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2016-09-09.21:58:55.241>
    actor = 'serhiy.storchaka'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2016-09-09.21:58:55.242>
    closer = 'serhiy.storchaka'
    components = ['Extension Modules']
    creation = <Date 2015-12-13.16:17:44.233>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['41356', '41444', '44431', '44445']
    hgrepos = []
    issue_num = 25856
    keywords = ['patch']
    message_count = 15.0
    messages = ['256322', '256694', '257171', '274799', '274842', '274857', '274861', '274862', '274863', '274865', '275166', '275175', '275176', '275460', '275464']
    nosy_count = 6.0
    nosy_names = ['rhettinger', 'pitrou', 'alexandre.vassalotti', 'benjamin.peterson', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue25856'
    versions = ['Python 3.6']

    @serhiy-storchaka
    Copy link
    Member Author

    One of purposes of the STACK_GLOBAL opcode introduced in pickle protocol 4 is to avoid repeating module name for different globals of the same module.

    >>> pickletools.dis(pickletools.optimize(pickle.dumps([sys.getsizeof, sys.intern], 4)))
        0: \x80 PROTO      4
        2: \x95 FRAME      33
       11: ]    EMPTY_LIST
       12: (    MARK
       13: \x8c     SHORT_BINUNICODE 'sys'
       18: \x94     MEMOIZE    (as 0)
       19: \x8c     SHORT_BINUNICODE 'getsizeof'
       30: \x93     STACK_GLOBAL
       31: h        BINGET     0
       33: \x8c     SHORT_BINUNICODE 'intern'
       41: \x93     STACK_GLOBAL
       42: e        APPENDS    (MARK at 12)
       43: .    STOP
    highest protocol among opcodes = 4

    But this doesn't work with the itertools module.

    >>> pickletools.dis(pickletools.optimize(pickle.dumps([itertools.chain, itertools.accumulate], 4)))
        0: \x80 PROTO      4
        2: \x95 FRAME      47
       11: ]    EMPTY_LIST
       12: (    MARK
       13: \x8c     SHORT_BINUNICODE 'itertools'
       24: \x8c     SHORT_BINUNICODE 'chain'
       31: \x93     STACK_GLOBAL
       32: \x8c     SHORT_BINUNICODE 'itertools'
       43: \x8c     SHORT_BINUNICODE 'accumulate'
       55: \x93     STACK_GLOBAL
       56: e        APPENDS    (MARK at 12)
       57: .    STOP
    highest protocol among opcodes = 4

    That is because the __module__ attribute of itertools members is not interned.

    >>> sys.getsizeof.__module__ is sys.intern.__module__
    True
    >>> itertools.chain.__module__ is itertools.chain.__module__
    False

    In addition to inefficient pickle this perhaps leads to small performance hit on accessing the __module__ attribute or using its value as dictionary key.

    @serhiy-storchaka serhiy-storchaka added performance Performance or resource usage extension-modules C modules in the Modules dir labels Dec 13, 2015
    @serhiy-storchaka
    Copy link
    Member Author

    Actually this is not specific to itertools. Every non-heap class not from the builtins module is affected.

    Proposed patch just interns the __module__ value. The patch also cleans up the code.

    @serhiy-storchaka serhiy-storchaka changed the title The __module__ attribute of itertools members is not interned The __module__ attribute of non-heap classes is not interned Dec 18, 2015
    @serhiy-storchaka
    Copy link
    Member Author

    Interning the __module__ string causes small performance hit:

    $ ./python -m timeit -s "from itertools import chain" -- "chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__"

    Unpatched: 1.93 usec per loop
    Patched: 4.09 usec per loop

    This can be avoided if cache created string in type's __dict__. Following patch makes __module__ retrieving for non-heap types as fast as for heap types:

    Patched2: 0.871 usec per loop

    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Dec 29, 2015
    @serhiy-storchaka
    Copy link
    Member Author

    Could anyone please make a review?

    @serhiy-storchaka serhiy-storchaka self-assigned this Sep 7, 2016
    @benjaminp
    Copy link
    Contributor

    I don't understand this code:
    type->tp_dict && PyDict_Check(type->tp_dict)
    since the code explicitly assume it's not NULL and access it as a dict earlier in the function

    @serhiy-storchaka
    Copy link
    Member Author

    Good catch Benjamin! There is yet one bug -- the type type already has the __module__ attribute, but it is a descriptor, not a string.

    Updated patch fixes these bugs.

    @benjaminp
    Copy link
    Contributor

    I don't understand why you need to check the validity of tp_dict at all. We generally assume it's a dict.

    @serhiy-storchaka
    Copy link
    Member Author

    Indeed, the PyDict_Check() check can be omitted.

    @benjaminp
    Copy link
    Contributor

    I don't think it can be NULL either.

    On Wed, Sep 7, 2016, at 11:36, Serhiy Storchaka wrote:

    Serhiy Storchaka added the comment:

    Indeed, the PyDict_Check() check can be omitted.

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue25856\>


    @serhiy-storchaka
    Copy link
    Member Author

    It can be NULL in very rare cases. See for example bpo-26906.

    @benjaminp
    Copy link
    Contributor

    I think it's better not write into the type dict in a getter. You might just use PyUnicode_InternFromString every time.

    FWIW, __name__ also has this behavior.

    @serhiy-storchaka
    Copy link
    Member Author

    This is what my first path does. But this slows down retrieving the __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I haven't bother.

    Interning __name__ and __qualname__ is less important, because different functions and classes usually have different names.

    @benjaminp
    Copy link
    Contributor

    I'm not too worried about slowing down __module__ especially since it's
    not any slower for heap types or types in builtins.

    On Thu, Sep 8, 2016, at 14:39, Serhiy Storchaka wrote:

    Serhiy Storchaka added the comment:

    This is what my first path does. But this slows down retrieving the
    __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I
    haven't bother.

    Interning __name__ and __qualname__ is less important, because different
    functions and classes usually have different names.

    ----------


    Python tracker <report@bugs.python.org>
    <http://bugs.python.org/issue25856\>


    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Sep 9, 2016

    New changeset 861ddad3e0c1 by Serhiy Storchaka in branch 'default':
    Issue bpo-25856: The __module__ attribute of extension classes and functions
    https://hg.python.org/cpython/rev/861ddad3e0c1

    @serhiy-storchaka
    Copy link
    Member Author

    Thank you for helpful review Benjamin.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    extension-modules C modules in the Modules dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants