The module attribute of non-heap classes is not interned #70043

serhiy-storchaka · 2015-12-13T16:17:44Z

BPO	25856
Nosy	@rhettinger, @pitrou, @avassalotti, @benjaminp, @serhiy-storchaka
Files	intern___module__.patch intern_and_cache___module__.patch intern_and_cache___module__2.patch intern_and_cache___module__3.patch

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = 'https://github.com/serhiy-storchaka'
closed_at = <Date 2016-09-09.21:58:55.242>
created_at = <Date 2015-12-13.16:17:44.233>
labels = ['extension-modules', 'type-feature']
title = 'The __module__ attribute of non-heap classes is not interned'
updated_at = <Date 2016-09-09.21:58:55.241>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2016-09-09.21:58:55.241>
actor = 'serhiy.storchaka'
assignee = 'serhiy.storchaka'
closed = True
closed_date = <Date 2016-09-09.21:58:55.242>
closer = 'serhiy.storchaka'
components = ['Extension Modules']
creation = <Date 2015-12-13.16:17:44.233>
creator = 'serhiy.storchaka'
dependencies = []
files = ['41356', '41444', '44431', '44445']
hgrepos = []
issue_num = 25856
keywords = ['patch']
message_count = 15.0
messages = ['256322', '256694', '257171', '274799', '274842', '274857', '274861', '274862', '274863', '274865', '275166', '275175', '275176', '275460', '275464']
nosy_count = 6.0
nosy_names = ['rhettinger', 'pitrou', 'alexandre.vassalotti', 'benjamin.peterson', 'python-dev', 'serhiy.storchaka']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue25856'
versions = ['Python 3.6']

serhiy-storchaka · 2015-12-13T16:17:42Z

One of purposes of the STACK_GLOBAL opcode introduced in pickle protocol 4 is to avoid repeating module name for different globals of the same module.

>>> pickletools.dis(pickletools.optimize(pickle.dumps([sys.getsizeof, sys.intern], 4)))
    0: \x80 PROTO      4
    2: \x95 FRAME      33
   11: ]    EMPTY_LIST
   12: (    MARK
   13: \x8c     SHORT_BINUNICODE 'sys'
   18: \x94     MEMOIZE    (as 0)
   19: \x8c     SHORT_BINUNICODE 'getsizeof'
   30: \x93     STACK_GLOBAL
   31: h        BINGET     0
   33: \x8c     SHORT_BINUNICODE 'intern'
   41: \x93     STACK_GLOBAL
   42: e        APPENDS    (MARK at 12)
   43: .    STOP
highest protocol among opcodes = 4

But this doesn't work with the itertools module.

>>> pickletools.dis(pickletools.optimize(pickle.dumps([itertools.chain, itertools.accumulate], 4)))
    0: \x80 PROTO      4
    2: \x95 FRAME      47
   11: ]    EMPTY_LIST
   12: (    MARK
   13: \x8c     SHORT_BINUNICODE 'itertools'
   24: \x8c     SHORT_BINUNICODE 'chain'
   31: \x93     STACK_GLOBAL
   32: \x8c     SHORT_BINUNICODE 'itertools'
   43: \x8c     SHORT_BINUNICODE 'accumulate'
   55: \x93     STACK_GLOBAL
   56: e        APPENDS    (MARK at 12)
   57: .    STOP
highest protocol among opcodes = 4

That is because the __module__ attribute of itertools members is not interned.

>>> sys.getsizeof.__module__ is sys.intern.__module__
True
>>> itertools.chain.__module__ is itertools.chain.__module__
False

In addition to inefficient pickle this perhaps leads to small performance hit on accessing the __module__ attribute or using its value as dictionary key.

serhiy-storchaka · 2015-12-18T16:49:06Z

Actually this is not specific to itertools. Every non-heap class not from the builtins module is affected.

Proposed patch just interns the __module__ value. The patch also cleans up the code.

serhiy-storchaka · 2015-12-29T07:43:40Z

Interning the __module__ string causes small performance hit:

$ ./python -m timeit -s "from itertools import chain" -- "chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__"

Unpatched: 1.93 usec per loop
Patched: 4.09 usec per loop

This can be avoided if cache created string in type's __dict__. Following patch makes __module__ retrieving for non-heap types as fast as for heap types:

Patched2: 0.871 usec per loop

serhiy-storchaka · 2016-09-07T11:43:11Z

Could anyone please make a review?

benjaminp · 2016-09-07T17:18:43Z

I don't understand this code:
type->tp_dict && PyDict_Check(type->tp_dict)
since the code explicitly assume it's not NULL and access it as a dict earlier in the function

serhiy-storchaka · 2016-09-07T18:19:55Z

Good catch Benjamin! There is yet one bug -- the type type already has the __module__ attribute, but it is a descriptor, not a string.

Updated patch fixes these bugs.

benjaminp · 2016-09-07T18:30:01Z

I don't understand why you need to check the validity of tp_dict at all. We generally assume it's a dict.

serhiy-storchaka · 2016-09-07T18:36:13Z

Indeed, the PyDict_Check() check can be omitted.

benjaminp · 2016-09-07T18:40:13Z

I don't think it can be NULL either.

On Wed, Sep 7, 2016, at 11:36, Serhiy Storchaka wrote:

Serhiy Storchaka added the comment:

Indeed, the PyDict_Check() check can be omitted.

----------

Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue25856\>

serhiy-storchaka · 2016-09-07T18:45:24Z

It can be NULL in very rare cases. See for example bpo-26906.

benjaminp · 2016-09-08T21:29:36Z

I think it's better not write into the type dict in a getter. You might just use PyUnicode_InternFromString every time.

FWIW, __name__ also has this behavior.

serhiy-storchaka · 2016-09-08T21:39:35Z

This is what my first path does. But this slows down retrieving the __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I haven't bother.

Interning __name__ and __qualname__ is less important, because different functions and classes usually have different names.

benjaminp · 2016-09-08T21:42:51Z

I'm not too worried about slowing down __module__ especially since it's
not any slower for heap types or types in builtins.

On Thu, Sep 8, 2016, at 14:39, Serhiy Storchaka wrote:

Serhiy Storchaka added the comment:

This is what my first path does. But this slows down retrieving the
__module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I
haven't bother.

Interning __name__ and __qualname__ is less important, because different
functions and classes usually have different names.

----------

Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue25856\>

python-dev · 2016-09-09T21:55:22Z

New changeset 861ddad3e0c1 by Serhiy Storchaka in branch 'default':
Issue bpo-25856: The __module__ attribute of extension classes and functions
https://hg.python.org/cpython/rev/861ddad3e0c1

serhiy-storchaka · 2016-09-09T21:58:55Z

Thank you for helpful review Benjamin.

serhiy-storchaka added performance Performance or resource usage extension-modules C modules in the Modules dir labels Dec 13, 2015

serhiy-storchaka changed the title ~~The __module__ attribute of itertools members is not interned~~ The __module__ attribute of non-heap classes is not interned Dec 18, 2015

serhiy-storchaka added type-feature A feature request or enhancement and removed performance Performance or resource usage labels Dec 29, 2015

serhiy-storchaka self-assigned this Sep 7, 2016

serhiy-storchaka closed this as completed Sep 9, 2016

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The module attribute of non-heap classes is not interned #70043

The module attribute of non-heap classes is not interned #70043

serhiy-storchaka commented Dec 13, 2015

serhiy-storchaka commented Dec 13, 2015

serhiy-storchaka commented Dec 18, 2015

serhiy-storchaka commented Dec 29, 2015

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 8, 2016

serhiy-storchaka commented Sep 8, 2016

benjaminp commented Sep 8, 2016

python-dev mannequin commented Sep 9, 2016

serhiy-storchaka commented Sep 9, 2016

The __module__ attribute of non-heap classes is not interned #70043

The __module__ attribute of non-heap classes is not interned #70043

Comments

serhiy-storchaka commented Dec 13, 2015

serhiy-storchaka commented Dec 13, 2015

serhiy-storchaka commented Dec 18, 2015

serhiy-storchaka commented Dec 29, 2015

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 7, 2016

serhiy-storchaka commented Sep 7, 2016

benjaminp commented Sep 8, 2016

serhiy-storchaka commented Sep 8, 2016

benjaminp commented Sep 8, 2016

python-dev mannequin commented Sep 9, 2016

serhiy-storchaka commented Sep 9, 2016

The module attribute of non-heap classes is not interned #70043

The module attribute of non-heap classes is not interned #70043