classification
Title: The __module__ attribute of non-heap classes is not interned
Type: enhancement Stage: resolved
Components: Extension Modules Versions: Python 3.6
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: serhiy.storchaka Nosy List: alexandre.vassalotti, benjamin.peterson, pitrou, python-dev, rhettinger, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2015-12-13 16:17 by serhiy.storchaka, last changed 2016-09-09 21:58 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
intern___module__.patch serhiy.storchaka, 2015-12-18 16:49 review
intern_and_cache___module__.patch serhiy.storchaka, 2015-12-29 08:23 review
intern_and_cache___module__2.patch serhiy.storchaka, 2016-09-07 11:50 review
intern_and_cache___module__3.patch serhiy.storchaka, 2016-09-07 18:19 review
Messages (15)
msg256322 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-13 16:17
One of purposes of the STACK_GLOBAL opcode introduced in pickle protocol 4 is to avoid repeating module name for different globals of the same module.

>>> pickletools.dis(pickletools.optimize(pickle.dumps([sys.getsizeof, sys.intern], 4)))
    0: \x80 PROTO      4
    2: \x95 FRAME      33
   11: ]    EMPTY_LIST
   12: (    MARK
   13: \x8c     SHORT_BINUNICODE 'sys'
   18: \x94     MEMOIZE    (as 0)
   19: \x8c     SHORT_BINUNICODE 'getsizeof'
   30: \x93     STACK_GLOBAL
   31: h        BINGET     0
   33: \x8c     SHORT_BINUNICODE 'intern'
   41: \x93     STACK_GLOBAL
   42: e        APPENDS    (MARK at 12)
   43: .    STOP
highest protocol among opcodes = 4

But this doesn't work with the itertools module.

>>> pickletools.dis(pickletools.optimize(pickle.dumps([itertools.chain, itertools.accumulate], 4)))
    0: \x80 PROTO      4
    2: \x95 FRAME      47
   11: ]    EMPTY_LIST
   12: (    MARK
   13: \x8c     SHORT_BINUNICODE 'itertools'
   24: \x8c     SHORT_BINUNICODE 'chain'
   31: \x93     STACK_GLOBAL
   32: \x8c     SHORT_BINUNICODE 'itertools'
   43: \x8c     SHORT_BINUNICODE 'accumulate'
   55: \x93     STACK_GLOBAL
   56: e        APPENDS    (MARK at 12)
   57: .    STOP
highest protocol among opcodes = 4

That is because the __module__ attribute of itertools members is not interned.

>>> sys.getsizeof.__module__ is sys.intern.__module__
True
>>> itertools.chain.__module__ is itertools.chain.__module__
False

In addition to inefficient pickle this perhaps leads to small performance hit on accessing the __module__ attribute or using its value as dictionary key.
msg256694 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-18 16:49
Actually this is not specific to itertools. Every non-heap class not from the builtins module is affected.

Proposed patch just interns the __module__ value. The patch also cleans up the code.
msg257171 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-12-29 07:43
Interning the __module__ string causes small performance hit:

$ ./python -m timeit -s "from itertools import chain" -- "chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__; chain.__module__"

Unpatched:  1.93 usec per loop
Patched:    4.09 usec per loop

This can be avoided if cache created string in type's __dict__. Following patch makes __module__ retrieving for non-heap types as fast as for heap types:

Patched2:   0.871 usec per loop
msg274799 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 11:43
Could anyone please make a review?
msg274842 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 17:18
I don't understand this code:
   type->tp_dict && PyDict_Check(type->tp_dict)
since the code explicitly assume it's not NULL and access it as a dict earlier in the function
msg274857 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:19
Good catch Benjamin! There is yet one bug -- the type type already has the __module__ attribute, but it is a descriptor, not a string.

Updated patch fixes these bugs.
msg274861 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 18:30
I don't understand why you need to check the validity of tp_dict at all. We generally assume it's a dict.
msg274862 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:36
Indeed, the PyDict_Check() check can be omitted.
msg274863 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-07 18:40
I don't think it can be NULL either.

On Wed, Sep 7, 2016, at 11:36, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> Indeed, the PyDict_Check() check can be omitted.
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue25856>
> _______________________________________
msg274865 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-07 18:45
It can be NULL in very rare cases. See for example issue26906.
msg275166 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-08 21:29
I think it's better not write into the type dict in a getter. You might just use PyUnicode_InternFromString every time.

FWIW, __name__ also has this behavior.
msg275175 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-08 21:39
This is what my first path does. But this slows down retrieving the __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I haven't bother.

Interning __name__ and __qualname__ is less important, because different functions and classes usually have different names.
msg275176 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2016-09-08 21:42
I'm not too worried about slowing down __module__ especially since it's
not any slower for heap types or types in builtins.

On Thu, Sep 8, 2016, at 14:39, Serhiy Storchaka wrote:
> 
> Serhiy Storchaka added the comment:
> 
> This is what my first path does. But this slows down retrieving the
> __module__ attribute (from 0.2 usec to 0.4 usec on my computer). Maybe I
> haven't bother.
> 
> Interning __name__ and __qualname__ is less important, because different
> functions and classes usually have different names.
> 
> ----------
> 
> _______________________________________
> Python tracker <report@bugs.python.org>
> <http://bugs.python.org/issue25856>
> _______________________________________
msg275460 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-09-09 21:55
New changeset 861ddad3e0c1 by Serhiy Storchaka in branch 'default':
Issue #25856: The __module__ attribute of extension classes and functions
https://hg.python.org/cpython/rev/861ddad3e0c1
msg275464 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-09-09 21:58
Thank you for helpful review Benjamin.
History
Date User Action Args
2016-09-09 21:58:55serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg275464

stage: patch review -> resolved
2016-09-09 21:55:21python-devsetnosy: + python-dev
messages: + msg275460
2016-09-08 21:42:50benjamin.petersonsetmessages: + msg275176
2016-09-08 21:39:35serhiy.storchakasetmessages: + msg275175
2016-09-08 21:29:36benjamin.petersonsetmessages: + msg275166
2016-09-07 18:45:24serhiy.storchakasetmessages: + msg274865
2016-09-07 18:40:12benjamin.petersonsetmessages: + msg274863
2016-09-07 18:36:12serhiy.storchakasetmessages: + msg274862
2016-09-07 18:30:00benjamin.petersonsetmessages: + msg274861
2016-09-07 18:19:55serhiy.storchakasetfiles: + intern_and_cache___module__3.patch

messages: + msg274857
2016-09-07 17:18:43benjamin.petersonsetnosy: + benjamin.peterson
messages: + msg274842
2016-09-07 11:50:55serhiy.storchakasetfiles: + intern_and_cache___module__2.patch
2016-09-07 11:43:10serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg274799
2015-12-29 08:23:12serhiy.storchakasetfiles: + intern_and_cache___module__.patch
2015-12-29 08:22:54serhiy.storchakasetfiles: - intern_and_cache___module__.patch
2015-12-29 07:43:41serhiy.storchakasetfiles: + intern_and_cache___module__.patch
type: performance -> enhancement
messages: + msg257171
2015-12-18 16:49:06serhiy.storchakasetfiles: + intern___module__.patch
title: The __module__ attribute of itertools members is not interned -> The __module__ attribute of non-heap classes is not interned
messages: + msg256694

keywords: + patch
stage: patch review
2015-12-13 16:17:44serhiy.storchakacreate