Title: Consider removing docstrings from co_consts in code objects
Type: resource usage Stage:
Components: Interpreter Core Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: inada.naoki, rhettinger, serhiy.storchaka, terry.reedy
Priority: normal Keywords:

Created on 2019-04-04 01:04 by rhettinger, last changed 2019-04-06 02:28 by inada.naoki.

Messages (6)
msg339422 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-04 01:04
Function objects provide __doc__ as a documented writeable attribute.  However, code objects also have the same information in co_consts[0].  When __doc__ is changed, the latter keeps a reference to the old string.  Also, the disassembly shows that co_consts[0] is never used.  Can we remove the entry in co_consts?  It looks like a compilation artifact rather than something that we need or want.

>>> def f(x):
        'y'

>>> f.__doc__
'y'
>>> f.__code__.co_consts[0]
'y'
>>> f.__doc__ = 'z'
>>> f.__code__.co_consts[0]
'y'

>>> from dis import dis
>>> dis(f)
  2           0 LOAD_CONST               1 (None)
              2 RETURN_VALUE
msg339427 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-04 03:37
co_consts[0] is used for setting the initial value of __doc__. See PyFunction_NewWithQualName().

    consts = ((PyCodeObject *)code)->co_consts;
    if (PyTuple_Size(consts) >= 1) {
        doc = PyTuple_GetItem(consts, 0);
        if (!PyUnicode_Check(doc))
            doc = Py_None;
    }
    else
        doc = Py_None;
    op->func_doc = doc;
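
A minimal Python sketch of the same logic, for illustration only. The helper name `initial_doc` is hypothetical, and the sketch assumes the CPython 3.8 layout discussed here, where a docstring (if any) sits at co_consts[0]; other versions may lay out constants differently. It also shows that a new function built from the same code object picks up the original docstring, even after __doc__ on the first function has been reassigned.

```python
import types


def initial_doc(code):
    """Hypothetical helper: mirror the C snippet's choice of initial __doc__."""
    consts = code.co_consts
    # Use co_consts[0] only if it exists and is a str; otherwise fall back to None.
    if len(consts) >= 1 and isinstance(consts[0], str):
        return consts[0]
    return None


def f(x):
    'original docstring'


# Reassigning __doc__ changes the function object only, not the code object.
f.__doc__ = 'replaced'

# Build a second function from the same code object.
g = types.FunctionType(f.__code__, globals())

print(f.__doc__)                # 'replaced'
print(initial_doc(f.__code__))  # 'original docstring'
print(g.__doc__)                # 'original docstring'
```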
msg339430 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2019-04-04 03:47
> co_consts[0] is used for setting the initial value of __doc__.

Why is __doc__ set this way, but __name__ is set directly on the function object?  Setting __doc__ from the code object seems like an odd implementation hack that puts the responsibility in the wrong place and that leaves a dangling reference when __doc__ is updated.
msg339437 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2019-04-04 08:40
I think it is for historical reasons. Currently, statements consisting of a constant expression are not compiled to bytecode and do not add a value to co_consts. But before this optimization was added, the first element of co_consts in a function with a docstring was always the docstring. So why add co_doc if the docstring is already available?
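
The optimization mentioned above can be observed with a short sketch. This assumes a CPython version in which constant expression statements are dropped by the compiler (the behaviour described in this message) and that docstrings are not stripped (no -OO); the function `f` is only an example.

```python
def f():
    'doc'
    'unused string'   # constant expression statement, not the docstring
    42                # another constant expression statement
    return None


consts = f.__code__.co_consts

# The docstring is kept as the first constant...
print(consts[0] == 'doc')
# ...while the other constant expression statements leave no trace in co_consts.
print('unused string' in consts)
print(42 in consts)
```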

This can be changed, but it is a breaking change, and what would we get instead?

Function's __name__ is set from code object's co_name.
msg339514 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2019-04-05 19:11
So we have the same issue with f.__name__ and f.__code__.co_name becoming unsynchronized.
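
A short sketch of that desynchronization, using only standard function and code object attributes:

```python
def f(x):
    pass


# Renaming the function object does not touch the code object.
f.__name__ = 'renamed'

print(f.__name__)          # 'renamed'
print(f.__code__.co_name)  # 'f'
```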

FWIW, I would prefer that the code docstring be co_doc, rather than hidden in co_consts, so that 'name' and 'doc' follow the same pattern.
msg339523 - (view) Author: Inada Naoki (inada.naoki) * (Python committer) Date: 2019-04-06 02:28
There is an idea about reading the docstring lazily, when func.__doc__ is accessed.

I don't think the idea can be implemented by 3.8.  But if we change the code object now, I want the new API to be usable for implementing this idea.

One breaking change is better than two.
Date                 User              Action  Args
2019-04-06 02:28:55  inada.naoki       set     messages: + msg339523
2019-04-05 19:11:25  terry.reedy       set     nosy: + terry.reedy; messages: + msg339514
2019-04-04 08:40:34  serhiy.storchaka  set     messages: + msg339437
2019-04-04 04:12:41  inada.naoki       set     nosy: + inada.naoki
2019-04-04 03:47:14  rhettinger        set     messages: + msg339430
2019-04-04 03:37:00  serhiy.storchaka  set     nosy: + serhiy.storchaka; messages: + msg339427
2019-04-04 01:04:15  rhettinger        create