classification
Title: Add `docstring` field to AST nodes
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: benjamin.peterson, georg.brandl, haypo, inada.naoki, jeff.allen, mbussonn, serhiy.storchaka
Priority: normal
Keywords: patch

Created on 2017-02-06 14:21 by inada.naoki, last changed 2017-05-29 06:23 by serhiy.storchaka. This issue is now closed.

Files
File name Uploaded Description Edit
ast-docstring.patch inada.naoki, 2017-02-06 14:21 review
ast-docstring-2.patch inada.naoki, 2017-02-08 02:59 review
ast-docstring-3.patch inada.naoki, 2017-02-08 04:18 fix docs review
Pull Requests
URL Status Linked Edit
PR 46 merged inada.naoki, 2017-02-12 09:17
PR 267 merged mbussonn, 2017-02-24 00:58
Messages (21)
msg287136 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-06 14:21
spin off of #11549.

http://bugs.python.org/issue11549#msg130955

> b) Docstring is now an attribute of Module, FunctionDef and ClassDef,
> rather than a first statement. Docstring is a special syntactic
> construction, it's not an executable code, so it makes sense to separate it. Otherwise, optimizer would have to take extra care not to introduce, change or remove docstring. For example:
>
>  def foo():
>    "doc" + "string"
>
> Without optimizations foo doesn't have a docstring. After folding, however, the first statement in foo is a string literal. This means that docstring depends on the level of optimizations. Making it an attribute avoids the problem.
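The corner case quoted above can be sketched with the standard ast module alone. This assumes ast.parse returns the unoptimized tree (its default) and uses ast.get_docstring rather than any `docstring` field, so it illustrates the problem, not the patch. The helper name first_statement_kind is made up for this example.

```python
import ast

# The example from the quote: the first statement is an expression that
# only becomes a string literal after constant folding.
FOLDED = 'def foo():\n    "doc" + "string"\n'
# A plain string literal as first statement, i.e. a real docstring.
LITERAL = 'def foo():\n    "docstring"\n'


def first_statement_kind(source):
    """Return the node type name of the first statement's value (hypothetical helper)."""
    func = ast.parse(source).body[0]
    return type(func.body[0].value).__name__


folded_doc = ast.get_docstring(ast.parse(FOLDED).body[0])
literal_doc = ast.get_docstring(ast.parse(LITERAL).body[0])

print(first_statement_kind(FOLDED), folded_doc)
print(literal_doc)
```

In the unoptimized tree the first statement of the folded variant is a BinOp, so it is not treated as a docstring.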
msg287207 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-07 08:28
I like the change because (IMHO) it makes the code simpler, and because it also changes the first line number of the code object. I reviewed the patch: it needs basic unit tests.
msg287215 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-07 09:08
I like this change. Added comments on Rietveld.

Are changes in importlib.h only due to changing first line numbers?
msg287273 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-08 03:36
lnotab is changed too.

-    0,0,0,115,12,0,0,0,8,4,4,2,8,8,8,12,
-    8,25,8,13,114,18,0,0,0,99,0,0,0,0,0,0,
+    0,0,0,115,10,0,0,0,12,6,8,8,8,12,8,25,
+    8,13,114,18,0,0,0,99,0,0,0,0,0,0,0,0,

115 is the header for the bytes type.
The next 4 bytes are its length (little endian, 12 -> 10 bytes),
and everything after that is shifted by 2 bytes.
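A rough way to see that header layout is to marshal the two lnotab byte strings from the diff above. This sketch passes marshal version 2 on purpose (an assumption made here so that no reference flag bit is mixed into the type byte); the helper name describe is hypothetical.

```python
import marshal

# The lnotab bytes before and after the patch, copied from the diff above.
old_lnotab = bytes([8, 4, 4, 2, 8, 8, 8, 12, 8, 25, 8, 13])
new_lnotab = bytes([12, 6, 8, 8, 8, 12, 8, 25, 8, 13])


def describe(blob):
    """Split a marshalled bytes object into (type code, payload length)."""
    type_code = blob[0]                            # 115 == ord('s')
    length = int.from_bytes(blob[1:5], "little")   # 4-byte little-endian size
    return type_code, length


# Version 2 of the marshal format is used so the type byte is written plainly.
old_header = describe(marshal.dumps(old_lnotab, 2))
new_header = describe(marshal.dumps(new_lnotab, 2))
print(old_header, new_header)
```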
msg287283 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-08 08:08
def func(): "doc" + "string"

Currently (Python 2.7-3.6), func.__doc__ is None. I suggest adding a unit test for this corner case, even if the result is going to change in the near future. We need to "specify" the expected behaviour, and make sure that we get the same result whether optimizations are enabled or not.
msg287287 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-08 09:00
I support adding tests. Tests should cover all cases (module, class, function, coroutine) and also check the first line number.

What is the value of co_firstlineno if the function doesn't have any statements?

    def f():
        '''docstring'''
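A minimal sketch of what such checks might look like, covering the four node kinds and the first line number of functions whose body is only a docstring. It uses ast.get_docstring and is not taken from the patch's test suite; the source text and names are invented for illustration.

```python
import ast

SOURCE = '''"""module doc"""

class C:
    """class doc"""

def f():
    """function doc"""

async def g():
    """coroutine doc"""
'''

tree = ast.parse(SOURCE)

# Collect the docstring of the module and of each definition in it.
docs = {"module": ast.get_docstring(tree)}
for node in tree.body:
    if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
        docs[node.name] = ast.get_docstring(node)

# Compile and run the module, then read co_firstlineno of the functions
# that contain nothing but a docstring.
namespace = {"__name__": "example"}
exec(compile(tree, "<example>", "exec"), namespace)
first_lines = {name: namespace[name].__code__.co_firstlineno for name in ("f", "g")}

print(docs)
print(first_lines)
```

Here co_firstlineno is the line of the `def` (or `async def`) itself, which matches the output shown in the next message.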
msg287289 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-08 09:08
Oh, I misunderstood something.
The patched Python 3.7 and the system's Python 3.5 show the same output for the code below.
I'll check what has actually changed.

inada-n@x250 ~/w/p/ast-docstring> cat -n x.py 
     1	"""module docstring"""
     2	
     3	def func():
     4	    """func docstring"""
     5	
     6	def func2():
     7	    """func docstring"""
     8	    1+1
     9	
    10	print(func.__code__.co_firstlineno)
    11	print(func.__code__.co_lnotab)
    12	print(func2.__code__.co_firstlineno)
    13	print(func2.__code__.co_lnotab)
inada-n@x250 ~/w/p/ast-docstring> ./python x.py 
3
b''
6
b'\x00\x02'
inada-n@x250 ~/w/p/ast-docstring> /usr/bin/python3 x.py 
3
b''
6
b'\x00\x02'
msg287292 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-08 09:33
2017-02-08 10:08 GMT+01:00 INADA Naoki <report@bugs.python.org>:
>      6  def func2():
>      7      """func docstring"""
>      8      1+1

1+1 is replaced with 2 and lone integer literals are removed by the
peephole optimizer. See also the issue #26204.
msg287293 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-08 09:35
Oops, I spoke too fast :-) "1+1" is not removed.

"1+1" is replaced with "2" by the peephole optimizer, whereas the compiler step that drops constant expression statements runs before the peephole optimizer.

One more time, it would be better to implement constant folding at the AST level ;-)

$ python3
Python 3.5.2 (default, Sep 14 2016, 11:28:32) 
>>> def func():
...  "docstring"
...  1+1
... 
>>> import dis
>>> dis.dis(func)
  3           0 LOAD_CONST               3 (2)
              3 POP_TOP
              4 LOAD_CONST               2 (None)
              7 RETURN_VALUE
msg287365 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-08 21:36
This patch affects firstlineno and lnotab of module and class, but not functions.

module:
-<code object <module> at 0x7f053a8f8b70, file "Lib/importlib/_bootstrap.py", line 8>
+<code object <module> at 0x7fdefbf10340, file "Lib/importlib/_bootstrap.py", line 25>
 filename Lib/importlib/_bootstrap.py
-firstlineno: 8
+firstlineno: 25
-lnotab: b'\x04\x11\x04\x02\x08\x08\x08\x07\x04\x02\x04\x03\x10\x04\x0eD\x0e\x15\x0e\x13\x08\x13\x08\x13\x08\x0b\x0e\x08\x08\x0b\x08\x0c\x08\x10\x08$\x0e\x1b\x0ee\x10\x1a\x06\x03\n-\x0e<\x08\x11\x08\x11\x08\x19\x08\x1d\x08\x17\x08\x10\x0eI\x0eM\x0e\r\x08\t\x08\t\n/\x08\x14\x04\x01\x08\x02\x08\x1b\x08\x06\n\x19\x08\x1f\x08\x1b\x12#\x08\x07\x08/'
-  8           0 LOAD_CONST               0 ('Core implementation of import.\n\nThis module is NOT meant to be directly imported! It has been designed such\nthat it can be bootstrapped into Python as the implementation of import. As\nsuch it requires the injection of specific modules and attributes in order to\nwork. One should use importlib as the public-facing version of this module.\n\n')
+lnotab: b'\x04\x00\x04\x02\x08\x08\x08\x07\x04\x02\x04\x03\x10\x04\x0eD\x0e\x15\x0e\x13\x08\x13\x08\x13\x08\x0b\x0e\x08\x08\x0b\x08\x0c\x08\x10\x08$\x0e\x1b\x0ee\x10\x1a\x06\x03\n-\x0e<\x08\x11\x08\x11\x08\x19\x08\x1d\x08\x17\x08\x10\x0eI\x0eM\x0e\r\x08\t\x08\t\n/\x08\x14\x04\x01\x08\x02\x08\x1b\x08\x06\n\x19\x08\x1f\x08\x1b\x12#\x08\x07\x08/'
+ 25           0 LOAD_CONST               0 ('Core implementation of import.\n\nThis module is NOT meant to be directly imported! It has been designed such\nthat it can be bootstrapped into Python as the implementation of import. As\nsuch it requires the injection of specific modules and attributes in order to\nwork. One should use importlib as the public-facing version of this module.\n\n')
               2 STORE_NAME               0 (__doc__)
-
- 25           4 LOAD_CONST               1 (None)
+              4 LOAD_CONST               1 (None)
               6 STORE_GLOBAL             1 (_bootstrap_external)
 

class:
-<code object _ModuleLock at 0x7f053ab61db0, file "Lib/importlib/_bootstrap.py", line 51>
+<code object _ModuleLock at 0x7fdefc61c580, file "Lib/importlib/_bootstrap.py", line 51>
 filename Lib/importlib/_bootstrap.py
 firstlineno: 51
-lnotab: b'\x08\x04\x04\x02\x08\x08\x08\x0c\x08\x19\x08\r'
+lnotab: b'\x0c\x06\x08\x08\x08\x0c\x08\x19\x08\r'
  51           0 LOAD_NAME                0 (__name__)
               2 STORE_NAME               1 (__module__)
               4 LOAD_CONST               0 ('_ModuleLock')
               6 STORE_NAME               2 (__qualname__)
-
- 55           8 LOAD_CONST               1 ('A recursive lock implementation which is able to detect deadlocks\n    (e.g. thread 1 trying to take locks A then B, and thread 2 trying to\n    take locks B then A).\n    ')
+              8 LOAD_CONST               1 ('A recursive lock implementation which is able to detect deadlocks\n    (e.g. thread 1 trying to take locks A then B, and thread 2 trying to\n    take locks B then A).\n    ')
              10 STORE_NAME               3 (__doc__)
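To read the class lnotab values above, the byte pairs can be expanded into (bytecode offset, line number) entries. This is a simplified decoder written for this thread's data, assuming the usual pair layout of co_lnotab (offset delta, then line delta, with line deltas of 0x80 and above treated as negative); decode_lnotab is a hypothetical helper, not a CPython API.

```python
def decode_lnotab(lnotab, firstlineno):
    """Expand co_lnotab byte pairs into (bytecode offset, line number) entries."""
    offset, line = 0, firstlineno
    table = []
    for offset_delta, line_delta in zip(lnotab[0::2], lnotab[1::2]):
        if line_delta >= 0x80:
            line_delta -= 0x100  # line deltas are signed bytes
        offset += offset_delta
        line += line_delta
        table.append((offset, line))
    return table


# Values for the _ModuleLock class body, copied from the diff above.
before = decode_lnotab(b'\x08\x04\x04\x02\x08\x08\x08\x0c\x08\x19\x08\r', 51)
after = decode_lnotab(b'\x0c\x06\x08\x08\x08\x0c\x08\x19\x08\r', 51)

print(before)
print(after)
```

With these inputs the only difference is the entry at offset 8 for line 55 (the docstring line), which is present before the patch and gone after it.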
msg287366 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-08 21:56
So the What's New entry may be:

+* ``Module``, ``FunctionDef``, ``AsyncFunctionDef``, and
+  ``ClassDef`` AST nodes now have a new ``docstring`` attribute.
+  The first statement in their body is not considered as a docstring anymore.
+  This affects ``co_firstlineno`` and ``co_lnotab`` attribute of code object
+  for module and class.
+  (Contributed by Eugene Toder and INADA Naoki in :issue:`29463`.)
+
msg287378 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-09 02:39
Now I doubt whether this patch is really good.

The docstring of a module or class generates two bytecode instructions,
so it is a real, executed statement.

  LOAD_CONST  0 ("docstring")
  STORE_NAME  0 ("__doc__")
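A short sketch that checks this on a running interpreter: it compiles a module with a docstring, looks for a STORE_NAME of `__doc__` in the module-level bytecode, and then executes the code. The source string and variable names are invented for the example, and optimize=0 is passed so docstrings are not stripped.

```python
import dis

SOURCE = '"""module docstring"""\n\nclass C:\n    """class docstring"""\n'

# optimize=0 keeps docstrings regardless of how the interpreter was started.
code = compile(SOURCE, "<example>", "exec", optimize=0)

# Look for the store into __doc__ among the module-level instructions.
stores_doc = any(
    ins.opname == "STORE_NAME" and ins.argval == "__doc__"
    for ins in dis.get_instructions(code)
)

namespace = {"__name__": "example"}
exec(code, namespace)

print(stores_doc)
print(namespace["__doc__"], namespace["C"].__doc__)
```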
msg287620 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2017-02-11 19:47
So we lose the possibility of setting a breakpoint on the docstring? That doesn't look like a great loss.
msg287634 - (view) Author: Jeff Allen (jeff.allen) * Date: 2017-02-12 07:22
Just terminology ... strictly speaking what you've done here is "add a *field* to the nodes Module, FunctionDef and ClassDef", rather than add an *attribute* -- that is, when one is consistent with the terms used in the ast module (https://docs.python.org/3/library/ast.html#node-classes) or Wang (https://docs.python.org/devguide/compiler.html#wang97).
msg287636 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-12 09:03
Thanks. I'm not familiar with language front ends.
I'll check the NEWS entry.
msg288373 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-02-22 16:51
This issue broke the ast API. Copy of the following comment on the PR:
https://github.com/python/cpython/pull/46#issuecomment-281721296
---
Carreau: Thanks for this ! Improvement to the AST are welcome !

Would it have been possible to make the docstring optional ? (It's already breaking things, like IPython).

Should I comment on upstream bpo ?
---

I created the issue #29622 to fix this.
msg288374 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2017-02-22 16:53
Thank you for your work on the AST. I know many developers are looking forward to its improvement and stabilisation, in the hope of having it stable (and documented) in the stdlib at some point.

The recent change in PR 46 changes (at least) the constructor of `ast.Module` to take a second mandatory parameter (the docstring).

I know the AST is autogenerated and "use at your own risk". But IPython, for example, uses `mod = ast.Module([nodes])`, and the second mandatory parameter added to Module makes it incompatible with the current Python 3.7.
Well, it's a long time until 3.7 is released, and we can patch things, but I'm sure we are not the only ones in this situation, and we'd like older versions of IPython to still be compatible with Python 3.7, so if `Module()`'s second parameter (the docstring) could be optional, that would be great.

I would be happy if it raised a deprecation warning saying that it will be required in the future.

I'm of course speaking about `Module` because that's the first error I encountered, but I'm guessing it applies to the other changed AST nodes.

Thanks.
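One way calling code can avoid depending on the positional signature of `ast.Module` is to let the parser create the node and then fill in its body. This is only a sketch of that idea, not what IPython actually does; make_module is a hypothetical helper.

```python
import ast


def make_module(nodes):
    """Wrap statement nodes in a Module without calling ast.Module positionally."""
    module = ast.parse("")      # an empty Module with whatever fields this version defines
    module.body = list(nodes)   # only the body is replaced
    return ast.fix_missing_locations(module)


statements = ast.parse("x = 1 + 2").body
namespace = {}
exec(compile(make_module(statements), "<ast>", "exec"), namespace)
print(namespace["x"])
```

Because the parser builds the node, any extra fields a given Python version expects on Module are already populated.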
msg288399 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-23 03:36
OK, let's continue on #29522, and close this issue.
msg288400 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-02-23 03:37
s/ #29522 / #29622 /
msg290419 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2017-03-24 23:50
New changeset 4c78c527d215c37472145152cb0e95f196cdddc9 by INADA Naoki in branch 'master':
bpo-29622: Make AST constructor to accept less than enough number of positional arguments (GH-249)
https://github.com/python/cpython/commit/4c78c527d215c37472145152cb0e95f196cdddc9
msg290429 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2017-03-24 23:51
New changeset cb41b2766de646435743b6af7dd152751b54e73f by Victor Stinner (INADA Naoki) in branch 'master':
bpo-29463: Add docstring field to some AST nodes. (#46)
https://github.com/python/cpython/commit/cb41b2766de646435743b6af7dd152751b54e73f
History
Date User Action Args
2017-05-29 06:23:05 serhiy.storchaka set pull_requests: - pull_request584
2017-05-29 06:22:51 serhiy.storchaka set pull_requests: - pull_request842
2017-03-31 16:36:08 dstufft set pull_requests: + pull_request842
2017-03-24 23:51:54 haypo set messages: + msg290429
2017-03-24 23:50:02 inada.naoki set messages: + msg290419
2017-03-17 21:00:32 larry set pull_requests: + pull_request584
2017-02-24 00:58:15 mbussonn set pull_requests: + pull_request238
2017-02-23 03:37:21 inada.naoki set messages: + msg288400
2017-02-23 03:36:33 inada.naoki set status: open -> closed
resolution: fixed
messages: + msg288399

stage: patch review -> resolved
2017-02-22 16:53:35 mbussonn set nosy: + mbussonn
messages: + msg288374
2017-02-22 16:51:09 haypo set messages: + msg288373
2017-02-12 09:17:11 inada.naoki set pull_requests: + pull_request46
2017-02-12 09:03:11 inada.naoki set messages: + msg287636
title: Add `docstring` attribute to AST nodes -> Add `docstring` field to AST nodes
2017-02-12 07:22:50 jeff.allen set nosy: + jeff.allen
messages: + msg287634
2017-02-11 19:47:59 serhiy.storchaka set messages: + msg287620
2017-02-09 02:39:16 inada.naoki set messages: + msg287378
2017-02-08 21:56:37 inada.naoki set messages: + msg287366
2017-02-08 21:36:34 inada.naoki set messages: + msg287365
2017-02-08 09:35:53 haypo set messages: + msg287293
2017-02-08 09:33:48 haypo set messages: + msg287292
2017-02-08 09:08:12 inada.naoki set messages: + msg287289
2017-02-08 09:00:10 serhiy.storchaka set messages: + msg287287
2017-02-08 08:08:59 haypo set messages: + msg287283
2017-02-08 04:20:49 inada.naoki set title: Change docstring to attribute from first statement. -> Add `docstring` attribute to AST nodes
2017-02-08 04:18:19 inada.naoki set files: + ast-docstring-3.patch
2017-02-08 03:36:06 inada.naoki set messages: + msg287273
2017-02-08 02:59:29 inada.naoki set files: + ast-docstring-2.patch
2017-02-07 09:08:06 serhiy.storchaka set nosy: + georg.brandl, serhiy.storchaka, benjamin.peterson
messages: + msg287215

components: + Interpreter Core
type: enhancement
stage: patch review
2017-02-07 08:28:44 haypo set messages: + msg287207
2017-02-07 04:02:51 inada.naoki link issue29469 dependencies
2017-02-06 14:21:16 inada.naoki create