classification
Title: Doc strings no longer stored in body of AST
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: Benjamin Ragan-Kelley, Mark.Shannon, benjamin.peterson, brett.cannon, eitan.adler, flherne, georg.brandl, inada.naoki, lukasz.langa, mbussonn, minrk, ncoghlan, ned.deily, rhettinger, serhiy.storchaka
Priority: release blocker Keywords: 3.7regression, patch

Created on 2018-02-22 12:20 by Mark.Shannon, last changed 2018-05-31 06:12 by serhiy.storchaka. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 5927 closed inada.naoki, 2018-02-27 11:58
PR 7121 merged serhiy.storchaka, 2018-05-25 18:20
PR 7197 merged serhiy.storchaka, 2018-05-29 08:31
PR 7272 merged serhiy.storchaka, 2018-05-31 04:53
PR 7273 merged serhiy.storchaka, 2018-05-31 04:53
Messages (55)
msg312557 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-02-22 12:20
Python 3.7.0b1+ (heads/3.7:dfa1144, Feb 22 2018, 12:10:59) 

>>> m = ast.parse("def foo():\n   'help'")
>>> m.body[0].body
[]


Correct behaviour (3.6 or earlier)

>>> m = ast.parse("def foo():\n   'help'")
>>> m.body[0].body
[<_ast.Expr object at 0x7fb9eeb1d4e0>]
msg312558 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-22 12:28
AST is changed slightly from Python 3.7.
ast.get_docstring() works for both of 3.6 and 3.7.

$ ./python
Python 3.8.0a0 (heads/master-dirty:451d1edaf4, Feb 22 2018, 21:11:54)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> m = ast.parse("def foo():\n   'help'")
>>> ast.get_docstring(m.body[0])
'help'
msg312559 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-02-22 12:34
This is an unnecessary and breaking change.
Changes like this should not be made unless necessary to fix a bug.
msg312561 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-22 12:39
ref: https://bugs.python.org/issue29463
msg312562 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-02-22 12:47
That issue has to do with bytecode generation, not the AST.

The AST should be an accurate representation of the Python source code.
Making a better representation of the source would be fine, but this makes it worse.

Doc-strings may be semantically distinct from other expressions, but they are syntactically the same.
msg312564 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-22 13:11
We implemented AST-level constant folding (#29469) based on the AST change (#29463).

I feel it's possible to revert AST change without reverting AST constant folding:

* Docstring is exposed from AST before calling AST optimizer.
* AST optimizer removes unused constant.

But I don't remember how difficult it is because since this change is made one year ago (GH-46)...
msg312565 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-22 13:15
This isn't a bug, but a feature. You no longer need to check and skip the first statement if it is a literal string. The body attribute now always represents a sequence of statements, and the docstring attribute represents a docstring.
msg312567 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-02-22 14:14
Stating that "this is a feature not a bug" does not make it so.
This breaks existing code and reduces the capabilities of the `ast` module.
For example, how does one get the location of the docstring now?

From a syntactic point of view.
def foo():
   "help"
and
def foo():
   b'"help"
barely differ.
The S in AST stands for Syntax not Semantics.
msg312580 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-02-22 18:42
It seems we have a difference of opinion here.  Serhiy closed this issue so, Mark, if you feel strongly enough to pursue it, you should reopen it and solicit other opinions.  The clock has just about run out to change the now current behavior for 3.7.0.
msg312588 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-22 19:24
It may be worth to include docstrings explicitly in the grammar.
msg312625 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-02-23 10:16
Serhiy, thanks for reopening this issue.

It seems to be that there are three reasonable choices:
1. Revert to 3.6 behaviour, with the addition of `docstring` attribute.
2. Change the docstring attribute to an AST node, possibly by modifying the grammar.
3. Do nothing.

I would prefer 1, as it requires no changes to 3rd party code and doesn't present an additional obstacle when porting from Python 2.

2 would be acceptable, as it allows tools to easily convert the body back to its 3.6 form (or vice-versa)

3 is a pain as it involves re-tokenizing the file to get the location of the doc-string.
msg312629 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-23 11:19
I prefer 2 or 3. There are benefits from representing a docstring as a separate attribute. This simplifies the code for walking the AST tree (no longer special cases for skipping docstring in modules, classes and functions) and the code for retrieving a docstring. It solves the problem that an expression resulting to a constant string (like "a"+"b" or f"a") shouldn't be interpreted as docstrings.

The position of a docstring can be useful for determining the position of fragments inside a docstring (for example for doctests). Several active developed third-party libraries (like pyflakes, see https://github.com/PyCQA/pyflakes/pull/273) already updated their code for supporting 3.7. The position of nodes preceding or following a docstring could be used. This is not perfect, it doesn't work with empty lines before or after docstring, but it never was perfect due to escaped newlines in string literals and line continuations.
msg312630 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-23 11:29
Note, that getting the location of the dostring already was a pain since in case of multiline string literals CPython saves the location of the last line, and PyPy saves the location of the first line. Getting the location of the specific fragment of the docstring is even larger pain as shown in following examples:

    def f()
        """
        >>> f()
        """

    def f()
        """\
        >>> f()
        """

    def f()
        ""\
        ""\
        ">>> f()"\
        ""\
        ""
msg312856 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-02-25 20:41
Is this going to get resolved in time for 3.7.0b2?  If we need to change this behavior, I really do not want to delay this for b3 which is when the 3.7.0 ABI freezes; otherwise such a major behavior change would likely need to wait until 3.8.
msg312989 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-27 09:38
I tried this patch:

diff --git a/Python/ast.c b/Python/ast.c
index e2092f0f85..93be2bc839 100644
--- a/Python/ast.c
+++ b/Python/ast.c
@@ -3537,9 +3537,9 @@ docstring_from_stmts(asdl_seq *stmts)
         if (s->kind == Expr_kind && s->v.Expr.value->kind == Str_kind) {
             string doc = s->v.Expr.value->v.Str.s;
             /* not very efficient, but simple */
-            memmove(&asdl_seq_GET(stmts, 0), &asdl_seq_GET(stmts, 1),
-                    (stmts->size - 1) * sizeof(void*));
-            stmts->size--;
+            //memmove(&asdl_seq_GET(stmts, 0), &asdl_seq_GET(stmts, 1),
+            //        (stmts->size - 1) * sizeof(void*));
+            //stmts->size--;
             return doc;
         }
     }

But I got "SyntaxError: from __future__ imports must occur at the beginning of the file".

docstring is very special while it looks like just a single string literal statement...
msg312999 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-27 12:01
I'm implementing (2).  Please check GH-5927.
msg313000 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-27 12:17
Would be nice to discuss this design question on Python-list.
msg313001 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-02-27 12:45
Python-list? -dev? -ideas?
msg313002 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-02-27 12:51
Sorry, Python-Dev, of course.
msg313005 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-02-27 13:17
Since we are already past the 3.7.0b2 cutoff time and there does not seen to be a consensus that the current 3.7 behavior needs to change and the proposed change is quite large, I think we should not change anything now for b2.  You can have a discussion on python-dev and, if there is a consensus for change, we can look at it for b3.
msg313315 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-03-06 10:19
There are no enough +1 for merging GH-5927 yet on the ML discussion.
https://mail.python.org/pipermail/python-dev/2018-February/152311.html

@Mark.Shannon, would you comment your opinion to the ML thread?

For now, I don't want to break tools supporting Python 3.7 a1~b2 already.
Although I will select (2) if there is time machine...
msg313698 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-03-12 20:43
Python's AST already doesn't include a lot of syntax that has no runtime effect:
- comments;
- whitespace and punctuation.

The situation is now made "worse" with constant folding.  You won't have full line information on the full expression, you don't even fully *see* expressions as syntactically provided by user code.  The docstring is just a special case of that.

As a counter example, it still includes function-level type annotations which are not evaluated. Clearly we don't have an explicit rule about this.

If you asked *me*, I would say Python provides a "lossy" AST and this is the one it itself uses.  It also provides a "lossless" AST and this one is in lib2to3.  It's unfortunate that the latter is not documented and kind of neglected.  This is something I want to address soonish.

Summing up, I'm in favor of option 3.  The AST preserves the docstring, it simply doesn't have information on where exactly it lives.  In exchange, the tree makes it easier to *find* the docstring and easier to find the first instruction (as Serhiy points out, the first statement is now always code to be executed).

Quoting literally the second sentence of the `ast` documentation:
> The abstract syntax itself might change with each Python release.

I am not against Option 2 since it's already implemented but I feel it's a pretty big change, pretty late in the release process for 3.7.  The amount of work required *just* to preserve the line number of the docstring (which can be reasonably guessed anyway) seems silly to me.
msg313997 - (view) Author: Mark Shannon (Mark.Shannon) * (Python committer) Date: 2018-03-17 12:23
I still think that option 2 would be best, but given that it is late in the release cycle, I will accept the status quo.

A couple of comments for the record:

Constant folding occurs after AST creation, so doesn't make the AST worse.

The parse tree created by lib2to3 is a concrete parse tree, not an AST. However, it might would sense (as Łukasz suggests) to use it as a basis for an AST designed for tooling and leave the AST generated by the C parser for bytecode generation.


Happy to close this issue now, unless anyone else has something to add.
msg314056 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-03-18 20:46
Thanks everyone for your inputs.  Let's go with the status quo -> closing this issue.
msg315238 - (view) Author: Francis Herne (flherne) Date: 2018-04-13 00:40
Note:

Because this also applies to module-level docstrings, a rather strange effect that, for example, `ast.parse("'foo'")` is now an empty module.

While the root node is always an instance of `ast.Module`, in practice `ast.parse()` is often used with smaller fragments of source code that don't directly correspond to a complete module, and this behaviour makes no sense in such cases.

The resulting bug in kdev-python took a while to track down (because I had no immediate reason to suspect this change), and will be somewhat awkward to workaround.

I would prefer that this be reverted; it's likely to break a variety of users in strange ways.
msg315240 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-04-13 04:58
> in practice `ast.parse()` is often used with smaller fragments of source code that don't directly correspond to a complete module, and this behaviour makes no sense in such cases.
>
> The resulting bug in kdev-python took a while to track down (because I had no immediate reason to suspect this change), and will be somewhat awkward to workaround.

Would you elaborate it more, please?
Without concrete example, we cannot to decide how it's important problem.

> I would prefer that this be reverted; it's likely to break a variety of users in strange ways.

I don't want to revert, see discussions in this issue and PR-5927.

And would you give us real world example of "variety of users"?
How many applications use ast.parse for such way?
msg315581 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-04-21 20:21
Inadasan, what Francis is referring to is that people often use `ast.parse()` to parse code fragments, not entire modules.

For example:

  >>> some_code = '"just a string"'
  >>> some_code_ast = ast.parse(some_code)

The way `ast.parse()` works, it always considers the input to be a module:

  >>> some_code_ast
  <_ast.Module object at 0x109537a18>

Before Python 3.7, you could parse out a string from such "module":

  >>> some_code.body
  [<_ast.Expr object at 0x1079aaa20>]
  >>> some_code.body[0].value
  <_ast.Str object at 0x107a45198>

However, starting with Python 3.7, this module is empty:

  >>> some_code_ast.body
  []
  >>> some_code_ast.docstring
  'just a string'

`ast.literal_eval()` does not have this problem.


And now: where is the bug?  I think the user code has a bug trying to parse a code fragment with `mode="exec"`.  Instead of using the default `mode="exec"`, it seems to me `mode="single"` fits the bill better.  In some cases the programmer might prefer `mode="eval"` if they don't want to allow statements at all.

Inadasan, I think what we should do is to amend `ast.parse()` and `compile()` docs that mode="exec" treats given code as a module, we should even give the docstring handling as an example.

Francis, if your use case requires passing multiple statements/expressions at once and a string in the first statement has semantic meaning, please provide us with a concrete example.
msg315582 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2018-04-21 20:28
Oh, and Francis, you might actually want to use lib2to3.pytree instead, which will provide you with a Concrete Syntax Tree ("lossless AST").  That tree will keep information like whitespace, comments, and other semantically insignificant tokens like commas and parentheses.

This tree will never modify your input whereas in the AST we might be doing more modifications in the future (like cutting `if` branches that we statically know are always False, and so on).

Granted, the lib2to3 tree is a bit awkward to query since the concrete tree exposes the raw Python grammar and not the simplified AST grammar.
msg317450 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-23 20:13
Maybe we will need to return to this issue in future and merge Inada's patch. IPython has been needed to add a workaround for this change. [1] And maybe Python will be needed to add that workaround if once it will add the support of pasting multiple complex statements without empty lines between.

[1] https://github.com/ipython/ipython/issues/11133
msg317491 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-24 01:22
If Python 3.7 is released with current AST form, I don't want to change it again.

I prefer

* Merge my patch in 3.7rc1, or
* Add compile(..., "multi") mode for "sequence of statements"
msg317492 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-24 01:24
> * Merge my patch in 3.7rc1, or

Sorry but we are not going to revisit this decision for 3.7.0.  It's too late!
msg317493 - (view) Author: STINNER Victor (vstinner) * (Python committer) Date: 2018-05-24 01:32
IMHO Python 3.7 will be simpler to process, since it avoids to check if the first node is a docstring or not.

I agree that it's a backward incompatible, but it has been done on purpose. In fact, there is no warranty about backward compatibility on the genreated AST between Python versions (like 3.6 vs 3.7).

As Lukasz wrote, AST of Python 3.6 already omit a lot of information about formatting, we are far from a 1:1 mapping between code and AST. For example, tools to "unparse" AST breaks the formatting. The Python 'ast' module is just not appropriate for such use case. There are other solutions for that, like lib2to3.

Having a docstring attribute *and* a docstring node seem to be redundant to be, or more like a bug. Redundant data can quickly lead to inconsistencies.

The current trend is more to move the AST away from the original Python source code, to produce better AST, especially in term of performance. That's also why I added the ast.Constant node type ;-) And why many peephole optimizations have been reimplemented at the AST level, rather than on the bytecode level.
msg317654 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-05-25 03:15
Regardless of the value of .docstring change, it seems it had some unexpected consequences. I think providing a new mode for compile mulitline statements is a reasonable way to address those consequences going forward. That said, it's extremely late in the lifecycle of 3.7 to be thinking about new APIs. How about rolling back the whole .docstring change for 3.7? Then, it can be landed again with compile(..., mode='multiline) for 3.8.
msg317655 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-25 03:38
For the record, Serhiy commented on this issue today in a thread on the python-committers list:

"I have doubts about two issues. I feel the responsibility for them because I had the opportunity to solve them before, but I lost it.

1. Changes in the AST. Few third-party projects was broken by it and already are fixed. I suppose yet few projects will be changed after 3.7 be released. It is interesting that IPython was broken in different way than other projects. It was needed to reintroduce the docstring in the list of statements, effectively reverting the 3.7 change. IPython allows to enter several statements at prompt, and therefore it compiles them with the 'exec' mode instead of 'single' as the CPython REPL and IDLE shell. Currently CPython doesn't allow you to paste arbitrary script like the following:

if a:
    b
if c:
    d

You need to add an empty line between top-level complex statements. If one time CPython will add support of pasting several statements without empty lines between, it might need to add the same hack as IPython. I afraid that we might be needed to change AST again, in 3.7.1 or in 3.8.0."

And later:

"Inada's patch looked complex (actually it mostly restored the code before his previous change). We didn't know about IPython and we decided that it is not worth to change this code at this stage (after beta2). And definitely it will be later to do this after rc1."

Full context/content starting here:
https://mail.python.org/pipermail/python-committers/2018-May/005467.html
msg317657 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-25 04:42
Based on more recent discussions and indirect feedback from downstream users (primarily the recent IPython experience), I am reluctantly re-opening this issue for 3.7.0.  I take responsibility for encouraging us earlier in the beta phase to continue with the feature as it stood, thereby prioritizing schedule over any technical issues.  Although it is now very late in the 3.7.0 cycle, with the newly expressed concerns my opinion has changed: I think we owe it to our downstream users to think about this one more time.

I agree with Benjamin's remark that it is very late in the 3.7 cycle to be considering new or revised APIs for 3.7.0.  So it seems to me we have only two practical alternatives at this point:

A. Proceed with 3.7.0rc1 as planned with the docstrings feature as it now stands and consider API changes for 3.8.

or

B. Revert the feature now for 3.7.0, retargeting for 3.8, and produce a 3.7.0b5 on a somewhat shorter cycle to allow downstream users to adapt to the removal.

I don't know how controversial making this decision will be but I think we need to move quickly as this is now *the* blocker for 3.7.0.  Coming now at the start of a weekend, in some countries a 3-day holiday weekend, I do want to make sure that the major stakeholders involved with this issue get a chance to "vote", although ultimately it will be up to the release manager (me) to make the final decision.  So, to be clear, we will not be deciding here and now what the API should be if we were to retarget for 3.8; please save that discussion for later.  The only question open right now is a vote for either option A (proceed as is) or option B (revert for 3.7 and retarget).  Let's set a limit for "voting" until Tuesday 2018-05-29 18:00 UTC (to cover the holiday weekend) with the proviso that if a clear consensus one way or the other appears before that we may cut the period short and proceed to implemention.

I know this is not a pleasant task especially at this late date.  I apologize to everyone, especially to Inada-san, for dragging this out.  I certainly appreciate the hard work that has gone into this feature so far and look forward to seeing it in either 3.7 or 3.8.
msg317665 - (view) Author: Min RK (minrk) * Date: 2018-05-25 09:59
In the A/B vote, I cast mine for B, for what it is worth, but it is not strongly held.

From the IPython side, I don't view our particular issue as a major regression for users. The only affected case for us is interactively typed string literals in single statement cells not displaying themselves as results. Since the same string is necessarily already displayed in the input, this isn't a huge deal. This is pretty rare (maybe folks do this while investigating unicode issues?) and we can handle it by recompiling empty modules with 'single' instead of the usual 'exec' that we use because most IPython inputs are multi-statement cells coming from things like notebooks. It's relevant to note that *any* logic in the cell, e.g. `"%i" % 1` or additional statements have no issues.

The proposed 'muliline' or 'interactive' compile mode would suit IPython very well, since that's what we really want - single * N, not actually a module, and this is illustrated by the way we do execution: compile with exec, then iterate through module.body and run the nodes one at a time.
msg317671 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-25 12:15
Ouch, I'd completely missed the fact that this would affect the parsing of bare strings to a simple AST (I was focused on functions and classes, as in Mark's original example).

So even though I'm the author of https://bugs.python.org/issue11549#msg193656 (where I note that we consider it reasonable for AST manipulation code to require updates when going between major Python versions), I'm reluctantly voting "B", since there's a difference between "some AST manipulation code will need to change to account for new node types and arrangements" and "all code calling ast.parse with the default mode and processing the top level node will need to change to account for docstrings now being omitted from the module body, with no readily available quick fix to get the old behaviour back".

(Note that in voting for option B, I'm really only objecting to the change when it comes to Module AST nodes - rather than full reversion, I'd also be OK with a change that duplicated the new docstring attribute as body[0] for modules, while continuing to eliminate the redundancy for functions and classes - this would be a more selective variant of Mark's "Option 1" proposal from back in February).
msg317683 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2018-05-25 15:45
Łukasz Langa wrote:

> Inadasan, I think what we should do is to amend `ast.parse()` and `compile()` docs that mode="exec" treats given code as a module, we should even give the docstring handling as an example.

That is what I proposed in https://bugs.python.org/issue33477
and in PR https://github.com/python/cpython/pull/6973

The other surprise, is that even is I  knew AST were change and had a docstring field I was not expecting `compile()` to be affected.


I think that the change for ast to have a docstring field is a good one from an API point of view for ast nodes. But It should probably be opt-in, at least with a deprecation period to make it the default. The other issue is that as it affect only sequence of statement where the first node is (was) a string, it can lead to really subtle difference in execution. 



MinRk said:
> The only affected case for us is interactively typed string literals in single statement cells not displaying themselves as results

It might affect our downstream consumers that have syntax/ast transformation like sage, sympy on top of IPython. I doubt we have any but it's a possibility. 

While I would prefer (B) as well, I see how the new behavior can be useful in some case, and strongly also support the addition of a `multiline` mode, which correspond to what 3.6 and before were doing with `exec`. The other possibility is to leave `exec` and `ast.parse` with 3.6 behavior, and include a `'module'` mode that does reflect current 3.7 behavior.
msg317686 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-25 17:10
> B. Revert the feature now for 3.7.0, retargeting for 3.8, and produce a 3.7.0b5 on a somewhat shorter cycle to allow downstream users to adapt to the removal.

Please note that it can't be reverted simply.
The change was introduced to ease AST-layer optimization (`"abc" + "def"` is not docstring, but `"abcdef"` is docstring.)
And we implemented some AST-layer optimization already.

There are two ways to revert the change:

B1. Just remove `.docstring` attribute and implement hack to distinguish string and docstring in some places.

B2. Remove `.docstring` attribute, but introduce `DocString` statement.  This was my patch (PR-5927)

In case of B2, thirdparty libraries should follow the change anyway.
In case of IPython, `DocString(s="spam")` may be replaced with `Expr(Str(s="spam"))`

In case of B1, I hadn't tried it yet and I don't know how difficult it is.  I'll try but I can't say estimated time for now.
msg317696 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-25 18:24
PR 7121 is based on PR 5927 and goes one step further. It doesn't introduce a new AST type DocString. Third-party code should not need any adaptation for 3.7.
msg317716 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2018-05-25 20:24
Just to quickly touch on Matthias' question about opt-in or deprecations, a key thing to note is the ast module is automatically generated from the ASDL file, so either of those approaches would require developing a new mechanism in the code generator to support either approach.
msg317718 - (view) Author: Matthias Bussonnier (mbussonn) * Date: 2018-05-25 20:59
> Just to quickly touch on Matthias' question about opt-in or deprecations, a key thing to note is the ast module is automatically generated from the ASDL file, so either of those approaches would require developing a new mechanism in the code generator to support either approach.

Yes, I was mostly thinking of `compile(..., mode, flags=PyAstOnly)` as to where the deprecation could be. 

Thinking a bit more about the `compile`'s `multiline` option, that would also be quite useful to allow some construct that are so far forbidden, like top-level `await`.

I'm thinking that splitting `exec` into two options:
  - `module` which could do what current 3.7 does and find docstrings plus do some optimisation and sanity check if necessary, prevent top-level await. 
  - `multiline` which would treat the source as a sequence of statement, allwo top level `await` ... etc.

this could allow both to evolve and have their individual advantage, leaving `exec` unchanged for legacy reasons.
msg317777 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-27 03:21
Just noting that I'm going to attempt an alternative, hopefully lower impact, approach at implementing the reversion, which will be to modify Parser/asdl_c.py to inject the following constructor postamble for ASDL nodes that define both a "body" and a "docstring" field:

1. If docstring is set, also insert it as the first entry in the node's body sequence (this is enough to get AST compilation back to behaving the way it did in 3.6, since the docstrings will be reinserted as `body[0]` even after the AST compilation process took them out)
2. If docstring is not set, check if the first entry is usable as a docstring, and if so, also set it as the docstring (this would restore compatibility with node manipulation code that expects to be able to pass a docstring as body[0])

In 3.8, we'd then decide how to eliminate the duplication of information between the two locations, but do so armed with the ability to emit deprecation warnings for code which isn't setting the node docstring attribute explicitly.
msg317782 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-27 04:28
After looking at potential implementation approaches, I believe the following would be the ASDL code generator changes needed to present the docstring as part of the body in the Python API, while keeping it as a separate attribute in the C API:

* in ast2obj, add code in the converter postamble to inject the docstring as body[0]
* in obj2ast, add code in the converter preamble to pop and discard body[0] when docstring is not None
* in ast_type_init, add code to either inject the docstring as body[0] (if the docstring is set), or else to extract and set the docstring based on body[0]

Since CPython's own AST optimizer works at the C API layer, we then wouldn't need to revert any of the changes that relied on the separation of the docstring out to its own attribute.

At the Python level, some differences would still be visible though:

- there'd be a node for the docstring in the AST, but the value would still be optimised out of the resulting code object
- the first line number of code objects would still change to be that of the first non-docstring line
- the synthesised docstring node would always claim to be the line before the first non-docstring line (since escape characters mean we can't just count line feeds in the docstring)

Given the complexity of the required changes to the ASDL code generator, and the Python level API differences that would still remain, I think Serhiy's full reversion patch is going to be the more reliable option at this late stage of the release process.

While it isn't nice for the folks that already adapted their code to cope with the earlier beta releases, it's the lowest risk approach that lets us get 3.7.0 out the door without unintended consequences, while allowing the issue to be revisited for 3.8 with greater awareness of the potential backwards compatibility and code migration issues.
msg317783 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-27 07:19
I'm +1 on PR-7121 too.
msg317784 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-27 07:33
OK, since I believe everyone who has spoken up so far has chosen B or a variation on it, I think we can eliminate option A.  And there also seems to be a consensus so far among the core developers who have spoken up for the approach in PR 7121.  Before we commit to it and produce 3.7.0b5, I really would like to hear from at least one of the downsteamers that this approach seems OK for 3.7.0. mbussonn? MIN RK?

Thank you all for your (prompt) help so far with this!
msg317785 - (view) Author: Benjamin Ragan-Kelley (Benjamin Ragan-Kelley) Date: 2018-05-27 08:19
That should work well for us. Our patches for this are all conditional on the module body being empty, so reverting causes us no issues at all.

Thank you!
msg317988 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-29 07:49
New changeset 2641ee5040abb090e66e4ff80c33b76729b36e75 by Serhiy Storchaka in branch '3.7':
bpo-32911: Revert bpo-29463. (GH-7121)
https://github.com/python/cpython/commit/2641ee5040abb090e66e4ff80c33b76729b36e75
msg317992 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-29 09:04
New changeset 73cbe7a01a22d02dbe1ec841e8779c775cad3d08 by Serhiy Storchaka in branch 'master':
bpo-32911: Revert bpo-29463. (GH-7121) (GH-7197)
https://github.com/python/cpython/commit/73cbe7a01a22d02dbe1ec841e8779c775cad3d08
msg318089 - (view) Author: Ned Deily (ned.deily) * (Python committer) Date: 2018-05-29 20:44
Thank you all for your input and thank you, Inada-san and Serhiy, for the PRs.  Is there anything more that needs to go in for 3.7.0b5 tomorrow so that this can be tested before the release candidate?  If not, can we now close this again?  If new issues arise with b5 or later, we should probably track with another issue.
msg318147 - (view) Author: INADA Naoki (inada.naoki) * (Python committer) Date: 2018-05-30 03:50
I think it's OK.
msg318150 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-30 04:05
I concur.
msg318176 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2018-05-30 12:27
In that case, closing this as resolved. Folks interested in the idea of a "suite" compilation mode to compile a series of statements without special-casing the first one as a potential docstring, as well as the idea of a dedicated DocString AST node may want to open new RFEs for 3.8 :)
msg318258 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-31 06:11
New changeset 941ec210aaef517cf818b043ec28345962f78465 by Serhiy Storchaka in branch 'master':
bpo-32911: Add the historical note about the magic number. (GH-7273)
https://github.com/python/cpython/commit/941ec210aaef517cf818b043ec28345962f78465
msg318259 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-05-31 06:12
New changeset d6bbb57855da3f92521edd901f49705be46fb203 by Serhiy Storchaka in branch '3.7':
bpo-32911: Update the historical note about the magic number. (GH-7272)
https://github.com/python/cpython/commit/d6bbb57855da3f92521edd901f49705be46fb203
History
Date User Action Args
2018-05-31 06:12:17serhiy.storchakasetmessages: + msg318259
2018-05-31 06:11:58serhiy.storchakasetmessages: + msg318258
2018-05-31 04:53:13serhiy.storchakasetpull_requests: + pull_request6900
2018-05-31 04:53:00serhiy.storchakasetpull_requests: + pull_request6899
2018-05-30 12:27:57ncoghlansetstatus: open -> closed
resolution: fixed
messages: + msg318176

stage: patch review -> resolved
2018-05-30 04:05:30serhiy.storchakasetmessages: + msg318150
2018-05-30 03:50:33inada.naokisetmessages: + msg318147
2018-05-29 20:44:14ned.deilysetmessages: + msg318089
2018-05-29 09:04:58serhiy.storchakasetmessages: + msg317992
2018-05-29 08:31:28serhiy.storchakasetpull_requests: + pull_request6831
2018-05-29 07:49:21serhiy.storchakasetmessages: + msg317988
2018-05-28 10:27:24vstinnersetnosy: - vstinner
2018-05-27 08:19:54Benjamin Ragan-Kelleysetnosy: + Benjamin Ragan-Kelley
messages: + msg317785
2018-05-27 07:33:15ned.deilysetmessages: + msg317784
2018-05-27 07:19:23inada.naokisetmessages: + msg317783
2018-05-27 04:28:13ncoghlansetmessages: + msg317782
2018-05-27 03:21:30ncoghlansetmessages: + msg317777
2018-05-25 20:59:50mbussonnsetmessages: + msg317718
2018-05-25 20:24:36brett.cannonsetnosy: + brett.cannon
messages: + msg317716
2018-05-25 18:24:50serhiy.storchakasetmessages: + msg317696
2018-05-25 18:20:05serhiy.storchakasetkeywords: + patch
stage: commit review -> patch review
pull_requests: + pull_request6756
2018-05-25 17:10:57inada.naokisetmessages: + msg317686
2018-05-25 15:45:50mbussonnsetnosy: + mbussonn
messages: + msg317683
2018-05-25 12:15:37ncoghlansetnosy: + ncoghlan
messages: + msg317671
2018-05-25 10:41:17eitan.adlersetnosy: + eitan.adler
2018-05-25 09:59:28minrksetnosy: + minrk
messages: + msg317665
2018-05-25 04:42:21ned.deilysetstatus: closed -> open
priority: release blocker
messages: + msg317657

resolution: wont fix -> (no value)
stage: resolved -> commit review
2018-05-25 03:38:07ned.deilysetmessages: + msg317655
2018-05-25 03:15:19benjamin.petersonsetmessages: + msg317654
2018-05-24 01:32:56vstinnersetmessages: + msg317493
2018-05-24 01:24:12ned.deilysetmessages: + msg317492
2018-05-24 01:22:56inada.naokisetmessages: + msg317491
2018-05-23 20:13:55serhiy.storchakasetmessages: + msg317450
2018-04-21 20:28:24lukasz.langasetmessages: + msg315582
2018-04-21 20:21:15lukasz.langasetmessages: + msg315581
2018-04-13 04:58:10inada.naokisetmessages: + msg315240
2018-04-13 00:40:49flhernesetnosy: + flherne
messages: + msg315238
2018-03-18 20:46:22ned.deilysetstatus: open -> closed
priority: deferred blocker -> (no value)
messages: + msg314056

resolution: wont fix
stage: resolved
2018-03-17 12:23:31Mark.Shannonsetmessages: + msg313997
2018-03-12 20:43:58lukasz.langasetnosy: + lukasz.langa
messages: + msg313698
2018-03-06 10:19:00inada.naokisetmessages: + msg313315
2018-02-27 13:17:55ned.deilysetpriority: release blocker -> deferred blocker

messages: + msg313005
2018-02-27 12:51:25serhiy.storchakasetmessages: + msg313002
2018-02-27 12:45:51inada.naokisetmessages: + msg313001
2018-02-27 12:17:45serhiy.storchakasetmessages: + msg313000
2018-02-27 12:01:35inada.naokisetkeywords: - patch

messages: + msg312999
stage: patch review -> (no value)
2018-02-27 11:58:32inada.naokisetkeywords: + patch
stage: patch review
pull_requests: + pull_request5698
2018-02-27 09:38:04inada.naokisetmessages: + msg312989
2018-02-25 20:41:55ned.deilysetpriority: normal -> release blocker

messages: + msg312856
2018-02-23 15:35:19ppperrysettitle: Doc strings omitted from AST -> Doc strings no longer stored in body of AST
2018-02-23 11:29:44serhiy.storchakasetmessages: + msg312630
2018-02-23 11:19:43serhiy.storchakasetmessages: + msg312629
2018-02-23 10:16:55Mark.Shannonsetmessages: + msg312625
2018-02-22 19:24:41serhiy.storchakasetstatus: closed -> open
resolution: fixed -> (no value)
messages: + msg312588

stage: resolved -> (no value)
2018-02-22 18:42:44ned.deilysetnosy: + ned.deily
messages: + msg312580
2018-02-22 14:14:24Mark.Shannonsetmessages: + msg312567
2018-02-22 13:15:32serhiy.storchakasetstatus: open -> closed
resolution: fixed
messages: + msg312565

stage: resolved
2018-02-22 13:11:21inada.naokisetcomponents: + Interpreter Core, - Library (Lib)
2018-02-22 13:11:07inada.naokisetnosy: + georg.brandl, rhettinger, vstinner, benjamin.peterson, serhiy.storchaka
messages: + msg312564
2018-02-22 12:47:02Mark.Shannonsetmessages: + msg312562
2018-02-22 12:39:49inada.naokisetmessages: + msg312561
2018-02-22 12:34:51Mark.Shannonsetmessages: + msg312559
2018-02-22 12:28:52inada.naokisetnosy: + inada.naoki
messages: + msg312558
2018-02-22 12:20:07Mark.Shannoncreate