classification
Title: Caret positioned wrong for SyntaxError reported by ast.c
Type: Stage: resolved
Components: Versions: Python 3.8, Python 3.7, Python 3.6, Python 3.4, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: ammar2, benjamin.peterson, emilyemorehouse, eric.smith, gvanrossum, pablogsal, xtreak
Priority: low Keywords: patch

Created on 2018-09-14 17:22 by gvanrossum, last changed 2018-09-25 12:50 by ammar2. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 9338 merged ammar2, 2018-09-15 23:34
Messages (10)
msg325366 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-09-14 17:22
Input file with a subtle error: a number where an assignment target is required:

for 1 in []: pass

Run it, it gives a SyntaxError.  Note how the caret pointing to the incorrect token is position one to the left of where you'd expect it:

  File "s.py", line 1
    for 1 in []: pass
       ^
SyntaxError: can't assign to literal

For every syntax error I've seen that's produced by ast.c this seems to be the case -- the caret is always positioned 1 too far left.

I tried to understand how this is happening but my AST-fu is lacking. It seems this has been happening since ast.c started added column numbers -- in Python 2.7 there's no caret at all, but in 3.4 and later there's a caret and it has the same problem. (Also in 3.3; I don't have 3.2 or older 3.x lying around to test.)
msg325375 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-09-14 17:51
I'm doing some fairly major surgery on line and column numbers for fixing f-string errors. I'll see if I can include this case, too.
msg325380 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-14 18:21
I think ast.c is in the right here and there are two complementary bugs in caret printing and the parser.

print_error_text does this:

    while (--offset > 0)
        PyFile_WriteString(" ", f);

which is off-by-one if offset is zero-indexed (which it should be IMO).

The parser has an off-by-one error in the other direction. When it reports an error, it reports the offset of the tokenizer's "cur" buffer. This is generally advanced past the location that actually resulted in a parser error. I believe the parser should be fixed to report the error offset as the offset of the start of the token that caused the error.
msg325382 - (view) Author: Eric V. Smith (eric.smith) * (Python committer) Date: 2018-09-14 18:42
I think Benjamin's diagnosis is correct. In which case, my f-string error reporting changes (see #34364) won't intersect with a fix to this issue.
msg325463 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-09-15 23:37
Added a PR that is an implementation of Benjamin's suggestion. The column offset returned has been made 0-indexed and errors now point to the start of the parsed token.

This makes errors like `def class` show up as

def class
    ^

instead of

def class
        ^

Also added tests for the original example and some more errors from ast.c
msg326286 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-09-24 21:12
New changeset 025eb98dc0c1dc27404df6c544fc2944e0fa9f3a by Guido van Rossum (Ammar Askar) in branch 'master':
bpo-34683: Make SyntaxError column offsets consistently 1-indexed (gh-9338)
https://github.com/python/cpython/commit/025eb98dc0c1dc27404df6c544fc2944e0fa9f3a
msg326287 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-09-24 21:13
Fixed by gh-9338. Thanks Ammar!
msg326288 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2018-09-24 21:14
Hm, should this be backported? I think it's safest not to.
msg326320 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2018-09-25 04:53
Wait, why did you make offsets 1-indexed? My suggestion was to make everything zero indexed...
msg326351 - (view) Author: Ammar Askar (ammar2) * (Python triager) Date: 2018-09-25 12:50
There's some context on the github PR but essentially it boiled down to:

- It would make it inconsistent with line offsets which are 1-indexed.
- Breaking change to a public C API (PyErr_SyntaxLocationObject)
- IDLE and other tools that catch syntax errors rely on it being 1-indexed and this would be a breaking change for them. As a quick example of that I found this vim plugin: https://github.com/vim-syntastic/syntastic/commit/6c91e8d802a77005d793d9cdd055c3da29f74a1b
History
Date User Action Args
2018-09-25 12:50:04ammar2setmessages: + msg326351
2018-09-25 04:53:03benjamin.petersonsetmessages: + msg326320
2018-09-24 21:14:40gvanrossumsetmessages: + msg326288
2018-09-24 21:13:59gvanrossumsetstatus: open -> closed
resolution: fixed
messages: + msg326287

stage: patch review -> resolved
2018-09-24 21:12:54gvanrossumsetmessages: + msg326286
2018-09-17 18:31:58xtreaksetnosy: + xtreak
2018-09-17 13:11:32pablogsalsetnosy: + pablogsal
2018-09-15 23:37:00ammar2setnosy: + ammar2
messages: + msg325463
2018-09-15 23:34:31ammar2setkeywords: + patch
stage: patch review
pull_requests: + pull_request8761
2018-09-14 18:42:07eric.smithsetmessages: + msg325382
2018-09-14 18:29:34gvanrossumsetnosy: + eric.smith
2018-09-14 18:21:05benjamin.petersonsetnosy: + benjamin.peterson, - eric.smith, xtreak
messages: + msg325380
2018-09-14 17:51:59eric.smithsetnosy: + eric.smith
messages: + msg325375
2018-09-14 17:30:28xtreaksetnosy: + xtreak
2018-09-14 17:24:29gvanrossumsetnosy: + emilyemorehouse
2018-09-14 17:22:22gvanrossumcreate