classification
Title: Syntax error caret confused by indentation
Type: behavior Stage: resolved
Components: Interpreter Core Versions: Python 3.7, Python 3.6, Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: martin.panter Nosy List: berker.peksag, martin.panter, mystor, python-dev, serhiy.storchaka, yan12125
Priority: normal Keywords: easy, patch

Created on 2015-11-20 05:50 by mystor, last changed 2017-03-31 16:36 by dstufft. This issue is now closed.

Files
File name Uploaded Description Edit
cpython25677.patch mystor, 2015-11-20 16:20 review
cpython25677.patch mystor, 2015-11-25 17:44 v2 - patch + test review
cpython25677.patch mystor, 2015-11-26 14:45 review
cpython25677.patch mystor, 2016-01-09 00:17
cpython25677.patch mystor, 2016-01-09 21:47
cpython25677.v6.patch martin.panter, 2016-12-11 00:29 Strip form feeds review
Pull Requests
URL Status Linked Edit
PR 552 closed dstufft, 2017-03-31 16:36
Messages (25)
msg254951 - (view) Author: Michael Layzell (mystor) * Date: 2015-11-20 05:50
It appears that syntax errors generated by checking the AST will get confused by indentation and place the caret incorrectly.

With no indentation:
===
1 + 1 = 2
===
  File "/Users/mlayzell/test.py", line 1
    1 + 1 = 2
    ^
SyntaxError: can't assign to operator
===

With 4 spaces of indentation:
===
if True:
    1 + 1 = 2
===
  File "/Users/mlayzell/test.py", line 2
    1 + 1 = 2
       ^
SyntaxError: can't assign to operator
===

As you can see, the caret appears to be placed randomly in the middle of the second statement, apparently offset (probably by the 4 space indentation).

The above examples were run with Python3.5 on Mac OS X 10.10.4, though my somewhat-recent trunk clone exhibits the same problems.
msg254979 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-20 14:40
It appears as though the line is always printed without the leading whitespace, but the caret takes the whitespace into account (try indenting the line even further).  So the problem is that the caret and the printing of the source line in the error message are not consistent about how whitespace is handled.
msg254981 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-20 14:44
By the way, python3.4 also shows the behavior.  Python2 does not, because python2 does not print the caret.  The bug presumably crept in when the caret was added.  I'm marking this as easy...presumably the hardest part of fixing it is figuring out exactly how the message is constructed :)
msg254992 - (view) Author: Michael Layzell (mystor) * Date: 2015-11-20 16:20
This should fix the problem. It appears as though the indentation was being stripped from the program text differently depending on how the error was produced. In the case of a syntax error from the parser, the indentation was maintained until it was printed (in print_error_text, correctly handling the offset value). In contrast, errors from the AST were created with PyErr_ProgramText*, which would also strip the indentation (without mutating any offset value). 

This patch causes the second code path to not strip whitespace. This shouldn't cause I problem (I think), as it seems to only be used to instantiate SyntaxErrors.
msg255007 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2015-11-20 18:26
Would be nice to have a test.
msg255370 - (view) Author: Michael Layzell (mystor) * Date: 2015-11-25 17:44
Sorry for the delay, I finally got around to adding a test.

I'm mildly concerned about the portability of the check, but it seems to work locally. If people have suggestions about how to make the check more portable, let me know!
msg255376 - (view) Author: R. David Murray (r.david.murray) * (Python committer) Date: 2015-11-25 18:19
Two options on portability, I think: either use splitlines() on the decoded result, or add a 'universal_newlines' keyword argument to assert_python_failure and friends that would get passed to Popen.  I'd favor the latter, as other users of script_helpers may find it useful.  On the other hand, that may involve a whole separate discussion, so you could use splitlines here and we could open a new issue for a script_helpers enhancement :)

Also, rather than a direct re, I'd use assertRegex.  That will give a more useful error message if the test fails.
msg255397 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-26 01:18
Another way to get universal newlines I have used in other cases is to use TextIOWrapper. It might be easier if you really need that multi-line RE search:

text = TextIOWrapper(BytesIO(stderr), "ascii").read()
msg255418 - (view) Author: Michael Layzell (mystor) * Date: 2015-11-26 14:45
Martin's solution seemed like the least work, so that's what I ended up using. I also switched over to assertRegex, and I agree that it produces better error messages.

I added a comment to explain the use of the ugly multiline regex, and removed some of its unnecessary parts.
msg255435 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2015-11-26 21:32
Michael, see the review comments if you already haven’t. There is an unused variable “p” in the last two patches.
msg256178 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2015-12-10 16:33
With the latest patch applied, the output is

  File "z.py", line 2
    1 + 2 = 3
    ^
SyntaxError: can't assign to operator

I think the caret is still in wrong place.

I would expect the following output:

  File "z.py", line 2
    1 + 2 = 3
          ^
SyntaxError: can't assign to operator
msg257789 - (view) Author: Michael Layzell (mystor) * Date: 2016-01-09 00:17
Sorry for missing the review comments earlier, I didn't notice them (I've not used this bug tracker before - sorry).

I've attached an updated patch which should not have the variable problem, and also takes some of the other review comments into account.

Berker: I agree that the current positioning of the caret is undesirable, and that ideally we would report the error at the site of the operator rather than the first character of the expression. I think that changing this behavior is a different bug though.

P.S: Sorry for disappearing for a month, I don't seem to get emails when there are changes made to the bug, and thus forgot to check up on the bug...
msg257864 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-01-09 21:39
+script = textwrap.dedent("1 + 1 = 2\n")
You don’t need that first textwrap.dedent() call. It is only useful if the string has extra indentation at the start, as in the second multiline one.
msg257870 - (view) Author: Michael Layzell (mystor) * Date: 2016-01-09 21:47
Oops. Should be fixed now.
msg279978 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-03 09:17
LGTM.
msg280205 - (view) Author: Michael Layzell (mystor) * Date: 2016-11-07 16:03
Is there something I should be doing to push this patch forward? I am not familiar with the contribution process for cpython unfortunately.
msg280206 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-11-07 16:22
FWIW, I don't think the patch fixes the actual problem. With the patch applied, the caret still placed at the wrong place. I'd prefer avoid pushing a patch that fixes a bug by introducing another bug.
msg280207 - (view) Author: Michael Layzell (mystor) * Date: 2016-11-07 16:26
This patch fixes a problem, which is that the caret was not placed where the source location information for the node is, but instead in a random position based on the indentation level.

The problem that the caret is not placed under the `=` sign, but instead at the front of the expression, feels like a different bug related to how source location information is computed throughout cpython for infix operators. I don't think it should be fixed in this bug, but rather in a follow-up.

Specifically, this fix (to make the printing actually respect the source location information) will also be required in order to fix the placement of the caret to under the = sign.
msg280215 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-07 17:39
The patch fixes one bug, and I'm going to push it.

There is other bug here. If add "1;" before the original example (so the line becomes "1;1 + 1 = 2"), the caret will be under ";". In some SyntaxErrors the caret points one position left before the start of incorrect expression. But this bug deserves separate issue.

The fact that in "1 + 1 = 2" the caret points to the first "1" instead of "=" can be considered as yet one different bug (but this is questionable).
msg280220 - (view) Author: Berker Peksag (berker.peksag) * (Python committer) Date: 2016-11-07 17:52
With patch applied:

  File "x.py", line 2
    1 + 1 = 2
    ^
SyntaxError: can't assign to operator

Without patch:

  File "x.py", line 2
    1 + 1 = 2
       ^
SyntaxError: can't assign to operator

The caret is located at the wrong place in both examples (which is the actual bug that needs to be fixed here.) This isn't a high priority bug and I think people can live with the current behavior until this is properly fixed. Pushing a premature patch will only introduce more code churn.
msg280223 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-11-07 18:13
This may be just bad example that it looks contradictorily to you. Is the example "None = 0" looks good to you? Without patch the caret is located at the middle of None (depending on the size of the indentation of this line). With patch applied it is located at the start of None independently of the indentation.

The original bug is that the location of the caret depends on the indentation. The patch fixes this bug (and only this bug).
msg282845 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-10 10:35
(Long story short: need to strip form feeds in print_error_text(), but I agree that this otherwise does fix a bug.)

There is one minor regression that this patch causes IMO. Given the following file, where <FF> represents a form feed character ('\014')

if True:
<FF>    1 + 1 = 2

the error report now has a blank line (due to outputting the FF character) and extra indentation that wasn’t stripped:

  File "indent.py", line 2
    
        1 + 1 = 2
        ^
SyntaxError: can't assign to operator

I think the fix for this is to strip form feed characters in print_error_text(), in Python/pythonrun.c.

Apart from that, I agree with Serhiy and Michael that it is okay to push this change. The bug that we are fixing is that SyntaxError.offset does not correspond with SyntaxError.text, due to indentation. Before the patch, the offset refers to the line without indentation removed, but the text has indentation removed. Here is a really bad case, where there is more indentation than code, print_error_text() tries to find the next line, and ends up printing a blank line:

>>> code = "if True:\n" + " "*16 + "1 + 1 = 2\n"
>>> with open("indent.py", "wt") as f: f.write(code)
... 
35
>>> try: compile(code, "indent.py", "exec")
... except SyntaxError as err:
...     print("offset={err.offset} text={err.text!r}".format(err=err))
...     raise
... 
offset=16 text='1 + 1 = 2\n'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "indent.py", line 2
    
         ^
SyntaxError: can't assign to operator

In Michael’s original report, I was curious why the caret is only shifted three spaces, despite there being four spaces of indentation. This is because offset points to the start of the line, but the caret points to the character _before_ offset. So offset=0 and offset=1 are both represented by pointing at the first character on the line. This is related to Serhiy’s bug with inserting “1;”. In some cases (e.g. the line “1 +”), the offset is the string index _after_ the error. But in the case of “1;1 + 1 = 2”, offset is the index where the error (statement) begins.

Both pieces of indentation stripping code were added in revision 27f04d714ecb (year 2001). It is strange that only one stripped form feeds though. The column offset was only added later (revision 861c35cef7aa). So Python 2 will have some of its SyntaxError.text lines stripped of indentation, but it does not matter so much because SyntaxError.offset is not set in those cases.
msg282890 - (view) Author: Martin Panter (martin.panter) * (Python committer) Date: 2016-12-11 00:29
Patch v6 strips the form feed, and adds an extra test.
msg282905 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2016-12-11 05:48
LGTM.
msg283592 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-12-19 06:47
New changeset 35b50c26f780 by Martin Panter in branch '3.5':
Issue #25677: Correct syntax error caret for indented blocks.
https://hg.python.org/cpython/rev/35b50c26f780

New changeset d4effdd699de by Martin Panter in branch '3.6':
Issue #25677: Merge SyntaxError caret positioning from 3.5
https://hg.python.org/cpython/rev/d4effdd699de

New changeset 2342726ea0c0 by Martin Panter in branch 'default':
Issue #25677: Merge SyntaxError caret positioning from 3.6
https://hg.python.org/cpython/rev/2342726ea0c0
History
Date User Action Args
2017-03-31 16:36:35dstufftsetpull_requests: + pull_request1080
2016-12-19 20:29:54martin.pantersetstatus: open -> closed
resolution: fixed
stage: commit review -> resolved
2016-12-19 06:47:07python-devsetnosy: + python-dev
messages: + msg283592
2016-12-11 05:48:02serhiy.storchakasetassignee: serhiy.storchaka -> martin.panter
messages: + msg282905
2016-12-11 00:29:35martin.pantersetfiles: + cpython25677.v6.patch

messages: + msg282890
2016-12-10 10:36:00martin.pantersetmessages: + msg282845
2016-11-07 18:13:07serhiy.storchakasetmessages: + msg280223
2016-11-07 17:52:51berker.peksagsetmessages: + msg280220
2016-11-07 17:43:40vstinnersetnosy: - vstinner
2016-11-07 17:39:35serhiy.storchakasetassignee: serhiy.storchaka
messages: + msg280215
2016-11-07 16:26:43mystorsetmessages: + msg280207
2016-11-07 16:22:48berker.peksagsetmessages: + msg280206
2016-11-07 16:03:40mystorsetmessages: + msg280205
2016-11-03 09:17:53serhiy.storchakasetstage: patch review -> commit review
messages: + msg279978
versions: + Python 3.7
2016-11-01 21:56:58yan12125setnosy: + yan12125
2016-11-01 21:51:57serhiy.storchakalinkissue28582 superseder
2016-01-09 21:47:02mystorsetfiles: + cpython25677.patch

messages: + msg257870
2016-01-09 21:39:21martin.pantersetmessages: + msg257864
2016-01-09 20:18:41r.david.murraysetnosy: - r.david.murray
2016-01-09 00:17:09mystorsetfiles: + cpython25677.patch

messages: + msg257789
2015-12-10 16:33:43berker.peksagsetnosy: + berker.peksag
messages: + msg256178
2015-11-26 21:33:31martin.pantersetstage: test needed -> patch review
2015-11-26 21:32:47martin.pantersetmessages: + msg255435
2015-11-26 14:45:43mystorsetfiles: + cpython25677.patch

messages: + msg255418
2015-11-26 01:18:25martin.pantersetnosy: + martin.panter
messages: + msg255397
2015-11-25 18:19:22r.david.murraysetmessages: + msg255376
2015-11-25 17:44:49mystorsetfiles: + cpython25677.patch

messages: + msg255370
2015-11-20 18:26:46serhiy.storchakasetmessages: + msg255007
stage: patch review -> test needed
2015-11-20 17:05:08r.david.murraysetstage: needs patch -> patch review
2015-11-20 17:04:24r.david.murraysetnosy: + vstinner, serhiy.storchaka
2015-11-20 16:20:39mystorsetfiles: + cpython25677.patch
keywords: + patch
messages: + msg254992
2015-11-20 14:44:12r.david.murraysetkeywords: + easy

messages: + msg254981
stage: needs patch
2015-11-20 14:40:32r.david.murraysetnosy: + r.david.murray
messages: + msg254979
2015-11-20 05:50:56mystorcreate