Author martin.panter
Recipients berker.peksag, martin.panter, mystor, serhiy.storchaka, yan12125
Date 2016-12-10.10:35:59
Message-id <1481366160.51.0.314949728574.issue25677@psf.upfronthosting.co.za>
In-reply-to
Content
(Long story short: need to strip form feeds in print_error_text(), but I agree that this otherwise does fix a bug.)

There is one minor regression that this patch causes IMO. Consider the following file, where <FF> represents a form feed character ('\014'):

if True:
<FF>    1 + 1 = 2

the error report now has a blank line (due to outputting the FF character) and extra indentation that wasn’t stripped:

  File "indent.py", line 2
    
        1 + 1 = 2
        ^
SyntaxError: can't assign to operator

I think the fix for this is to strip form feed characters in print_error_text(), in Python/pythonrun.c.
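
As a rough illustration of the stripping suggested here, the following Python sketch treats the form feed like a space or tab when removing leading indentation. It is not the C code in Python/pythonrun.c; the helper name strip_indent is made up for this example, and the exact set of characters to strip is an assumption.

```python
def strip_indent(text):
    """Hypothetical helper: drop leading spaces, tabs and form feeds.

    Sketch only; the real print_error_text() is written in C.
    """
    return text.lstrip(" \t\f")


# '\014' (octal) is the same character as '\f', the form feed.
line = "\014    1 + 1 = 2\n"
print(repr(strip_indent(line)))
```

With the form feed included in the stripped set, the displayed text starts at the first character of the statement rather than at the FF.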

Apart from that, I agree with Serhiy and Michael that it is okay to push this change. The bug that we are fixing is that SyntaxError.offset does not correspond with SyntaxError.text, due to indentation. Before the patch, the offset refers to the line without indentation removed, but the text has indentation removed. Here is a really bad case, where there is more indentation than code: print_error_text() tries to find the next line and ends up printing a blank line:

>>> code = "if True:\n" + " "*16 + "1 + 1 = 2\n"
>>> with open("indent.py", "wt") as f: f.write(code)
... 
35
>>> try: compile(code, "indent.py", "exec")
... except SyntaxError as err:
...     print("offset={err.offset} text={err.text!r}".format(err=err))
...     raise
... 
offset=16 text='1 + 1 = 2\n'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "indent.py", line 2
    
         ^
SyntaxError: can't assign to operator
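
One way to picture keeping the two attributes consistent is to reduce the offset by however many characters were stripped from the text. The sketch below is only an illustration under that assumption, not the patch itself; strip_and_adjust is a hypothetical name, and clamping the result at zero is a choice made for the example.

```python
def strip_and_adjust(raw_line, offset):
    """Hypothetical helper: strip indentation and shift offset to match.

    Returns the stripped text and an offset that refers to it.
    """
    stripped = raw_line.lstrip(" \t\f")
    removed = len(raw_line) - len(stripped)
    # Clamp so the offset never goes negative.
    return stripped, max(offset - removed, 0)


raw = " " * 16 + "1 + 1 = 2\n"
text, offset = strip_and_adjust(raw, 16)
print("offset={} text={!r}".format(offset, text))
```

For the 16-space example, the adjusted offset lands at the start of the stripped text instead of pointing past its end.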

In Michael’s original report, I was curious why the caret is only shifted three spaces, despite there being four spaces of indentation. This is because offset points to the start of the line, but the caret points to the character _before_ offset. So offset=0 and offset=1 are both represented by pointing at the first character on the line. This is related to Serhiy’s bug with inserting “1;”. In some cases (e.g. the line “1 +”), the offset is the string index _after_ the error. But in the case of “1;1 + 1 = 2”, offset is the index where the error (statement) begins.
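
The caret rule described above can be sketched in Python as follows. This is a simplified model of the behaviour, not the actual C logic; caret_line is a made-up name, and it ignores the four-space prefix that the traceback adds before the source line.

```python
def caret_line(offset):
    """Hypothetical model: the caret sits under the character before offset."""
    pad = max(offset - 1, 0)  # offset 0 and offset 1 both give no padding
    return " " * pad + "^"


text = "1 + 1 = 2"
for offset in (0, 1, 5):
    print(text)
    print(caret_line(offset))
```

Under this model an offset of 0 and an offset of 1 produce the same caret position, which matches the observation that both point at the first character on the line.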

Both pieces of indentation stripping code were added in revision 27f04d714ecb (year 2001). It is strange that only one of them stripped form feeds, though. The column offset was only added later (revision 861c35cef7aa). So Python 2 will have some of its SyntaxError.text lines stripped of indentation, but it does not matter so much because SyntaxError.offset is not set in those cases.