Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax error caret confused by indentation #69863

Closed
mystor mannequin opened this issue Nov 20, 2015 · 25 comments
Closed

Syntax error caret confused by indentation #69863

mystor mannequin opened this issue Nov 20, 2015 · 25 comments
Assignees
Labels
3.7 (EOL) end of life easy interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@mystor
Copy link
Mannequin

mystor mannequin commented Nov 20, 2015

BPO 25677
Nosy @berkerpeksag, @vadmium, @serhiy-storchaka, @yan12125
PRs
  • [Do Not Merge] Convert Misc/NEWS so that it is managed by towncrier #552
  • Files
  • cpython25677.patch
  • cpython25677.patch: v2 - patch + test
  • cpython25677.patch
  • cpython25677.patch
  • cpython25677.patch
  • cpython25677.v6.patch: Strip form feeds
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/vadmium'
    closed_at = <Date 2016-12-19.20:29:54.996>
    created_at = <Date 2015-11-20.05:50:55.967>
    labels = ['interpreter-core', 'easy', 'type-bug', '3.7']
    title = 'Syntax error caret confused by indentation'
    updated_at = <Date 2017-03-31.16:36:35.392>
    user = 'https://bugs.python.org/mystor'

    bugs.python.org fields:

    activity = <Date 2017-03-31.16:36:35.392>
    actor = 'dstufft'
    assignee = 'martin.panter'
    closed = True
    closed_date = <Date 2016-12-19.20:29:54.996>
    closer = 'martin.panter'
    components = ['Interpreter Core']
    creation = <Date 2015-11-20.05:50:55.967>
    creator = 'mystor'
    dependencies = []
    files = ['41100', '41163', '41169', '41539', '41559', '45844']
    hgrepos = []
    issue_num = 25677
    keywords = ['patch', 'easy']
    message_count = 25.0
    messages = ['254951', '254979', '254981', '254992', '255007', '255370', '255376', '255397', '255418', '255435', '256178', '257789', '257864', '257870', '279978', '280205', '280206', '280207', '280215', '280220', '280223', '282845', '282890', '282905', '283592']
    nosy_count = 6.0
    nosy_names = ['python-dev', 'berker.peksag', 'martin.panter', 'serhiy.storchaka', 'yan12125', 'mystor']
    pr_nums = ['552']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue25677'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7']

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 20, 2015

    It appears that syntax errors generated by checking the AST will get confused by indentation and place the caret incorrectly.

    With no indentation:
    ===
    1 + 1 = 2
    ===
    File "/Users/mlayzell/test.py", line 1
    1 + 1 = 2
    ^
    SyntaxError: can't assign to operator
    ===

    With 4 spaces of indentation:
    ===

    if True:
        1 + 1 = 2

    ===
    File "/Users/mlayzell/test.py", line 2
    1 + 1 = 2
    ^
    SyntaxError: can't assign to operator
    ===

    As you can see, the caret appears to be placed randomly in the middle of the second statement, apparently offset (probably by the 4 space indentation).

    The above examples were run with Python3.5 on Mac OS X 10.10.4, though my somewhat-recent trunk clone exhibits the same problems.

    @mystor mystor mannequin added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error labels Nov 20, 2015
    @bitdancer
    Copy link
    Member

    It appears as though the line is always printed without the leading whitespace, but the caret takes the whitespace into account (try indenting the line even further). So the problem is that the caret and the printing of the source line in the error message are not consistent about how whitespace is handled.

    @bitdancer
    Copy link
    Member

    By the way, python3.4 also shows the behavior. Python2 does not, because python2 does not print the caret. The bug presumably crept in when the caret was added. I'm marking this as easy...presumably the hardest part of fixing it is figuring out exactly how the message is constructed :)

    @bitdancer bitdancer added the easy label Nov 20, 2015
    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 20, 2015

    This should fix the problem. It appears as though the indentation was being stripped from the program text differently depending on how the error was produced. In the case of a syntax error from the parser, the indentation was maintained until it was printed (in print_error_text, correctly handling the offset value). In contrast, errors from the AST were created with PyErr_ProgramText*, which would also strip the indentation (without mutating any offset value).

    This patch causes the second code path to not strip whitespace. This shouldn't cause I problem (I think), as it seems to only be used to instantiate SyntaxErrors.

    @serhiy-storchaka
    Copy link
    Member

    Would be nice to have a test.

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 25, 2015

    Sorry for the delay, I finally got around to adding a test.

    I'm mildly concerned about the portability of the check, but it seems to work locally. If people have suggestions about how to make the check more portable, let me know!

    @bitdancer
    Copy link
    Member

    Two options on portability, I think: either use splitlines() on the decoded result, or add a 'universal_newlines' keyword argument to assert_python_failure and friends that would get passed to Popen. I'd favor the latter, as other users of script_helpers may find it useful. On the other hand, that may involve a whole separate discussion, so you could use splitlines here and we could open a new issue for a script_helpers enhancement :)

    Also, rather than a direct re, I'd use assertRegex. That will give a more useful error message if the test fails.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 26, 2015

    Another way to get universal newlines I have used in other cases is to use TextIOWrapper. It might be easier if you really need that multi-line RE search:

    text = TextIOWrapper(BytesIO(stderr), "ascii").read()

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 26, 2015

    Martin's solution seemed like the least work, so that's what I ended up using. I also switched over to assertRegex, and I agree that it produces better error messages.

    I added a comment to explain the use of the ugly multiline regex, and removed some of its unnecessary parts.

    @vadmium
    Copy link
    Member

    vadmium commented Nov 26, 2015

    Michael, see the review comments if you already haven’t. There is an unused variable “p” in the last two patches.

    @berkerpeksag
    Copy link
    Member

    With the latest patch applied, the output is

    File "z.py", line 2
    1 + 2 = 3
    ^
    SyntaxError: can't assign to operator

    I think the caret is still in wrong place.

    I would expect the following output:

    File "z.py", line 2
    1 + 2 = 3
    ^
    SyntaxError: can't assign to operator

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Jan 9, 2016

    Sorry for missing the review comments earlier, I didn't notice them (I've not used this bug tracker before - sorry).

    I've attached an updated patch which should not have the variable problem, and also takes some of the other review comments into account.

    Berker: I agree that the current positioning of the caret is undesirable, and that ideally we would report the error at the site of the operator rather than the first character of the expression. I think that changing this behavior is a different bug though.

    P.S: Sorry for disappearing for a month, I don't seem to get emails when there are changes made to the bug, and thus forgot to check up on the bug...

    @vadmium
    Copy link
    Member

    vadmium commented Jan 9, 2016

    +script = textwrap.dedent("1 + 1 = 2\n")
    You don’t need that first textwrap.dedent() call. It is only useful if the string has extra indentation at the start, as in the second multiline one.

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Jan 9, 2016

    Oops. Should be fixed now.

    @serhiy-storchaka
    Copy link
    Member

    LGTM.

    @serhiy-storchaka serhiy-storchaka added the 3.7 (EOL) end of life label Nov 3, 2016
    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 7, 2016

    Is there something I should be doing to push this patch forward? I am not familiar with the contribution process for cpython unfortunately.

    @berkerpeksag
    Copy link
    Member

    FWIW, I don't think the patch fixes the actual problem. With the patch applied, the caret still placed at the wrong place. I'd prefer avoid pushing a patch that fixes a bug by introducing another bug.

    @mystor
    Copy link
    Mannequin Author

    mystor mannequin commented Nov 7, 2016

    This patch fixes a problem, which is that the caret was not placed where the source location information for the node is, but instead in a random position based on the indentation level.

    The problem that the caret is not placed under the = sign, but instead at the front of the expression, feels like a different bug related to how source location information is computed throughout cpython for infix operators. I don't think it should be fixed in this bug, but rather in a follow-up.

    Specifically, this fix (to make the printing actually respect the source location information) will also be required in order to fix the placement of the caret to under the = sign.

    @serhiy-storchaka
    Copy link
    Member

    The patch fixes one bug, and I'm going to push it.

    There is other bug here. If add "1;" before the original example (so the line becomes "1;1 + 1 = 2"), the caret will be under ";". In some SyntaxErrors the caret points one position left before the start of incorrect expression. But this bug deserves separate issue.

    The fact that in "1 + 1 = 2" the caret points to the first "1" instead of "=" can be considered as yet one different bug (but this is questionable).

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 7, 2016
    @berkerpeksag
    Copy link
    Member

    With patch applied:

    File "x.py", line 2
    1 + 1 = 2
    ^
    SyntaxError: can't assign to operator

    Without patch:

    File "x.py", line 2
    1 + 1 = 2
    ^
    SyntaxError: can't assign to operator

    The caret is located at the wrong place in both examples (which is the actual bug that needs to be fixed here.) This isn't a high priority bug and I think people can live with the current behavior until this is properly fixed. Pushing a premature patch will only introduce more code churn.

    @serhiy-storchaka
    Copy link
    Member

    This may be just bad example that it looks contradictorily to you. Is the example "None = 0" looks good to you? Without patch the caret is located at the middle of None (depending on the size of the indentation of this line). With patch applied it is located at the start of None independently of the indentation.

    The original bug is that the location of the caret depends on the indentation. The patch fixes this bug (and only this bug).

    @vadmium
    Copy link
    Member

    vadmium commented Dec 10, 2016

    (Long story short: need to strip form feeds in print_error_text(), but I agree that this otherwise does fix a bug.)

    There is one minor regression that this patch causes IMO. Given the following file, where <FF> represents a form feed character ('\014')

    if True:
    <FF>    1 + 1 = 2

    the error report now has a blank line (due to outputting the FF character) and extra indentation that wasn’t stripped:

    File "indent.py", line 2

        1 + 1 = 2
        ^
    

    SyntaxError: can't assign to operator

    I think the fix for this is to strip form feed characters in print_error_text(), in Python/pythonrun.c.

    Apart from that, I agree with Serhiy and Michael that it is okay to push this change. The bug that we are fixing is that SyntaxError.offset does not correspond with SyntaxError.text, due to indentation. Before the patch, the offset refers to the line without indentation removed, but the text has indentation removed. Here is a really bad case, where there is more indentation than code, print_error_text() tries to find the next line, and ends up printing a blank line:

    >>> code = "if True:\n" + " "*16 + "1 + 1 = 2\n"
    >>> with open("indent.py", "wt") as f: f.write(code)
    ... 
    35
    >>> try: compile(code, "indent.py", "exec")
    ... except SyntaxError as err:
    ...     print("offset={err.offset} text={err.text!r}".format(err=err))
    ...     raise
    ... 
    offset=16 text='1 + 1 = 2\n'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "indent.py", line 2
        
             ^
    SyntaxError: can't assign to operator

    In Michael’s original report, I was curious why the caret is only shifted three spaces, despite there being four spaces of indentation. This is because offset points to the start of the line, but the caret points to the character _before_ offset. So offset=0 and offset=1 are both represented by pointing at the first character on the line. This is related to Serhiy’s bug with inserting “1;”. In some cases (e.g. the line “1 +”), the offset is the string index _after_ the error. But in the case of “1;1 + 1 = 2”, offset is the index where the error (statement) begins.

    Both pieces of indentation stripping code were added in revision 27f04d714ecb (year 2001). It is strange that only one stripped form feeds though. The column offset was only added later (revision 861c35cef7aa). So Python 2 will have some of its SyntaxError.text lines stripped of indentation, but it does not matter so much because SyntaxError.offset is not set in those cases.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 11, 2016

    Patch v6 strips the form feed, and adds an extra test.

    @serhiy-storchaka
    Copy link
    Member

    LGTM.

    @serhiy-storchaka serhiy-storchaka removed their assignment Dec 11, 2016
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 19, 2016

    New changeset 35b50c26f780 by Martin Panter in branch '3.5':
    Issue bpo-25677: Correct syntax error caret for indented blocks.
    https://hg.python.org/cpython/rev/35b50c26f780

    New changeset d4effdd699de by Martin Panter in branch '3.6':
    Issue bpo-25677: Merge SyntaxError caret positioning from 3.5
    https://hg.python.org/cpython/rev/d4effdd699de

    New changeset 2342726ea0c0 by Martin Panter in branch 'default':
    Issue bpo-25677: Merge SyntaxError caret positioning from 3.6
    https://hg.python.org/cpython/rev/2342726ea0c0

    @vadmium vadmium closed this as completed Dec 19, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life easy interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants