Show "expected" token on syntax error #44453

Closed
olivergramberg mannequin opened this issue Jan 12, 2007 · 16 comments
Labels
3.8 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), type-feature (A feature request or enhancement)

Comments

olivergramberg mannequin commented Jan 12, 2007

BPO 1634034
Nosy @rhettinger, @terryjreedy, @devdanzin, @benjaminp, @ezio-melotti, @davidmalcolm, @serhiy-storchaka, @pablogsal
PRs
  • bpo-1634034: Show "expected" token on syntax error. #6453
Files
  • pythonrun.patch: Patch for /Python/pythonrun.c
  • syntax-error-hints.patch
  • syntax-error-hints-3.4.patch
  • syntax-error-hints-3.4_2.patch
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2021-03-27.21:22:06.519>
    created_at = <Date 2007-01-12.13:03:27.000>
    labels = ['interpreter-core', 'type-feature', '3.8']
    title = 'Show "expected" token on syntax error'
    updated_at = <Date 2021-03-27.21:22:06.518>
    user = 'https://bugs.python.org/olivergramberg'

    bugs.python.org fields:

    activity = <Date 2021-03-27.21:22:06.518>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-03-27.21:22:06.519>
    closer = 'pablogsal'
    components = ['Interpreter Core']
    creation = <Date 2007-01-12.13:03:27.000>
    creator = 'oliver_gramberg'
    dependencies = []
    files = ['8286', '15860', '27530', '27708']
    hgrepos = []
    issue_num = 1634034
    keywords = ['patch']
    message_count = 16.0
    messages = ['54974', '54975', '54976', '84612', '97734', '113621', '172646', '173725', '173726', '179574', '179576', '179677', '179689', '315199', '389369', '389616']
    nosy_count = 10.0
    nosy_names = ['rhettinger', 'terry.reedy', 'sean_gillespie', 'ajaksu2', 'oliver_gramberg', 'benjamin.peterson', 'ezio.melotti', 'dmalcolm', 'serhiy.storchaka', 'pablogsal']
    pr_nums = ['6453']
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue1634034'
    versions = ['Python 3.8']

    olivergramberg mannequin commented Jan 12, 2007

    I suggest that the parser, when reporting a syntax
    error, should make use of its knowledge of which token
    type is expected at the position where the error
    occurred. This results in more helpful error messages:

    -----------------------------------------------------

    >>> for a in (8,9)
      File "<stdin>", line 1
        for a in (8,9)
                     ^
    SyntaxError: invalid syntax - COLON expected
    -----------------------------------------------------
    >>> for a in (8,9: print a,
      File "<stdin>", line 1
        for a in (8,9: print a,
                     ^
    SyntaxError: invalid syntax - RPAR expected

    I tried the following patch (for pythonrun.c). It works
    well in the shell both interactively and in scripts,
    as well as in IDLE. But it's not complete:

    • It doesn't always print useful messages (only for
      fixed-size terminal token types, I assume).
    • There are surely cases where more than one token type
      is allowed in a position. I believe I have seen that
      this information is also available somewhere in the
      parser, but it is not forwarded to the err_input
      routine.

    It would be even nicer to show "')'" instead of "RPAR".

    -----------------------------------------------------
    /* Set the error appropriate to the given input error code (see errcode.h) */

    static void
    err_input(perrdetail *err)
    {
    	PyObject *v, *w, *errtype;
    	PyObject* u = NULL;
    	char *msg = NULL;
    	errtype = PyExc_SyntaxError;
    	switch (err->error) {
    	case E_SYNTAX:
    		errtype = PyExc_IndentationError;
    		if (err->expected == INDENT)
    			msg = "expected an indented block";
    		else if (err->token == INDENT)
    			msg = "unexpected indent";
    		else if (err->token == DEDENT)
    			msg = "unexpected unindent";
    		else {
    			char buf[50];
    			errtype = PyExc_SyntaxError;
    			if(err->expected != -1) {
    				snprintf(buf, 48, "invalid syntax - %.16s expected\0",
    					_PyParser_TokenNames[err->expected]);
    				msg = buf;
    			} else {
    				msg = "invalid syntax";
    			}
    		}
    		break;
    		...

    I am willing to help work on this.

    Regards
    -Oliver

    olivergramberg (mannequin) added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-feature (A feature request or enhancement) labels on Jan 12, 2007
    seangillespie mannequin commented Mar 28, 2007

    Your patch seems to work.

    I agree that showing the token (as in ")") would indeed be much more useful, and it would be pretty easy to implement.

    However, I think that you should generate a diff for your patch. It's incredibly hard to read on SourceForge (SF).

    olivergramberg mannequin commented Mar 30, 2007

    Please find attached a diff for my patch.

    Regards
    -Oliver

    File Added: pythonrun.patch

    devdanzin mannequin commented Mar 30, 2009

    Sounds really useful.

    @davidmalcolm

    I'm attaching a new version of the patch, based on Oliver's (from 3 years ago). This patch is against the py3k branch.

    I've introduced a new table of (const) strings, _PyParser_TokenDescs, giving a description of each token type, so that you get e.g. "')'" rather than "RPAR".
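    A minimal sketch of the idea: a second array, indexed by the same token numbers as the name table, that holds user-facing spellings. The token numbers, the `token_name`/`token_desc` helpers and the four entries below are hypothetical stand-ins, not CPython's actual tables or values.

```c
#include <stddef.h>

/* Hypothetical token numbers for illustration only; CPython's real
   values come from its generated token header and differ. */
enum { TOK_NAME, TOK_COLON, TOK_RPAR, TOK_COMMA, TOK_COUNT };

/* Symbolic names, in the spirit of _PyParser_TokenNames. */
static const char *const token_names[TOK_COUNT] = {
    "NAME", "COLON", "RPAR", "COMMA",
};

/* User-facing descriptions, in the spirit of _PyParser_TokenDescs:
   same indices as token_names, friendlier text. */
static const char *const token_descs[TOK_COUNT] = {
    "name", "':'", "')'", "','",
};

/* Return the symbolic name of a token type, or NULL if out of range. */
const char *token_name(int type)
{
    return (type >= 0 && type < TOK_COUNT) ? token_names[type] : NULL;
}

/* Return the description of a token type, or NULL if out of range. */
const char *token_desc(int type)
{
    return (type >= 0 && type < TOK_COUNT) ? token_descs[type] : NULL;
}
```

    Because both arrays share one index, switching the error message from names to descriptions only changes which table the call site reads.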

    The pythonrun.c part of the patch is unchanged, other than using the description table rather than the name table.

    I've patched the expected results for the doctests in test_genexps and test_syntax.py so that these files pass: this gives the code the beginnings of a test suite.

    The existing patch adds this compiler warning for me (gcc 4.4.2, on Fedora 12):
    Python/pythonrun.c: In function ‘err_input’:
    Python/pythonrun.c:1923: warning: embedded ‘\0’ in format
    However, I believe that snprintf isn't guaranteed to NUL-terminate the string under all conditions on all platforms, so the '\0' seems a sane failsafe.

    How does this look?

    I haven't attempted to deal with places where multiple token types are permitted, and it does sometimes simply emit "invalid syntax".

    @rhettinger

    +1 on the basic idea to make error messages more informative where possible, but I am not sure how it would work in any but the simplest cases.

    How would it work in cases where there are multiple possible "expected" tokens?

       >>> def f(x 3):	
       SyntaxError: invalid syntax

    It chokes at the "3". The expected token is either a comma, colon, or closing parenthesis.

    Also, the most annoying and least obvious syntax errors are the ones revealed many characters away from the original cause (e.g. unbalanced opening brackets or parentheses). I am not sure how you could readily construct a helpful message in these cases.

    @serhiy-storchaka

    I'm attaching a new version of the patch, based on Dave's (from 2.5 years ago). This patch is against 3.4.

    Previous patches contained an error in the message formatting: the "buf" variable goes out of scope before "msg" is used. Appending '\0' to the format string doesn't guarantee NUL termination; in fact it does nothing (except produce a warning).
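    A minimal sketch of the corrected formatting, assuming the same `expected == -1` convention as the patch: the buffer is owned by the caller (in `err_input` it would be declared at function scope), so it is still alive when `msg` is used, and the embedded `\0` is dropped because C99 `snprintf` already NUL-terminates whenever the size is non-zero. The function name `format_syntax_msg` and its three-entry table are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description table; index = token type. */
static const char *const token_descs[] = { "name", "':'", "')'" };
#define NTOKENS (sizeof token_descs / sizeof token_descs[0])

/* Format the SyntaxError text into buf (size bytes, size > 0) and
   return the message to use. buf belongs to the caller, so it
   outlives this call, unlike a `char buf[50]` declared inside an
   inner block of err_input whose lifetime ends before msg is read. */
const char *format_syntax_msg(char *buf, size_t size, int expected)
{
    if (expected < 0 || (size_t)expected >= NTOKENS)
        return "invalid syntax";   /* no single expected token */
    /* snprintf writes at most size - 1 characters plus a NUL,
       so no embedded NUL is needed in the format string. */
    snprintf(buf, size, "invalid syntax - %.16s expected",
             token_descs[expected]);
    return buf;
}
```

    In `err_input` this would amount to declaring `char buf[50];` next to `msg` at the top of the function and writing `msg = format_syntax_msg(buf, sizeof buf, err->expected);` in the E_SYNTAX branch.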

    @serhiy-storchaka

    Patch updated (thanks Benjamin for comments).

    @ezio-melotti

    Looking at the changes in the patch it seems to me that, in at least a few cases, it's better to have a bare "invalid syntax" than a misleading error.
    For example:

         >>> dict(a = i for i in range(10))
    +    SyntaxError: invalid syntax - ')' expected

    The () are ok, the message is misleading.

     >>> obj.None = 1
    +SyntaxError: invalid syntax - name expected

    'name' here is a bit vague.

     
     >>> def f(x, None):
     ...     pass
    +SyntaxError: invalid syntax - ')' expected
    
     >>> def f(*None):
     ...     pass
    +SyntaxError: invalid syntax - ')' expected

    Here the () are ok too.

     
     >>> def f(**None):
     ...     pass
    +SyntaxError: invalid syntax - name expected

    Here I would have expected "')' expected", as in the previous example, but there's "name" instead (which is a bit better, albeit inconsistent).

    I wouldn't consider this an improvement, but for other situations the error message is probably useful.
    I see 3 options here:

    1. we find a way to show the expected token only when the message is not misleading;
    2. if the expected token is useful more often than it is misleading, then we could apply the patch as is;
    3. if it is misleading more often than it is useful, it's probably better to reject the patch.

    @serhiy-storchaka

    > >>> dict(a = i for i in range(10))
    > +    SyntaxError: invalid syntax - ')' expected
    >
    > The () are ok, the message is misleading.

    "dict(a = i)" is valid syntax; the compiler expects ")" instead of the invalid "for".

    > 'name' here is a bit vague.

    The compiler actually expects a name (in Python terminology; see, for example, NameError). Of course, you can propose another wording for "name" (it is just an entry in the _PyParser_TokenDescs array).

    > >>> def f(x, None):
    > ...     pass
    > +SyntaxError: invalid syntax - ')' expected
    >
    > >>> def f(*None):
    > ...     pass
    > +SyntaxError: invalid syntax - ')' expected
    >
    > Here the () are ok too.

    The compiler means "def f(x,)" and "def f(*)", not "def f()" as you possibly expected.

    @ezio-melotti

    I'm not saying that these errors are wrong -- just that they are misleading (i.e. they might lead the user on the wrong path, and make finding the actual problem more difficult).

    It should be noted that the examples I pasted don't include a full traceback though. The presence of the caret (^) in the right place will definitely make things clearer.

    @serhiy-storchaka

    I agree; the main problem is that the "expected token" is not always unique, and even the "most expected token" is somewhat subjective. A better solution would be to report several possible tokens. This requires some parser modification.
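    As a rough sketch of what reporting several tokens could look like on the message side: the caller passes the descriptions of every acceptable token and gets back one message, for example `invalid syntax - ',', ':' or ')' expected` for the comma, colon and closing parenthesis mentioned earlier in the thread. The function `format_expected_list` is hypothetical, and how the parser would collect the list (the parser modification mentioned above) is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Write "invalid syntax - A, B or C expected" into buf (size bytes,
   size > 0), listing every acceptable token description. With no
   candidates it falls back to a bare "invalid syntax". Long lists
   are truncated safely rather than overflowing buf. */
void format_expected_list(char *buf, size_t size,
                          const char *const *descs, size_t n)
{
    size_t i, len;

    if (n == 0) {
        snprintf(buf, size, "invalid syntax");
        return;
    }
    snprintf(buf, size, "invalid syntax - ");
    for (i = 0; i < n; i++) {
        /* ", " between items, " or " before the last one */
        const char *sep = (i == 0) ? "" : (i == n - 1) ? " or " : ", ";
        len = strlen(buf);
        if (len + 1 >= size)
            break;                  /* buffer full: stop appending */
        snprintf(buf + len, size - len, "%s%s", sep, descs[i]);
    }
    len = strlen(buf);
    if (len + 1 < size)
        snprintf(buf + len, size - len, " expected");
}
```

    Whether such a list helps or overwhelms depends on how many alternatives the grammar allows at that point; a real patch might cap the list, or fall back to a bare "invalid syntax" beyond a few entries.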

    @serhiy-storchaka

    Hmm, the "expected" attribute is set in PyParser_AddToken() only when there is exactly one possible expected token. I don't understand why the error message is so misleading for "def f(*23):" (here not only ')' but also a name is possible).

    @serhiy-storchaka

    A similar enhancement has just been implemented in PyPy.

    https://morepypy.blogspot.de/2018/04/improving-syntaxerror-in-pypy.html

    serhiy-storchaka added the 3.8 (only security fixes) label on Apr 11, 2018
    @terryjreedy

    I think that this issue should be closed as 'out of date' as it was pretty open-ended and it is unclear what request remains.

    For the specific case "for a in (8,9)", the suggested "expected ':'" has been added on another issue. I expect that there are other additions from other issues.

    For "for a in (8,9: print a," there is no change but for "for a in (8,9]" we now have "closing parenthesis ']' does not match opening parenthesis '('". This properly does not say that "expected ')'" as the needed fix might be to insert opener '['.
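    To illustrate why such a message can name both brackets without guessing the fix, here is a minimal sketch of bracket-stack checking: each closer is compared with the most recent unmatched opener and, on a mismatch, both are reported. The function `check_brackets`, the depth limit, and the "unmatched" and nesting messages are illustrative assumptions; only the mismatch wording follows the message quoted above, and this is not CPython's tokenizer code. It also does not skip string literals or comments.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_DEPTH 200   /* hypothetical nesting limit for this sketch */

/* Scan src for (), [] and {} and report the first closer that does
   not match the most recent unmatched opener. Returns 0 if no problem
   was found (unclosed openers at the end are not reported here) and
   1 on error, with a message written to msg. String literals and
   comments are not skipped. */
int check_brackets(const char *src, char *msg, size_t msg_size)
{
    char stack[MAX_DEPTH];
    size_t depth = 0;
    const char *p;

    for (p = src; *p != '\0'; p++) {
        char c = *p;
        if (c == '(' || c == '[' || c == '{') {
            if (depth == MAX_DEPTH) {
                snprintf(msg, msg_size, "too many nested parentheses");
                return 1;
            }
            stack[depth++] = c;            /* remember the opener */
        } else if (c == ')' || c == ']' || c == '}') {
            char want = (c == ')') ? '(' : (c == ']') ? '[' : '{';
            if (depth == 0) {
                snprintf(msg, msg_size, "unmatched '%c'", c);
                return 1;
            }
            if (stack[depth - 1] != want) {
                /* Name both sides: the fix may be to change either one. */
                snprintf(msg, msg_size,
                         "closing parenthesis '%c' does not match "
                         "opening parenthesis '%c'",
                         c, stack[depth - 1]);
                return 1;
            }
            depth--;                       /* matched: pop the opener */
        }
    }
    return 0;
}
```

    For "for a in (8,9]" this sketch returns 1 with the mismatch text quoted above, and it deliberately says nothing about which bracket should change.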

    For many other cases, the proposed additions were disputed as not helpful, mostly because multiple things could be expected. I think other suggestions, based on current master, should be new issues.

    @pablogsal

    Closing as per above

    ezio-melotti transferred this issue from another repository on Apr 10, 2022