classification
Title: Improve error messages with expected keywords
Type:
Stage: resolved
Components: Interpreter Core
Versions: Python 3.9
process
Status: closed
Resolution: rejected
Dependencies:
Superseder:
Assigned To:
Nosy List: gvanrossum, lys.nikolaou, pablogsal
Priority: normal
Keywords: patch

Created on 2020-05-11 22:10 by pablogsal, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL       Status  Linked
PR 20039  closed  pablogsal, 2020-05-11 22:11
Messages (7)
msg368664 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2020-05-11 22:10
Using the new parser, we could improve the plain "syntax error" messages with the tokens/keywords that would have made the parser advance. There is a proof of concept in https://github.com/python/cpython/pull/20039 you can play with.

I would like to get some initial opinions on the idea before going deeper into the issue :)
msg368674 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2020-05-11 22:44
Hmm... The errors get long, and by focusing only on keywords they can be misleading. E.g.

>>> from x import a b c
  File "<stdin>", line 1
    from x import a b c
                    ^
SyntaxError: Invalid syntax. Expected one of: as
>>> 

But the most likely error is omission of a comma.

>>> if x y: pass
  File "<stdin>", line 1
    if x y: pass
         ^
SyntaxError: Invalid syntax. Expected one of: not, is, or, in, and, if
>>> 

But the most likely error is probably a comparison operator.

And so on. Here's a nice one:

>>> /
  File "<stdin>", line 1
    /
    ^
SyntaxError: Invalid syntax. Expected one of: for, pass, lambda, False, global, True, __new_parser__, if, raise, continue, not, break, while, None, del, nonlocal, import, assert, return, class, with, def, try, from, yield
>>> 

(Huh, where did it get __new_parser__?)

The beauty of Python's detail-free syntax error is that it doesn't tell you what it expects -- because parsers are dumb, what the parser expected is rarely what's wrong with your code -- and it requires the user to understand how the parser works to interpret the error message.
msg368677 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2020-05-11 23:04
> SyntaxError: Invalid syntax. Expected one of: for, pass, lambda, False, global, True, __new_parser__, if, raise, continue, not, break, while, None, del, nonlocal, import, assert, return, class, with, def, try, from, yield

Haha, that is a good point. It also reveals the easter egg :)

> The beauty of Python's detail-free syntax error is that it doesn't tell you what it expects -- because parsers are dumb, what the parser expected is rarely what's wrong with your code -- and it requires the user to understand how the parser works to interpret the error message.

Right, I think it will be very difficult to give you something close to what the actual problem is.

I started this draft based on similar errors that I have seen in other parsers, but it is true that, with the exception of Rust, all the other grammars I explored and played with were much simpler, so the errors were not very verbose.

I think I will close the issue and the PR unless you think there is something left worth exploring or discussing, as it does not look like we can get something less verbose in an easy way.
msg368678 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2020-05-11 23:06
> (Huh, where did it get __new_parser__?)

From here:

https://github.com/python/cpython/blob/master/Parser/pegen/parse.c#L67
msg368679 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2020-05-11 23:06
I had hoped that error labels would get us closer to error recovery, but it appears that is still quite elusive. :-(
msg368680 - Author: Lysandros Nikolaou (lys.nikolaou) (Python committer) Date: 2020-05-11 23:10
I also concur with Guido here. I have played around with other languages and I dislike getting a long list of expected tokens that is not helpful and is sometimes actually confusing.

I think that the current generic SyntaxError description, together with the error caret, does a good job of directing someone close to where the error is, without providing too much information that might be misleading.
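
For readers who want to see what the generic error plus caret gives a tool to work with, here is a minimal sketch. It relies only on the standard `SyntaxError` attributes `lineno`, `offset` and `msg`; the helper names `describe_syntax_error` and `caret_line` are made up for this illustration, and the exact message text and offset vary between Python versions.

```python
def describe_syntax_error(source):
    """Compile source; return (lineno, offset, msg), or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        # lineno and offset are 1-based; offset may be None for some errors.
        return err.lineno, err.offset, err.msg
    return None


def caret_line(offset):
    """Build the caret marker line for a 1-based column offset."""
    return " " * (offset - 1) + "^"


if __name__ == "__main__":
    code = "from x import a b c\n"
    info = describe_syntax_error(code)
    if info is not None:
        lineno, offset, msg = info
        print(f"line {lineno}: {msg}")
        print(code.rstrip("\n"))
        if offset is not None:
            print(caret_line(offset))
```

The location alone is usually enough to spot the missing comma in the example, which is the point being made above.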
msg368681 - Author: Guido van Rossum (gvanrossum) (Python committer) Date: 2020-05-11 23:26
In response to my PEG blogs last year someone showed me an entirely different algorithm, based on first looking for matching parentheses (and other matching things), then for operators by priority, and so on. The approach was designed with C in mind but looked like it would fit reasonably well with Python, once you view e.g. ':' as an operator of a certain priority, and figure out what to do with indentation.
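
As a rough illustration of the first phase described above (looking for matching parentheses and other paired delimiters), here is a minimal sketch. It is not the algorithm that was shown to Guido, whose details are not given in this thread. The function name `find_unmatched_brackets` is hypothetical, and the scan deliberately ignores string literals and comments.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def find_unmatched_brackets(source):
    """Return sorted (index, char) pairs for brackets that have no partner.

    Simplification for illustration: brackets inside string literals
    and comments are not skipped.
    """
    stack = []      # (index, opener) entries still waiting for a closer
    unmatched = []  # closers that did not match the innermost opener
    for index, char in enumerate(source):
        if char in OPENERS:
            stack.append((index, char))
        elif char in PAIRS:
            if stack and stack[-1][1] == PAIRS[char]:
                stack.pop()
            else:
                unmatched.append((index, char))
    # Anything left on the stack was opened but never closed.
    unmatched.extend(stack)
    return sorted(unmatched)


if __name__ == "__main__":
    print(find_unmatched_brackets("f(a, b)"))
    print(find_unmatched_brackets("f(a"))
```

A real implementation would run on the token stream rather than on raw characters, and would then move on to operators by priority as the message describes.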

This would actually be closer to the old approach, accepting "a+1 = b" initially as an assignment and then rejecting "a+1" as a target.
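
The "accept first, reject later" idea in the paragraph above can be sketched in two phases using the standard `ast` module. This is only an illustration under stated assumptions: the helper `check_assignment` and the `VALID_TARGETS` tuple are invented for this example, the statement is split naively on the first `=` (so `==`, keyword arguments and chained assignments are not handled), and it does not reflect how CPython's parser is implemented.

```python
import ast

# Expression node types that can appear as an assignment target.
VALID_TARGETS = (
    ast.Name, ast.Attribute, ast.Subscript,
    ast.Tuple, ast.List, ast.Starred,
)


def check_assignment(source):
    """Return None if the left side is a plausible target, else a message."""
    left, sep, right = source.partition("=")
    if not sep:
        return "no '=' found"
    # Phase 1: accept both sides as ordinary expressions.
    target = ast.parse(left.strip(), mode="eval").body
    ast.parse(right.strip(), mode="eval")
    # Phase 2: reject a left side that cannot be assigned to.
    if not isinstance(target, VALID_TARGETS):
        return f"cannot assign to {type(target).__name__}: {left.strip()!r}"
    return None


if __name__ == "__main__":
    print(check_assignment("a+1 = b"))
    print(check_assignment("a = b"))
```

Because the rejection happens after the whole statement has been recognized as an assignment, the message can name the offending target instead of listing the tokens the parser expected.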

I wonder if we could (eventually) use this approach as a fallback when a syntax error is found. But it is an entirely different theoretical framework, so we should probably not hurry with this.

IOW I'm okay with closing this issue.
History
Date                 User          Action  Args
2022-04-11 14:59:30  admin         set     github: 84779
2020-05-11 23:36:42  pablogsal     set     status: open -> closed; resolution: rejected; stage: patch review -> resolved
2020-05-11 23:26:12  gvanrossum    set     messages: + msg368681
2020-05-11 23:10:25  lys.nikolaou  set     messages: + msg368680
2020-05-11 23:06:41  gvanrossum    set     messages: + msg368679
2020-05-11 23:06:09  pablogsal     set     messages: + msg368678
2020-05-11 23:04:38  pablogsal     set     messages: + msg368677
2020-05-11 22:44:46  gvanrossum    set     messages: + msg368674
2020-05-11 22:11:07  pablogsal     set     keywords: + patch; stage: patch review; pull_requests: + pull_request19350
2020-05-11 22:10:51  pablogsal     create