Show "expected" token on syntax error #44453
I suggest that the parser, when reporting a syntax error, also show the token it expected. For example:
-----------------------------------------------------
>>> for a in (8,9)
File "<stdin>", line 1
for a in (8,9)
^
SyntaxError: invalid syntax - COLON expected
-----------------------------------------------------
>>> for a in (8,9: print a,
File "<stdin>", line 1
for a in (8,9: print a,
^
SyntaxError: invalid syntax: RPAR expected
I tried the following patch (for pythonrun.c). It works.
It's even nicer to show "')'" instead of "RPAR"...
-----------------------------------------------------
static void
err_input(perrdetail *err)
{
    PyObject *v, *w, *errtype;
    PyObject* u = NULL;
    char *msg = NULL;
    errtype = PyExc_SyntaxError;
    switch (err->error) {
    case E_SYNTAX:
        errtype = PyExc_IndentationError;
        if (err->expected == INDENT)
            msg = "expected an indented block";
        else if (err->token == INDENT)
            msg = "unexpected indent";
        else if (err->token == DEDENT)
            msg = "unexpected unindent";
        else {
            char buf[50];
            errtype = PyExc_SyntaxError;
            if (err->expected != -1) {
                snprintf(buf, 48, "invalid syntax - %.16s expected\0",
                         _PyParser_TokenNames[err->expected]);
                msg = buf;
            } else {
                msg = "invalid syntax";
            }
        }
        break;
    ...
I am willing to help work on this. Regards |
Your patch seems to work. I agree that showing the token (as in ")") would indeed be much more useful, and it would be pretty easy to implement. However, I think that you should generate a diff for your patch. It's incredibly hard to read over SF. |
Please find attached a diff for my patch. Regards
File Added: pythonrun.patch |
Sounds really useful. |
I'm attaching a new version of the patch, based on Oliver's (from 3 years ago). This patch is against the py3k branch.
I've introduced a new table of (const) strings, _PyParser_TokenDescs, giving a description of each token type, so that you get e.g. "')'" rather than "RPAR".
The patch to pythonrun.c is unchanged other than using the description table rather than the name table.
I've patched the expected results for the doctests in test_genexps and test_syntax.py so that these files pass; this gives the code the beginnings of a test suite.
The existing patch adds this compiler warning for me (gcc 4.4.2, on Fedora 12):
How does this look? I haven't attempted to deal with places where multiple token types are permitted, and it does sometimes simply emit "invalid syntax". |
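For readers who want to see the description-table idea in isolation, here is a minimal Python sketch. It is not the patch itself: TOKEN_DESCS and describe_expected are hypothetical names, the table covers only three token types, and the fallback to the raw token name is an assumption made for illustration. Only token.tok_name and the token constants come from the standard library.

```python
import token

# Hypothetical description table, loosely modeled on the idea behind
# _PyParser_TokenDescs: map a token type to text a user would recognize.
TOKEN_DESCS = {
    token.RPAR: "')'",
    token.COLON: "':'",
    token.NAME: "name",
}


def describe_expected(expected):
    """Build a message for one expected token type; -1 means "unknown"."""
    if expected == -1:
        return "invalid syntax"
    # Fall back to the raw token name (e.g. "COMMA") when no description exists.
    desc = TOKEN_DESCS.get(expected, token.tok_name.get(expected, "token"))
    return f"invalid syntax - {desc} expected"
```

With this table, describe_expected(token.RPAR) yields the "')' expected" form, while a token type missing from the table falls back to its name, as the earlier patch did.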
+1 on the basic idea of making error messages more informative where possible, but I am not sure how it would work in any but the simplest cases. How would it work in cases where there are multiple possible "expected" tokens?
>>> def f(x 3):
SyntaxError: invalid syntax
It chokes at the "3". The expected token is either a comma, a colon, or a closing parenthesis. Also, the most annoying and least obvious syntax errors are the ones that are revealed many characters away from the original cause (i.e. unbalanced opening brackets or parentheses). I am not sure how you can readily construct a helpful message in these cases. |
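One way to picture a message for several candidate tokens is the sketch below. It assumes the parser could hand over a list of token descriptions, which the patches in this thread do not do; format_expected is a hypothetical helper and the wording is only one possible choice.

```python
def format_expected(descs):
    """Format zero, one, or several expected-token descriptions."""
    descs = list(descs)
    if not descs:
        # Nothing useful to say: keep the bare message.
        return "invalid syntax"
    if len(descs) == 1:
        return f"invalid syntax - {descs[0]} expected"
    # Join all but the last with commas, then add "or <last>".
    head = ", ".join(descs[:-1])
    return f"invalid syntax - {head} or {descs[-1]} expected"


# The "def f(x 3):" case, using the three candidates named in the comment above.
print(format_expected(["','", "':'", "')'"]))
```

This only addresses the formatting side; collecting the candidate set is the part that would need parser changes.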
I'm attaching a new version of the patch, based on Dave's (from 2.5 years ago). This patch is against 3.4. The previous patches contained an error in the message formatting: the "buf" variable goes out of scope before "msg" is used. Also, appending '\0' to the format string isn't guaranteed to NUL-terminate the string. In fact it does nothing (except produce a warning). |
Patch updated (thanks Benjamin for comments). |
Looking at the changes in the patch it seems to me that, in at least a few cases, it's better to have a bare "invalid syntax" than a misleading error.
>>> dict(a = i for i in range(10))
+SyntaxError: invalid syntax - ')' expected
The () are ok, the message is misleading.
>>> obj.None = 1
+SyntaxError: invalid syntax - name expected
'name' here is a bit vague.
>>> def f(x, None):
... pass
+SyntaxError: invalid syntax - ')' expected
>>> def f(*None):
... pass
+SyntaxError: invalid syntax - ')' expected
Here the () are ok too.
>>> def f(**None):
... pass
+SyntaxError: invalid syntax - name expected
Here I would have expected the "')' expected" like in the previous example, but there's "name" instead (which is a bit better, albeit inconsistent). I wouldn't consider this an improvement, but for other situations the error message is probably useful.
|
"dict(a = i)" is valid syntax, so the compiler expects ")" instead of the invalid "for".
The compiler really does expect a name (in Python terminology; see, for example, NameError). Of course you can propose another name for "name" (this is just an entry in the _PyParser_TokenDescs array).
The compiler has "def f(x,)" and "def f(*)" in mind, not "def f()" as you possibly expect. |
I'm not saying that these errors are wrong -- just that they are misleading (i.e. they might lead the user on the wrong path, and make finding the actual problem more difficult). It should be noted that the examples I pasted don't include a full traceback though. The presence of the caret (^) in the right place will definitely make things clearer. |
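To see the message together with the caret, a snippet can be compiled and the SyntaxError attributes rendered by hand. This is a rough sketch: caret_report is a hypothetical helper, it only approximates the interpreter's own traceback layout, and the message text and caret column it shows depend on the Python version running it.

```python
def caret_report(source, filename="<stdin>"):
    """Return a traceback-style report for the first syntax error, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        line = (err.text or "").rstrip("\n")
        # err.offset is 1-based; fall back to column 1 if it is missing.
        column = max((err.offset or 1) - 1, 0)
        return "\n".join([
            f'  File "{err.filename}", line {err.lineno}',
            f"    {line}",
            "    " + " " * column + "^",
            f"{type(err).__name__}: {err.msg}",
        ])
    return None


print(caret_report("def f(*None):\n    pass\n"))
```

Running it over the examples quoted above shows where the caret lands for each of them on a given interpreter.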
I agree, the main problem is that the "expected token" is not always singular. And even the "most expected token" is a little subjective. A better solution would be to expect several possible tokens. This requires some parser modification. |
Hmm, the "expected" attribute is set in PyParser_AddToken() only when there is exactly one possible expected token. I don't understand why the error messages are so misleading for "def f(*23):" (here not only ')' but also a name is possible). |
A similar enhancement has just been implemented in PyPy: https://morepypy.blogspot.de/2018/04/improving-syntaxerror-in-pypy.html |
I think that this issue should be closed as 'out of date', as it was pretty open-ended and it is unclear what request remains. For the specific case "for a in (8,9)", the suggested "expected ':'" has been added on another issue. I expect that there are other additions from other issues. For "for a in (8,9: print a," there is no change, but for "for a in (8,9]" we now have "closing parenthesis ']' does not match opening parenthesis '('". This properly does not say "expected ')'", as the needed fix might be to insert the opener '['. For many other cases, the proposed additions were disputed as not helpful, mostly because multiple things could be expected. I think other suggestions, based on current master, should be new issues. |
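The bracket-mismatch check mentioned above can be illustrated with a small stack-based scan. This is a sketch only, not how CPython's tokenizer is written: check_brackets is a hypothetical helper, it ignores strings and comments, and apart from the mismatch message quoted in the comment above, the message wording is chosen for illustration.

```python
# Map each closing bracket to the opener it must match.
PAIRS = {")": "(", "]": "[", "}": "{"}


def check_brackets(source):
    """Return a diagnostic string, or None if all brackets balance.

    Assumption of this sketch: brackets inside strings and comments
    are not treated specially.
    """
    stack = []
    for ch in source:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack:
                return f"unmatched '{ch}'"
            opener = stack.pop()
            if opener != PAIRS[ch]:
                return (f"closing parenthesis '{ch}' does not match "
                        f"opening parenthesis '{opener}'")
    if stack:
        # Report the innermost opener that is still open.
        return f"'{stack[-1]}' was never closed"
    return None


print(check_brackets("for a in (8,9]"))
```

Note that the scan reports what does not match without guessing which token was "expected", which is the point made in the closing comment.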
Closing as per above |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.