msg54974 - (view) |
Author: Oliver Gramberg (oliver_gramberg) |
Date: 2007-01-12 13:03 |
I suggest that the parser, when reporting a syntax
error, should make use of its knowlegde of which token
type is expected at the position where the error
occurred. This results in more helpful error messages:
-----------------------------------------------------
>>> for a in (8,9)
File "<stdin>", line 1
for a in (8,9)
^
SyntaxError: invalid syntax - COLON expected
-----------------------------------------------------
>>> for a in (8,9: print a,
File "<stdin>", line 1
for a in (8,9: print a,
^
SyntaxError: invalid syntax: RPAR expected
-----------------------------------------------------
I tried the following patch (for pythonrun.c). It works
well in the shell both interactively and in scripts,
as well as in IDLE. But it's not complete:
- It doesn't always print useful messages (only for
fixed-size terminal token types, I assume.)
- There sure are cases where more than one token type
is allowed in a position. I believe I have seen that
this information is available too somewhere in the
parser, but it is not forwarded to the err_input
routine.
It's even nicer to show "')'" instead of "RPAR"...
-----------------------------------------------------
/* Set the error appropriate to the given input error code (see errcode.h) */
static void
err_input(perrdetail *err)
{
PyObject *v, *w, *errtype;
PyObject* u = NULL;
char *msg = NULL;
errtype = PyExc_SyntaxError;
switch (err->error) {
case E_SYNTAX:
errtype = PyExc_IndentationError;
if (err->expected == INDENT)
msg = "expected an indented block";
else if (err->token == INDENT)
msg = "unexpected indent";
else if (err->token == DEDENT)
msg = "unexpected unindent";
else {
char buf[50];
errtype = PyExc_SyntaxError;
if(err->expected != -1) {
snprintf(buf, 48, "invalid syntax - %.16s expected\0",
_PyParser_TokenNames[err->expected]);
msg = buf;
} else {
msg = "invalid syntax";
}
}
break;
...
-----------------------------------------------------
I am willing to help work on this.
Regards
-Oliver
|
msg54975 - (view) |
Author: Sean Gillespie (sean_gillespie) |
Date: 2007-03-28 23:37 |
Your patch seems to work.
I agree that showing the token (as in ")") would indeed be much more useful, and it would be pretty easy to implement.
However, I think that you should generate a diff for your patch. Its incredibly hard to read over SF.
|
msg54976 - (view) |
Author: Oliver Gramberg (oliver_gramberg) |
Date: 2007-03-30 11:44 |
Pfa a diff for my patch.
Regards
-Oliver
File Added: pythonrun.patch
|
msg84612 - (view) |
Author: Daniel Diniz (ajaksu2) *  |
Date: 2009-03-30 18:51 |
Sounds really useful.
|
msg97734 - (view) |
Author: Dave Malcolm (dmalcolm)  |
Date: 2010-01-13 19:28 |
I'm attaching a new version of the patch, based on Oliver's (from 3 years ago). This patch is against the py3k branch.
I've introduced a new table of (const) strings: _PyParser_TokenDescs, giving descriptions of each token type, so that you get e.g. "')'" rather than "RPAR"
The patch of pythonrun.c is unchanged other than using the description table, rather than the name table.
I've patched the expected results for the doctests in test_genexps and test_syntax.py so that these files pass: this gives the code the beginnings of a test suite.
The existing patch adds this compiler warning for me (gcc 4.4.2, on Fedora 12):
Python/pythonrun.c: In function ‘err_input’:
Python/pythonrun.c:1923: warning: embedded ‘\0’ in format
However I believe that snprintf isn't guaranteed to NUL-terminate the string under all conditions on all platforms, so the '\0' seems a sane failsafe.
How does this look?
I haven't attempted to deal with places where multiple token types are permitted, and it does sometimes simply emit "invalid syntax".
|
msg113621 - (view) |
Author: Raymond Hettinger (rhettinger) *  |
Date: 2010-08-11 19:51 |
+1 on the basic idea to make error messages more informative where possible, but am not sure how it would work in any but the more simple cases.
How would work in cases where there are multiple possible "expected" tokens?
>>> def f(x 3):
SyntaxError: invalid syntax
It chokes at the "3". The expected token is either a comma, colon, or closing parenthesis.
Also, the most annoying and least obvious syntax errors are ones that are revealed many characters away from the original cause (i.e. unbalanced opening brackets or parentheses). Am not sure how you can readily construct a helpful message in these cases.
|
msg172646 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-10-11 14:44 |
I'm attaching a new version of the patch, based on Dave's (from 2.5 years ago). This patch is against the 3.4.
Previous patches contained an error in the message formatting. "buf" variable out of scope before "msg" used. Appending '\0' to the format string isn't guaranteed to NUL-terminate the string. Actually it do nothing (except producing a warning).
|
msg173725 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2012-10-25 00:06 |
Patch updated (thanks Benjamin for comments).
|
msg173726 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2012-10-25 00:16 |
Looking at the changes in the patch it seems to me that, in at least a few cases, it's better to have a bare "invalid syntax" than a misleading error.
For example:
>>> dict(a = i for i in range(10))
+ SyntaxError: invalid syntax - ')' expected
The () are ok, the message is misleading.
>>> obj.None = 1
+SyntaxError: invalid syntax - name expected
'name' here is a bit vague.
>>> def f(x, None):
... pass
+SyntaxError: invalid syntax - ')' expected
>>> def f(*None):
... pass
+SyntaxError: invalid syntax - ')' expected
Here the () are ok too.
>>> def f(**None):
... pass
+SyntaxError: invalid syntax - name expected
Here I would have expected the "')' expected" like in the previous example, but there's "name" instead (which is a bit better, albeit inconsistent).
I wouldn't consider this an improvement, but for other situations the error message is probably useful.
I see 3 options here:
1) we find a way to show the expected token only when the message is not misleading;
2) if the expected token is useful more often than it is misleading, then we could apply the patch as is;
3) if it is misleading more often than it is useful, it's probably better to reject the patch.
|
msg179574 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-10 16:52 |
> >>> dict(a = i for i in range(10))
> + SyntaxError: invalid syntax - ')' expected
>
> The () are ok, the message is misleading.
"dict(a = i)" is valid syntax, the compiler expects ")" instead of invalid "for".
> 'name' here is a bit vague.
The compiler actually expects a name (using Python terminology, see for example NameError). Of course you can propose an other name for "name" (this is just an entity in _PyParser_TokenDescs array).
> >>> def f(x, None):
> ... pass
> +SyntaxError: invalid syntax - ')' expected
>
> >>> def f(*None):
> ... pass
> +SyntaxError: invalid syntax - ')' expected
>
> Here the () are ok too.
The compiler means "def f(x,)" and "def f(*)", not "def f()" as you possible expects.
|
msg179576 - (view) |
Author: Ezio Melotti (ezio.melotti) *  |
Date: 2013-01-10 17:04 |
I'm not saying that these errors are wrong -- just that they are misleading (i.e. they might lead the user on the wrong path, and make finding the actual problem more difficult).
It should be noted that the examples I pasted don't include a full traceback though. The presence of the caret (^) in the right place will definitely make things clearer.
|
msg179677 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-11 13:21 |
I agree, the main problem is in the fact that "expected token" is not always singular. And even "most expected token" is a little subjective. The better solution will be to expect several possible tokens. This requires some parser modification.
|
msg179689 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2013-01-11 15:56 |
Hmm, "expected" attribute is set when there is only one possible expected token in PyParser_AddToken(). I don't understand why error messages are so misleading for "def f(*23):" (here not only ')', but a name possible).
|
msg315199 - (view) |
Author: Serhiy Storchaka (serhiy.storchaka) *  |
Date: 2018-04-11 18:51 |
Similar enhancement has been implemented in PyPy just now.
https://morepypy.blogspot.de/2018/04/improving-syntaxerror-in-pypy.html
|
msg389369 - (view) |
Author: Terry J. Reedy (terry.reedy) *  |
Date: 2021-03-23 02:50 |
I think that this issue should be closed as 'out of date' as it was pretty open-ended and it is unclear what request remains.
For the specific case "for a in (8,9)", the suggested "expected ':'" has been added on another issue. I expect that there are other additions from other issues.
For "for a in (8,9: print a," there is no change but for "for a in (8,9]" we now have "closing parenthesis ']' does not match opening parenthesis '('". This properly does not say that "expected ')'" as the needed fix might be to insert opener '['.
For many other cases, the proposed additions were disputed as not helpful, mostly because multiple things could be expected. I think other suggestions, based on current master, should be new issues.
|
msg389616 - (view) |
Author: Pablo Galindo Salgado (pablogsal) *  |
Date: 2021-03-27 21:22 |
Closing as per above
|
|
Date |
User |
Action |
Args |
2022-04-11 14:56:22 | admin | set | github: 44453 |
2021-03-27 21:22:06 | pablogsal | set | status: open -> closed resolution: out of date messages:
+ msg389616
stage: patch review -> resolved |
2021-03-23 02:50:20 | terry.reedy | set | nosy:
+ terry.reedy, pablogsal messages:
+ msg389369
|
2018-04-11 18:59:19 | serhiy.storchaka | set | stage: needs patch -> patch review pull_requests:
+ pull_request6147 |
2018-04-11 18:51:41 | serhiy.storchaka | set | messages:
+ msg315199 versions:
+ Python 3.8, - Python 3.4 |
2013-01-11 15:56:50 | serhiy.storchaka | set | messages:
+ msg179689 |
2013-01-11 13:21:07 | serhiy.storchaka | set | messages:
+ msg179677 stage: patch review -> needs patch |
2013-01-10 17:04:31 | ezio.melotti | set | messages:
+ msg179576 |
2013-01-10 16:52:44 | serhiy.storchaka | set | messages:
+ msg179574 |
2012-10-25 00:16:26 | ezio.melotti | set | nosy:
+ ezio.melotti messages:
+ msg173726
|
2012-10-25 00:06:03 | serhiy.storchaka | set | files:
+ syntax-error-hints-3.4_2.patch
messages:
+ msg173725 |
2012-10-24 09:29:03 | serhiy.storchaka | set | stage: test needed -> patch review |
2012-10-11 14:44:54 | serhiy.storchaka | set | files:
+ syntax-error-hints-3.4.patch versions:
+ Python 3.4, - Python 3.2 nosy:
+ serhiy.storchaka
messages:
+ msg172646
|
2010-08-11 19:51:41 | rhettinger | set | nosy:
+ rhettinger messages:
+ msg113621
|
2010-08-09 18:37:21 | terry.reedy | set | versions:
- Python 2.7 |
2010-01-15 15:50:55 | pitrou | set | nosy:
+ benjamin.peterson
versions:
+ Python 3.2, - Python 3.1 |
2010-01-13 19:28:34 | dmalcolm | set | files:
+ syntax-error-hints.patch
nosy:
+ dmalcolm messages:
+ msg97734
keywords:
+ patch |
2009-03-30 18:51:54 | ajaksu2 | set | versions:
+ Python 3.1, Python 2.7, - Python 2.6 nosy:
+ ajaksu2
messages:
+ msg84612
stage: test needed |
2007-01-12 13:03:27 | oliver_gramberg | create | |