Title: Show "expected" token on syntax error
Type: enhancement Stage: patch review
Components: Interpreter Core Versions: Python 3.8
Status: open Resolution:
Dependencies: Superseder:
Assigned To: Nosy List: ajaksu2, benjamin.peterson, dmalcolm, ezio.melotti, oliver_gramberg, rhettinger, sean_gillespie, serhiy.storchaka
Priority: normal Keywords: patch

Created on 2007-01-12 13:03 by oliver_gramberg, last changed 2018-04-11 18:59 by serhiy.storchaka.

File name Uploaded Description Edit
pythonrun.patch oliver_gramberg, 2007-03-30 11:44 Patch for /Python/pythonrun.c
syntax-error-hints.patch dmalcolm, 2010-01-13 19:28 review
syntax-error-hints-3.4.patch serhiy.storchaka, 2012-10-11 14:44 review
syntax-error-hints-3.4_2.patch serhiy.storchaka, 2012-10-25 00:06 review
Pull Requests
URL Status Linked Edit
PR 6453 closed serhiy.storchaka, 2018-04-11 18:59
Messages (14)
msg54974 - (view) Author: Oliver Gramberg (oliver_gramberg) Date: 2007-01-12 13:03
I suggest that the parser, when reporting a syntax
error, should make use of its knowlegde of which token
type is expected at the position where the error
occurred. This results in more helpful error messages:

>>> for a in (8,9)
  File "<stdin>", line 1
    for a in (8,9)
SyntaxError: invalid syntax - COLON expected
>>> for a in (8,9: print a,
  File "<stdin>", line 1
    for a in (8,9: print a,
SyntaxError: invalid syntax: RPAR expected

I tried the following patch (for pythonrun.c). It works
well in the shell both interactively and in scripts,
as well as in IDLE. But it's not complete:
- It doesn't always print useful messages (only for
fixed-size terminal token types, I assume.)
- There sure are cases where more than one token type
is allowed in a position. I believe I have seen that
this information is available too somewhere in the
parser, but it is not forwarded to the err_input

It's even nicer to show "')'" instead of "RPAR"...

/* Set the error appropriate to the given input error code (see errcode.h) */

static void
err_input(perrdetail *err)
	PyObject *v, *w, *errtype;
	PyObject* u = NULL;
	char *msg = NULL;
	errtype = PyExc_SyntaxError;
	switch (err->error) {
	case E_SYNTAX:
		errtype = PyExc_IndentationError;
		if (err->expected == INDENT)
			msg = "expected an indented block";
		else if (err->token == INDENT)
			msg = "unexpected indent";
		else if (err->token == DEDENT)
			msg = "unexpected unindent";
		else {
			char buf[50];
			errtype = PyExc_SyntaxError;
			if(err->expected != -1) {
				snprintf(buf, 48, "invalid syntax - %.16s expected\0",
				msg = buf;
			} else {
				msg = "invalid syntax";

I am willing to help work on this.

msg54975 - (view) Author: Sean Gillespie (sean_gillespie) Date: 2007-03-28 23:37
Your patch seems to work.

I agree that showing the token (as in ")") would indeed be much more useful, and it would be pretty easy to implement.

However, I think that you should generate a diff for your patch.  Its incredibly hard to read over SF.
msg54976 - (view) Author: Oliver Gramberg (oliver_gramberg) Date: 2007-03-30 11:44
Pfa a diff for my patch.


File Added: pythonrun.patch
msg84612 - (view) Author: Daniel Diniz (ajaksu2) (Python triager) Date: 2009-03-30 18:51
Sounds really useful.
msg97734 - (view) Author: Dave Malcolm (dmalcolm) (Python committer) Date: 2010-01-13 19:28
I'm attaching a new version of the patch, based on Oliver's (from 3 years ago).  This patch is against the py3k branch.

I've introduced a new table of (const) strings: _PyParser_TokenDescs, giving descriptions of each token type, so that you get e.g. "')'" rather than "RPAR"

The patch of pythonrun.c is unchanged other than using the description table, rather than the name table.

I've patched the expected results for the doctests in test_genexps and so that these files pass: this gives the code the beginnings of a test suite.

The existing patch adds this compiler warning for me (gcc 4.4.2, on Fedora 12):
  Python/pythonrun.c: In function ‘err_input’:
  Python/pythonrun.c:1923: warning: embedded ‘\0’ in format
However I believe that snprintf isn't guaranteed to NUL-terminate the string under all conditions on all platforms, so the '\0' seems a sane failsafe.

How does this look?

I haven't attempted to deal with places where multiple token types are permitted, and it does sometimes simply emit "invalid syntax".
msg113621 - (view) Author: Raymond Hettinger (rhettinger) * (Python committer) Date: 2010-08-11 19:51
+1 on the basic idea to make error messages more informative where possible, but am not sure how it would work in any but the more simple cases.  

How would work in cases where there are multiple possible "expected" tokens?

   >>> def f(x 3):	
   SyntaxError: invalid syntax

It chokes at the "3".  The expected token is either a comma, colon, or closing parenthesis.

Also, the most annoying and least obvious syntax errors are ones that are revealed many characters away from the original cause (i.e. unbalanced opening brackets or parentheses).  Am not sure how you can readily construct a helpful message in these cases.
msg172646 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-11 14:44
I'm attaching a new version of the patch, based on Dave's (from 2.5 years ago). This patch is against the 3.4.

Previous patches contained an error in the message formatting. "buf" variable out of scope before "msg" used. Appending '\0' to the format string isn't guaranteed to NUL-terminate the string. Actually it do nothing (except producing a warning).
msg173725 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2012-10-25 00:06
Patch updated (thanks Benjamin for comments).
msg173726 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2012-10-25 00:16
Looking at the changes in the patch it seems to me that, in at least a few cases, it's better to have a bare "invalid syntax" than a misleading error.
For example:

     >>> dict(a = i for i in range(10))
+    SyntaxError: invalid syntax - ')' expected

The () are ok, the message is misleading.

 >>> obj.None = 1
+SyntaxError: invalid syntax - name expected

'name' here is a bit vague.

 >>> def f(x, None):
 ...     pass
+SyntaxError: invalid syntax - ')' expected

 >>> def f(*None):
 ...     pass
+SyntaxError: invalid syntax - ')' expected

Here the () are ok too.

 >>> def f(**None):
 ...     pass
+SyntaxError: invalid syntax - name expected

Here I would have expected the "')' expected" like in the previous example, but there's "name" instead (which is a bit better, albeit inconsistent).

I wouldn't consider this an improvement, but for other situations the error message is probably useful.
I see 3 options here:
1) we find a way to show the expected token only when the message is not misleading;
2) if the expected token is useful more often than it is misleading, then we could apply the patch as is;
3) if it is misleading more often than it is useful, it's probably better to reject the patch.
msg179574 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-10 16:52
>      >>> dict(a = i for i in range(10))
> +    SyntaxError: invalid syntax - ')' expected
> The () are ok, the message is misleading.

"dict(a = i)" is valid syntax, the compiler expects ")" instead of invalid "for".

> 'name' here is a bit vague.

The compiler actually expects a name (using Python terminology, see for example NameError). Of course you can propose an other name for "name" (this is just an entity in _PyParser_TokenDescs array).

>  >>> def f(x, None):
>  ...     pass
> +SyntaxError: invalid syntax - ')' expected
>  >>> def f(*None):
>  ...     pass
> +SyntaxError: invalid syntax - ')' expected
> Here the () are ok too.

The compiler means "def f(x,)" and "def f(*)", not "def f()" as you possible expects.
msg179576 - (view) Author: Ezio Melotti (ezio.melotti) * (Python committer) Date: 2013-01-10 17:04
I'm not saying that these errors are wrong -- just that they are misleading (i.e. they might lead the user on the wrong path, and make finding the actual problem more difficult).

It should be noted that the examples I pasted don't include a full traceback though.  The presence of the caret (^) in the right place will definitely make things clearer.
msg179677 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-11 13:21
I agree, the main problem is in the fact that "expected token" is not always singular. And even "most expected token" is a little subjective. The better solution will be to expect several possible tokens. This requires some parser modification.
msg179689 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2013-01-11 15:56
Hmm, "expected" attribute is set when there is only one possible expected token in PyParser_AddToken(). I don't understand why error messages are so misleading for "def f(*23):" (here not only ')', but a name possible).
msg315199 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2018-04-11 18:51
Similar enhancement has been implemented in PyPy just now.
Date User Action Args
2018-04-11 18:59:19serhiy.storchakasetstage: needs patch -> patch review
pull_requests: + pull_request6147
2018-04-11 18:51:41serhiy.storchakasetmessages: + msg315199
versions: + Python 3.8, - Python 3.4
2013-01-11 15:56:50serhiy.storchakasetmessages: + msg179689
2013-01-11 13:21:07serhiy.storchakasetmessages: + msg179677
stage: patch review -> needs patch
2013-01-10 17:04:31ezio.melottisetmessages: + msg179576
2013-01-10 16:52:44serhiy.storchakasetmessages: + msg179574
2012-10-25 00:16:26ezio.melottisetnosy: + ezio.melotti
messages: + msg173726
2012-10-25 00:06:03serhiy.storchakasetfiles: + syntax-error-hints-3.4_2.patch

messages: + msg173725
2012-10-24 09:29:03serhiy.storchakasetstage: test needed -> patch review
2012-10-11 14:44:54serhiy.storchakasetfiles: + syntax-error-hints-3.4.patch
versions: + Python 3.4, - Python 3.2
nosy: + serhiy.storchaka

messages: + msg172646
2010-08-11 19:51:41rhettingersetnosy: + rhettinger
messages: + msg113621
2010-08-09 18:37:21terry.reedysetversions: - Python 2.7
2010-01-15 15:50:55pitrousetnosy: + benjamin.peterson

versions: + Python 3.2, - Python 3.1
2010-01-13 19:28:34dmalcolmsetfiles: + syntax-error-hints.patch

nosy: + dmalcolm
messages: + msg97734

keywords: + patch
2009-03-30 18:51:54ajaksu2setversions: + Python 3.1, Python 2.7, - Python 2.6
nosy: + ajaksu2

messages: + msg84612

stage: test needed
2007-01-12 13:03:27oliver_grambergcreate