Show "expected" token on syntax error #44453

olivergramberg · 2007-01-12T13:03:27Z

BPO	1634034
Nosy	@rhettinger, @terryjreedy, @devdanzin, @benjaminp, @ezio-melotti, @davidmalcolm, @serhiy-storchaka, @pablogsal
PRs	bpo-1634034: Show "expected" token on syntax error. #6453
Files	pythonrun.patch: Patch for /Python/pythonrun.c, syntax-error-hints.patch, syntax-error-hints-3.4.patch, syntax-error-hints-3.4_2.patch

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2021-03-27.21:22:06.519>
created_at = <Date 2007-01-12.13:03:27.000>
labels = ['interpreter-core', 'type-feature', '3.8']
title = 'Show "expected" token on syntax error'
updated_at = <Date 2021-03-27.21:22:06.518>
user = 'https://bugs.python.org/olivergramberg'

bugs.python.org fields:

activity = <Date 2021-03-27.21:22:06.518>
actor = 'pablogsal'
assignee = 'none'
closed = True
closed_date = <Date 2021-03-27.21:22:06.519>
closer = 'pablogsal'
components = ['Interpreter Core']
creation = <Date 2007-01-12.13:03:27.000>
creator = 'oliver_gramberg'
dependencies = []
files = ['8286', '15860', '27530', '27708']
hgrepos = []
issue_num = 1634034
keywords = ['patch']
message_count = 16.0
messages = ['54974', '54975', '54976', '84612', '97734', '113621', '172646', '173725', '173726', '179574', '179576', '179677', '179689', '315199', '389369', '389616']
nosy_count = 10.0
nosy_names = ['rhettinger', 'terry.reedy', 'sean_gillespie', 'ajaksu2', 'oliver_gramberg', 'benjamin.peterson', 'ezio.melotti', 'dmalcolm', 'serhiy.storchaka', 'pablogsal']
pr_nums = ['6453']
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue1634034'
versions = ['Python 3.8']

olivergramberg · 2007-01-12T13:03:27Z

I suggest that the parser, when reporting a syntax
error, should make use of its knowledge of which token
type is expected at the position where the error
occurred. This results in more helpful error messages:

-----------------------------------------------------

>>> for a in (8,9)
  File "<stdin>", line 1
    for a in (8,9)
                 ^
SyntaxError: invalid syntax - COLON expected
-----------------------------------------------------
>>> for a in (8,9: print a,
  File "<stdin>", line 1
    for a in (8,9: print a,
                 ^
SyntaxError: invalid syntax - RPAR expected

I tried the following patch (for pythonrun.c). It works
well in the shell both interactively and in scripts,
as well as in IDLE. But it's not complete:

It doesn't always print useful messages (only for
fixed-size terminal token types, I assume.)
There sure are cases where more than one token type
is allowed in a position. I believe I have seen that
this information is available too somewhere in the
parser, but it is not forwarded to the err_input
routine.

It's even nicer to show "')'" instead of "RPAR"...

-----------------------------------------------------
/* Set the error appropriate to the given input error code (see errcode.h) */

static void
err_input(perrdetail *err)
{
	PyObject *v, *w, *errtype;
	PyObject* u = NULL;
	char *msg = NULL;
	errtype = PyExc_SyntaxError;
	switch (err->error) {
	case E_SYNTAX:
		errtype = PyExc_IndentationError;
		if (err->expected == INDENT)
			msg = "expected an indented block";
		else if (err->token == INDENT)
			msg = "unexpected indent";
		else if (err->token == DEDENT)
			msg = "unexpected unindent";
		else {
			char buf[50];
			errtype = PyExc_SyntaxError;
			if(err->expected != -1) {
				snprintf(buf, 48, "invalid syntax - %.16s expected\0",
					_PyParser_TokenNames[err->expected]);
				msg = buf;
			} else {
				msg = "invalid syntax";
			}
		}
		break;
		...

I am willing to help work on this.

Regards
-Oliver

seangillespie · 2007-03-28T23:37:26Z

Your patch seems to work.

I agree that showing the token (as in ")") would indeed be much more useful, and it would be pretty easy to implement.

However, I think that you should generate a diff for your patch. It's incredibly hard to read on SF.

olivergramberg · 2007-03-30T11:44:04Z

Please find attached a diff for my patch.

Regards
-Oliver

File Added: pythonrun.patch

devdanzin · 2009-03-30T18:51:55Z

Sounds really useful.

davidmalcolm · 2010-01-13T19:28:32Z

I'm attaching a new version of the patch, based on Oliver's (from 3 years ago). This patch is against the py3k branch.

I've introduced a new table of (const) strings: _PyParser_TokenDescs, giving descriptions of each token type, so that you get e.g. "')'" rather than "RPAR"
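As a rough illustration of such a description table (this is not the patch's C code), the same mapping can be sketched in Python with the standard `token` module. The names `describe_token`, `_OPERATOR_TEXT` and `_GENERIC_DESCS` are hypothetical, and `token.EXACT_TOKEN_TYPES` is assumed to be available (Python 3.8 or later).

```python
import token

# Hypothetical analogue of a token description table: map a token type
# number to text that is friendlier than the symbolic name ("RPAR").
# token.EXACT_TOKEN_TYPES maps operator strings such as ")" to token types.
_OPERATOR_TEXT = {}
for text, tok_type in token.EXACT_TOKEN_TYPES.items():
    _OPERATOR_TEXT.setdefault(tok_type, text)

# Descriptions for token types that have no fixed spelling (assumed wording).
_GENERIC_DESCS = {
    token.NAME: "name",
    token.NUMBER: "number",
    token.STRING: "string",
    token.NEWLINE: "newline",
    token.INDENT: "indent",
    token.DEDENT: "dedent",
}


def describe_token(tok_type):
    """Return a readable description for a token type number."""
    if tok_type in _OPERATOR_TEXT:
        # Quote fixed-size tokens, e.g. RPAR becomes "')'".
        return repr(_OPERATOR_TEXT[tok_type])
    if tok_type in _GENERIC_DESCS:
        return _GENERIC_DESCS[tok_type]
    # Fall back to the symbolic name when nothing better is known.
    return token.tok_name.get(tok_type, "token")
```

With this sketch, `describe_token(token.RPAR)` gives `"')'"` while `token.tok_name[token.RPAR]` gives `"RPAR"`.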

The patch of pythonrun.c is unchanged other than using the description table, rather than the name table.

I've patched the expected results for the doctests in test_genexps and test_syntax.py so that these files pass: this gives the code the beginnings of a test suite.

The existing patch adds this compiler warning for me (gcc 4.4.2, on Fedora 12):
Python/pythonrun.c: In function ‘err_input’:
Python/pythonrun.c:1923: warning: embedded ‘\0’ in format
However I believe that snprintf isn't guaranteed to NUL-terminate the string under all conditions on all platforms, so the '\0' seems a sane failsafe.

How does this look?

I haven't attempted to deal with places where multiple token types are permitted, and it does sometimes simply emit "invalid syntax".

rhettinger · 2010-08-11T19:51:41Z

+1 on the basic idea to make error messages more informative where possible, but I am not sure how it would work in any but the simplest cases.

How would it work in cases where there are multiple possible "expected" tokens?

   >>> def f(x 3):	
   SyntaxError: invalid syntax

It chokes at the "3". The expected token is either a comma, colon, or closing parenthesis.

Also, the most annoying and least obvious syntax errors are ones that are revealed many characters away from the original cause (i.e. unbalanced opening brackets or parentheses). Am not sure how you can readily construct a helpful message in these cases.

serhiy-storchaka · 2012-10-11T14:44:54Z

I'm attaching a new version of the patch, based on Dave's (from 2.5 years ago). This patch is against 3.4.

Previous patches contained an error in the message formatting: the "buf" variable goes out of scope before "msg" is used. Appending '\0' to the format string isn't guaranteed to NUL-terminate the string. Actually it does nothing (except produce a warning).

serhiy-storchaka · 2012-10-25T00:06:03Z

Patch updated (thanks Benjamin for comments).

ezio-melotti · 2012-10-25T00:16:26Z

Looking at the changes in the patch it seems to me that, in at least a few cases, it's better to have a bare "invalid syntax" than a misleading error.
For example:

     >>> dict(a = i for i in range(10))
+    SyntaxError: invalid syntax - ')' expected

The () are ok, the message is misleading.

 >>> obj.None = 1
+SyntaxError: invalid syntax - name expected

'name' here is a bit vague.

 
 >>> def f(x, None):
 ...     pass
+SyntaxError: invalid syntax - ')' expected

 >>> def f(*None):
 ...     pass
+SyntaxError: invalid syntax - ')' expected

Here the () are ok too.

 
 >>> def f(**None):
 ...     pass
+SyntaxError: invalid syntax - name expected

Here I would have expected the "')' expected" like in the previous example, but there's "name" instead (which is a bit better, albeit inconsistent).

I wouldn't consider this an improvement, but for other situations the error message is probably useful.
I see 3 options here:

1. we find a way to show the expected token only when the message is not misleading;
2. if the expected token is useful more often than it is misleading, then we could apply the patch as is;
3. if it is misleading more often than it is useful, it's probably better to reject the patch.

serhiy-storchaka · 2013-01-10T16:52:44Z

> >>> dict(a = i for i in range(10))
> SyntaxError: invalid syntax - ')' expected
>
> The () are ok, the message is misleading.

"dict(a = i)" is valid syntax; the compiler expects ")" instead of the invalid "for".

> 'name' here is a bit vague.

The compiler actually expects a name (using Python terminology, see for example NameError). Of course you can propose another name for "name" (this is just an entry in the _PyParser_TokenDescs array).

> >>> def f(x, None):
> ...     pass
> +SyntaxError: invalid syntax - ')' expected
>
> >>> def f(*None):
> ...     pass
> +SyntaxError: invalid syntax - ')' expected
>
> Here the () are ok too.

The compiler means "def f(x,)" and "def f(*)", not "def f()" as you possibly expect.

ezio-melotti · 2013-01-10T17:04:30Z

I'm not saying that these errors are wrong -- just that they are misleading (i.e. they might lead the user on the wrong path, and make finding the actual problem more difficult).

It should be noted that the examples I pasted don't include a full traceback though. The presence of the caret (^) in the right place will definitely make things clearer.

serhiy-storchaka · 2013-01-11T13:21:07Z

I agree; the main problem is that the "expected token" is not always unique. Even the "most expected token" is a little subjective. A better solution would be to report several possible tokens. This requires some parser modification.
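A minimal sketch of what reporting several possible tokens could look like, assuming the parser could hand over the descriptions of every acceptable token. The function `syntax_error_message` is hypothetical and only illustrates the message formatting, not any parser change.

```python
def syntax_error_message(expected):
    """Build an error message from descriptions of the acceptable tokens.

    `expected` is an iterable of description strings such as "')'";
    it is an assumed input, since no patch in this issue provides it.
    """
    descs = sorted(set(expected))
    if not descs:
        # Nothing useful is known: keep the bare message.
        return "invalid syntax"
    if len(descs) == 1:
        return "invalid syntax - %s expected" % descs[0]
    # Several candidates: list them all rather than guessing one.
    head = ", ".join(descs[:-1])
    return "invalid syntax - %s or %s expected" % (head, descs[-1])
```

For the three candidates named earlier in this issue for "def f(x 3):", `syntax_error_message(["','", "':'", "')'"])` returns `invalid syntax - ')', ',' or ':' expected`.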

serhiy-storchaka · 2013-01-11T15:56:50Z

Hmm, the "expected" attribute is set in PyParser_AddToken() only when there is exactly one possible expected token. I don't understand why the error messages are so misleading for "def f(*23):" (here not only ')' but also a name is possible).

serhiy-storchaka · 2018-04-11T18:51:42Z

A similar enhancement has just been implemented in PyPy.

https://morepypy.blogspot.de/2018/04/improving-syntaxerror-in-pypy.html

terryjreedy · 2021-03-23T02:50:19Z

I think that this issue should be closed as 'out of date' as it was pretty open-ended and it is unclear what request remains.

For the specific case "for a in (8,9)", the suggested "expected ':'" has been added on another issue. I expect that there are other additions from other issues.

For "for a in (8,9: print a," there is no change, but for "for a in (8,9]" we now have "closing parenthesis ']' does not match opening parenthesis '('". This properly does not say "expected ')'", as the needed fix might be to insert the opener '['.

For many other cases, the proposed additions were disputed as not helpful, mostly because multiple things could be expected. I think other suggestions, based on current master, should be new issues.
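The messages quoted in this comment depend on the interpreter version, so one way to see what a given Python reports for the examples in this issue is to compile them and read the exception. This is only a sketch; `current_message` and the `"<example>"` filename are hypothetical names, and no particular message text is assumed.

```python
def current_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        # exc.msg holds the text shown after "SyntaxError: ".
        return exc.msg
    return None


# Inputs taken from the examples discussed in this issue.
EXAMPLES = [
    "for a in (8,9)",
    "for a in (8,9: print a,",
    "for a in (8,9]",
    "def f(x 3): pass",
]

for src in EXAMPLES:
    print("%r -> %s" % (src, current_message(src)))
```

Running this under different Python versions shows which cases have gained a more specific message and which still report a generic one.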

pablogsal · 2021-03-27T21:22:06Z

Closing as per above

olivergramberg mannequin added the interpreter-core (Objects, Python, Grammar, and Parser dirs) and type-feature (A feature request or enhancement) labels on Jan 12, 2007

serhiy-storchaka added the 3.8 (only security fixes) label on Apr 11, 2018

pablogsal closed this as completed Mar 27, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022
