Issue 20608: 'SyntaxError: invalid token' is unfriendly


This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64807

classification

Title:	'SyntaxError: invalid token' is unfriendly
Type:	enhancement	Stage:	resolved
Components:	Interpreter Core	Versions:	Python 3.7

process

Status:	closed	Resolution:	out of date
Dependencies:		Superseder:
Assigned To:		Nosy List:	John Parejko, benjamin.peterson, mgedmin, serhiy.storchaka, terry.reedy
Priority:	normal	Keywords:	patch

Created on 2014-02-12 14:03 by mgedmin, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description
better-errors.patch	mgedmin, 2014-02-15 12:15	tentative patch without tests
better-errors-v2.patch	mgedmin, 2014-02-15 12:49	updated patch, still no automated tests
better-errors-v3.patch	mgedmin, 2014-02-15 12:55	updated patch, still no automated tests
better-errors-test.patch	mgedmin, 2014-02-15 13:39	patch that adds tests
better-errors-test-v2.patch	mgedmin, 2014-02-15 13:46	updated patch that adds tests

Messages (14)
msg211090 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-12 14:03
Type something like the following at the interpreter prompt:
  >>> 04208
    File "<stdin>", line 1
      04208
          ^
  SyntaxError: invalid token
This is not very descriptive. I suggest "SyntaxError: invalid octal digit".
msg211092 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-12 14:07
I was looking at the current hg tip. The lexer emits E_TOKEN errors for the following cases:
- invalid hex digit
- invalid octal digit
- invalid binary digit
- invalid digit in float exponent
- old-style octal constant (e.g. 001), which is no longer accepted
I think I can come up with a patch that replaces them all with different error codes (E_BAD_HEX_DIGIT etc.) and different error messages. Does that sound like an acceptable change? (I have never contributed non-documentation patches to CPython before.)
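For readers who want to see what their own interpreter reports for each of these categories, here is a minimal sketch. The helper name `syntax_error_message` and the sample literals are illustrative assumptions, not part of any patch in this issue, and the exact wording of each message depends on the Python version in use.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it compiles."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError as exc:
        return exc.msg
    return None


# One hypothetical sample per category from the list above.
samples = {
    "invalid hex digit": "0xg",
    "invalid octal digit": "0o8",
    "invalid binary digit": "0b2",
    "invalid digit in float exponent": "1e-x",
    "old-style octal constant": "001",
}

for label, literal in samples.items():
    # The message text varies between Python versions, so it is only printed.
    print(f"{label:32} {literal:6} -> {syntax_error_message(literal)}")
```

Interpreters from the time of this report would print "invalid token" for each sample; later releases print more specific messages.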
msg211104 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-02-12 17:21
Note that "invalid token" is emitted only on invalid first digit:
  >>> 0b2
    File "<stdin>", line 1
      0b2
       ^
  SyntaxError: invalid token
  >>> 0b02
    File "<stdin>", line 1
      0b02
         ^
  SyntaxError: invalid syntax
See also issue1634034.
msg211270 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 12:15
Oh, hey, PEP 3127 actually asks for a better error message than "invalid token" for this case: http://www.python.org/dev/peps/pep-3127/#tokenizer-exception-handling So here's a tentative patch to test the waters. I still haven't figured out how to write tests for it (is Lib/test/test_tokenize.py the right place for that?), and I haven't manually tested it either, because building CPython tip fails for me with a strange link error about _PyTraceMalloc_Init/_PyTraceMalloc_Fini. If there's some documentation I should read about submitting CPython patches, or some IRC channel I should join, please tell me!
msg211271 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 12:33
I resolved my compilation problems (by running 'make distclean'). There are some problems with my patch:
- "leading" is misspelled (as "lleading")
- literals like 0x1z, 0o18, 0b12, 1.2e-1x produce a generic "invalid syntax" message instead of the specific "bad digit in hex/octal/binary/float literal"
- 1.2e-x produces "bad digit in float literal" correctly, but the caret points to the '-' sign instead of the 'x' character
msg211272 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 12:41
I see that I misunderstood Serhiy's comment. I assumed he meant the caret will be pointing to the 1st digit that is invalid. Instead what actually happens is that E_TOKEN is emitted only if the 1st digit after the 0x/0o/0b prefix is invalid. So, I get the nice error messages for 0b2, 0o8, 0xz and 0e-x (but the caret incorrectly points to the previous character).
msg211273 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 12:49
Here's version 2 of the patch:
- spelling error fixed
- 0b2, 0o8, 0xg, 0e-x show the expected error at the expected place
- 0b02, 0o08, 0x0g, 0e-0x continue to produce a generic "syntax error" because the tokenizer thinks these are a pair of valid tokens (0b0 followed by 2 etc.), and the error comes from the parser
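The two-token behaviour described above can be illustrated with a toy scanner. This is a minimal sketch using a regular expression, not CPython's tokenizer; the pattern `TOKEN` and the function `toy_tokens` are hypothetical names introduced only for this illustration.

```python
import re

# Toy token pattern: prefixed integer literals, plain digit runs, identifiers.
# This is an assumption-laden stand-in for a real lexer, for illustration only.
TOKEN = re.compile(
    r"0[bB][01]+"          # binary literal
    r"|0[oO][0-7]+"        # octal literal
    r"|0[xX][0-9a-fA-F]+"  # hex literal
    r"|[0-9]+"             # run of decimal digits
    r"|[A-Za-z_]\w*"       # identifier
)


def toy_tokens(source):
    """Split `source` into the longest tokens the toy pattern accepts."""
    return TOKEN.findall(source)


print(toy_tokens("0b02"))  # ['0b0', '2']
print(toy_tokens("0o08"))  # ['0o0', '8']
print(toy_tokens("0x0g"))  # ['0x0', 'g']
```

Because each literal is consumed greedily up to the first digit that does not belong to it, a scanner of this shape sees no error of its own; the problem only shows up later as two adjacent tokens that the grammar does not allow.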
msg211274 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 12:55
Version 3 of the patch catches bad digits in the middle of a literal, like this:
  >>> 0o01010118001
    File "<stdin>", line 1
      0o01010118001
               ^
  SyntaxError: bad digit in octal literal
msg211275 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 13:39
Here are some unit tests for the new syntax errors (in test_syntax.py; test_tokenize.py turned out to be totally unrelated). One possible shortcoming: they do not test the column of the syntax error.
msg211276 - (view)	Author: Marius Gedminas (mgedmin) *	Date: 2014-02-15 13:46
Updated test that checks the syntax error offset as well. I think I'm done with the iterations. I'll be waiting for feedback.
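A check of this kind could look roughly like the following minimal sketch. It is not taken from the attached patches; the helper name `syntax_error_location` is hypothetical, and the message and column an interpreter reports depend on its version, so the example only prints them.

```python
def syntax_error_location(source):
    """Compile `source` and return (msg, lineno, offset) of its SyntaxError."""
    try:
        compile(source, "<test>", "exec")
    except SyntaxError as exc:
        # offset is the 1-based column that the caret is drawn under.
        return exc.msg, exc.lineno, exc.offset
    raise AssertionError("expected a SyntaxError for %r" % (source,))


msg, lineno, offset = syntax_error_location("0o38\n")
print(msg, lineno, offset)
```

A real test would compare `offset` against the column of the offending digit for the interpreter under test.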
msg211293 - (view)	Author: Serhiy Storchaka (serhiy.storchaka) *	Date: 2014-02-15 20:39
Marius, you can remove unneeded patches (click on the "edit" link and then press the "Unlink" button).
msg211294 - (view)	Author: Benjamin Peterson (benjamin.peterson) *	Date: 2014-02-15 20:40
I can look at this when the time for 3.5 rolls around.
msg284603 - (view)	Author: John Parejko (John Parejko)	Date: 2017-01-03 23:40
I had filed issue 29146 but eventually found this, which has patches to fix the exception messages. Could someone please look at this and get it incorporated into python?
msg389368 - (view)	Author: Terry J. Reedy (terry.reedy) *	Date: 2021-03-23 02:20
Fixed elsewhere.
  >>> 04208
    File "<stdin>", line 1
      04208
          ^
  SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
  >>> 0o38
    File "<stdin>", line 1
      0o38
         ^
  SyntaxError: invalid digit '8' in octal literal

History
Date	User	Action	Args
2022-04-11 14:57:58	admin	set	github: 64807
2021-03-23 02:20:23	terry.reedy	set	status: open -> closed nosy: + terry.reedy messages: + msg389368 resolution: out of date stage: patch review -> resolved
2017-01-03 23:40:34	John Parejko	set	messages: + msg284603
2017-01-03 23:38:43	John Parejko	set	nosy: + John Parejko
2017-01-03 23:24:16	berker.peksag	set	versions: + Python 3.7, - Python 3.5
2017-01-03 23:21:42	berker.peksag	link	issue29146 superseder
2014-02-15 20:40:04	benjamin.peterson	set	messages: + msg211294
2014-02-15 20:39:36	serhiy.storchaka	set	type: enhancement messages: + msg211293 stage: patch review
2014-02-15 13:46:51	mgedmin	set	files: + better-errors-test-v2.patch messages: + msg211276
2014-02-15 13:39:20	mgedmin	set	files: + better-errors-test.patch messages: + msg211275
2014-02-15 12:55:11	mgedmin	set	files: + better-errors-v3.patch messages: + msg211274
2014-02-15 12:49:29	mgedmin	set	files: + better-errors-v2.patch messages: + msg211273
2014-02-15 12:41:38	mgedmin	set	messages: + msg211272
2014-02-15 12:33:40	mgedmin	set	messages: + msg211271
2014-02-15 12:15:30	mgedmin	set	files: + better-errors.patch keywords: + patch messages: + msg211270
2014-02-12 17:21:19	serhiy.storchaka	set	nosy: + serhiy.storchaka messages: + msg211104
2014-02-12 14:10:30	vstinner	set	nosy: + benjamin.peterson
2014-02-12 14:07:56	mgedmin	set	messages: + msg211092
2014-02-12 14:03:20	mgedmin	create