classification
Title: 'SyntaxError: invalid token' is unfriendly
Type: enhancement
Stage: resolved
Components: Interpreter Core
Versions: Python 3.7
process
Status: closed
Resolution: out of date
Dependencies:
Superseder:
Assigned To:
Nosy List: John Parejko, benjamin.peterson, mgedmin, serhiy.storchaka, terry.reedy
Priority: normal
Keywords: patch

Created on 2014-02-12 14:03 by mgedmin, last changed 2021-03-23 02:20 by terry.reedy. This issue is now closed.

Files
File name Uploaded Description
better-errors.patch mgedmin, 2014-02-15 12:15 tentative patch without tests
better-errors-v2.patch mgedmin, 2014-02-15 12:49 updated patch, still no automated tests
better-errors-v3.patch mgedmin, 2014-02-15 12:55 updated patch, still no automated tests
better-errors-test.patch mgedmin, 2014-02-15 13:39 patch that adds tests
better-errors-test-v2.patch mgedmin, 2014-02-15 13:46 updated patch that adds tests
Messages (14)
msg211090 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-12 14:03
Type something like the following at the interpreter prompt:

>>> 04208
  File "<stdin>", line 1
    04208
        ^
SyntaxError: invalid token


This is not very descriptive.  I suggest "SyntaxError: invalid octal digit".
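
The same error can be reproduced without the interactive prompt by handing the literal to the built-in compile(). The following is a minimal sketch; the helper name describe_syntax_error is hypothetical, and both the message wording and the reported offset depend on the Python version in use.

```python
def describe_syntax_error(source):
    """Return (message, offset) if compiling source raises SyntaxError, else None."""
    try:
        compile(source, "<stdin>", "eval")
    except SyntaxError as exc:
        # exc.msg is the text shown after "SyntaxError:"; exc.offset is the caret column.
        return exc.msg, exc.offset
    return None


# "04208" is rejected by Python 3; "0o17" is a valid octal literal.
for literal in ("04208", "0o17"):
    print(literal, "->", describe_syntax_error(literal))
```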
msg211092 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-12 14:07
I was looking at the current hg tip.  The lexer emits E_TOKEN errors for the following cases:
- invalid hex digit
- invalid octal digit
- invalid binary digit
- invalid digit in float exponent
- old-style octal constant (e.g. 001), which is no longer accepted

I think I can come up with a patch that replaces them all with different error codes (E_BAD_HEX_DIGIT etc.) and different error messages.  Does that sound like an acceptable change?  (I never contributed non-documentation patches to CPython before.)
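
As an illustration of the idea only (this is not the CPython tokenizer, which is written in C), here is a rough Python sketch that maps malformed integer literals to specific messages instead of one generic one. The function name diagnose_literal, the PREFIXES table, and the message strings are all assumptions made for this example; the float exponent case is left out for brevity.

```python
import string

# Hypothetical table: literal prefix -> (kind name, set of allowed digits).
PREFIXES = {
    "0x": ("hex", set(string.hexdigits)),
    "0o": ("octal", set("01234567")),
    "0b": ("binary", set("01")),
}


def diagnose_literal(text):
    """Return (message, index of the offending character), or None if nothing is wrong."""
    lowered = text.lower()
    entry = PREFIXES.get(lowered[:2])
    if entry is not None:
        kind, allowed = entry
        body = lowered[2:]
        if not body:
            return "missing digits in %s literal" % kind, 2
        for index, char in enumerate(body, start=2):
            if char not in allowed:
                return "bad digit in %s literal" % kind, index
        return None
    # A leading zero followed by non-zero decimal digits is an old-style octal constant.
    all_decimal = all(char in string.digits for char in lowered)
    if len(lowered) > 1 and lowered[0] == "0" and all_decimal and lowered.strip("0"):
        return "old-style octal constant is no longer accepted", 0
    return None


for sample in ("0o38", "0b2", "0x1f", "04208", "00"):
    print(sample, "->", diagnose_literal(sample))
```

Returning the index alongside the message keeps enough information to place the caret under the offending character.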
msg211104 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-12 17:21
Note that "invalid token" is emitted only on invalid first digit:

>>> 0b2
  File "<stdin>", line 1
    0b2
     ^
SyntaxError: invalid token
>>> 0b02
  File "<stdin>", line 1
    0b02
       ^
SyntaxError: invalid syntax


See also issue1634034.
msg211270 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 12:15
Oh, hey, PEP 3127 actually asks for a better error message than "invalid token" for this case: http://www.python.org/dev/peps/pep-3127/#tokenizer-exception-handling

So here's a tentative patch to test the waters.  I still haven't figured out how to write tests for it (is Lib/test/test_tokenize.py the right place for that?), and I haven't manually tested it either, because building CPython tip fails for me with a strange link error about _PyTraceMalloc_Init/_PyTraceMalloc_Fini.

If there's some documentation I should read about submitting CPython patches, or some IRC channel I should join, please tell me!
msg211271 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 12:33
I resolved my compilation problems (by running 'make distclean').

There are some problems with my patch:
- "leading" is misspelled (as "lleading")
- literals like 0x1z, 0o18, 0b12, 1.2e-1x produce a generic "invalid syntax" message instead of the specific "bad digit in hex/octal/binary/float literal"
- 1.2e-x produces "bad digit in float literal" correctly, but the caret points to the '-' sign instead of the 'x' character
msg211272 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 12:41
I see that I misunderstood Serhiy's comment.  I assumed he meant the caret will be pointing to the 1st digit that is invalid.  Instead what actually happens is that E_TOKEN is emitted only if the 1st digit after the 0x/0o/0b prefix is invalid.

So, I get the nice error messages for 0b2, 0o8, 0xz and 0e-x (but the caret incorrectly points to the previous character).
msg211273 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 12:49
Here's version 2 of the patch:
- spelling error fixed
- 0b2, 0o8, 0xg, 0e-x show the expected error at the expected place
- 0b02, 0o08, 0x0g, 0e-0x continue to produce a generic "invalid syntax" error because the tokenizer thinks each of these is a pair of valid tokens (0b0 followed by 2, etc.), so the error comes from the parser
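
The token-splitting behaviour described in the last point can be illustrated with a regular expression that mimics longest-prefix matching. This is a simplified sketch under the assumption that a binary literal is just a 0b prefix followed by binary digits; the pattern and the helper name split_after_literal are hypothetical and are not taken from the tokenizer source.

```python
import re

# Simplified, hypothetical pattern: "0b" or "0B" followed by one or more binary digits.
BINARY_LITERAL = re.compile(r"0[bB][01]+")


def split_after_literal(text):
    """Return (matched literal or None, leftover text) using longest-prefix matching."""
    match = BINARY_LITERAL.match(text)
    if match is None:
        return None, text
    return match.group(), text[match.end():]


print(split_after_literal("0b02"))  # ('0b0', '2'): a valid literal followed by a second token
print(split_after_literal("0b2"))   # (None, '0b2'): no valid literal at all
```

With "0b02" the matcher stops happily after "0b0", so nothing at the token level looks wrong; only a later stage notices that two numbers in a row make no sense. With "0b2" the literal itself cannot be formed, which is the case where a token-level error is raised.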
msg211274 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 12:55
Version 3 of the patch catches bad digits in the middle of a literal, like this:

>>> 0o01010118001
  File "<stdin>", line 1
    0o01010118001
             ^
SyntaxError: bad digit in octal literal
msg211275 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 13:39
Here are some unit tests for the new syntax errors (in test_syntax.py; test_tokenize.py turned out to be totally unrelated).

One possible shortcoming: they do not test the column of the syntax error.
msg211276 - (view) Author: Marius Gedminas (mgedmin) * Date: 2014-02-15 13:46
Updated test that checks the syntax error offset as well.

I think I'm done with the iterations.  I'll be waiting for feedback.
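
For readers without access to the attached patches, a test along these lines might look like the sketch below. It is not the content of better-errors-test-v2.patch; the class name and sample list are assumptions. Because the exact message and column differ between Python versions, the sketch only checks that a SyntaxError with a message is raised and that any reported offset is non-negative.

```python
import unittest


class BadLiteralTests(unittest.TestCase):
    # Literals taken from the discussion above; all are invalid in Python 3.
    SAMPLES = ("0b2", "0o8", "0xg", "0o01010118001", "04208")

    def test_bad_literals_raise_syntax_error(self):
        for source in self.SAMPLES:
            with self.subTest(source=source):
                with self.assertRaises(SyntaxError) as caught:
                    compile(source, "<test>", "exec")
                exc = caught.exception
                self.assertIsInstance(exc.msg, str)
                # The column is version dependent, so only a loose check is made here.
                if exc.offset is not None:
                    self.assertGreaterEqual(exc.offset, 0)
```

Saved in a file, this can be run with "python -m unittest" followed by the file name. A stricter version would pin the expected message and offset for one specific interpreter version.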
msg211293 - (view) Author: Serhiy Storchaka (serhiy.storchaka) * (Python committer) Date: 2014-02-15 20:39
Marius, you can remove unneeded patches (click on the "edit" link and then press the "Unlink" button).
msg211294 - (view) Author: Benjamin Peterson (benjamin.peterson) * (Python committer) Date: 2014-02-15 20:40
I can look at this when the time for 3.5 rolls around.
msg284603 - (view) Author: John Parejko (John Parejko) Date: 2017-01-03 23:40
I had filed issue 29146 but eventually found this issue, which has patches to fix the exception messages. Could someone please look at this and get it incorporated into Python?
msg389368 - (view) Author: Terry J. Reedy (terry.reedy) * (Python committer) Date: 2021-03-23 02:20
Fixed elsewhere.

>>> 04208
  File "<stdin>", line 1
    04208
        ^
SyntaxError: leading zeros in decimal integer literals are not permitted; use an 0o prefix for octal integers
>>> 0o38
  File "<stdin>", line 1
    0o38
       ^
SyntaxError: invalid digit '8' in octal literal
History
Date User Action Args
2021-03-23 02:20:23 terry.reedy set status: open -> closed; nosy: + terry.reedy; messages: + msg389368; resolution: out of date; stage: patch review -> resolved
2017-01-03 23:40:34 John Parejko set messages: + msg284603
2017-01-03 23:38:43 John Parejko set nosy: + John Parejko
2017-01-03 23:24:16 berker.peksag set versions: + Python 3.7, - Python 3.5
2017-01-03 23:21:42 berker.peksag link issue29146 superseder
2014-02-15 20:40:04 benjamin.peterson set messages: + msg211294
2014-02-15 20:39:36 serhiy.storchaka set type: enhancement; messages: + msg211293; stage: patch review
2014-02-15 13:46:51 mgedmin set files: + better-errors-test-v2.patch; messages: + msg211276
2014-02-15 13:39:20 mgedmin set files: + better-errors-test.patch; messages: + msg211275
2014-02-15 12:55:11 mgedmin set files: + better-errors-v3.patch; messages: + msg211274
2014-02-15 12:49:29 mgedmin set files: + better-errors-v2.patch; messages: + msg211273
2014-02-15 12:41:38 mgedmin set messages: + msg211272
2014-02-15 12:33:40 mgedmin set messages: + msg211271
2014-02-15 12:15:30 mgedmin set files: + better-errors.patch; keywords: + patch; messages: + msg211270
2014-02-12 17:21:19 serhiy.storchaka set nosy: + serhiy.storchaka; messages: + msg211104
2014-02-12 14:10:30 vstinner set nosy: + benjamin.peterson
2014-02-12 14:07:56 mgedmin set messages: + msg211092
2014-02-12 14:03:20 mgedmin create