
classification
Title: Improve error message when source code contains invisible control characters
Type: enhancement Stage: resolved
Components: Interpreter Core Versions: Python 3.11
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: aroberge, pablogsal, steven.daprano, terry.reedy
Priority: normal Keywords: patch

Created on 2021-11-15 23:52 by steven.daprano, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked Edit
PR 29654 merged pablogsal, 2021-11-20 02:32
Messages (3)
msg406379 - Author: Steven D'Aprano (steven.daprano) (Python committer) Date: 2021-11-15 23:52
Invisible control characters (aside from white space) are not permitted in source code, but the syntax error we get is confusing and lacks information:

>>> s = 'print\x17("Hello")'
>>> eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    print("Hello")
         ^
SyntaxError: invalid syntax


The caret points to an invisible character. The offending control character is visible neither in the traceback nor in the source code, unless you use a hex editor. Copying and pasting the string from the traceback or the source code may strip the control character (depending on the tools you use), making the problem even harder to track down.
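As a rough illustration of tracking such a character down without a hex editor, the sketch below scans a source string and reports where control characters occur. The helper name `find_control_chars` and the `ALLOWED` set are hypothetical and not part of this report; the set of control characters treated as harmless whitespace is an assumption.

```python
import unicodedata

# Assumption: these control characters count as ordinary whitespace
# and should not be reported.
ALLOWED = {"\t", "\n", "\r", "\f"}


def find_control_chars(source):
    """Return (line, column, code point) for each suspicious control character.

    Lines are 1-based and columns are 0-based.
    """
    hits = []
    # Split on "\n" only: str.splitlines() would also split on some of the
    # control characters we are trying to find.
    for lineno, line in enumerate(source.split("\n"), start=1):
        for col, ch in enumerate(line):
            # Unicode category "Cc" covers the C0 and C1 controls and DEL.
            if unicodedata.category(ch) == "Cc" and ch not in ALLOWED:
                hits.append((lineno, col, ord(ch)))
    return hits


for lineno, col, code in find_control_chars('print\x17("Hello")'):
    print(f"line {lineno}, column {col}: control character 0x{code:02X}")
```

For the string in the example above, this reports a single hit at line 1, column 5, which is the position the caret points to.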

I suggest that the syntax error should state that the problem is an invisible control character, and display it as a standard human-readable code together with its hex code:

SyntaxError: invisible control character ^W (0x17)


In case the mapping between control characters and the human-readable string isn't obvious:

def control(char):
    n = ord(char)
    if 0 <= n <= 0x1F:
        # C0 control codes
        return '^' + chr(ord('@')+n)
    elif n == 0x7F:
        # DEL
        return '^?'
    elif 0x80 <= n <= 0x9F:
        # C1 control codes
        return 'Esc+' + chr(ord('@')+n-0x80)
    else:
        raise ValueError('Not a control character.')


https://en.wikipedia.org/wiki/C0_and_C1_control_codes
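To show how the suggested wording and the mapping above fit together, here is a minimal sketch that builds the proposed message text for a given character. Both `caret_name` (a compact restatement of the mapping) and `proposed_message` are hypothetical names used only for illustration; this is the format suggested in this report, not necessarily the message the interpreter emits.

```python
def caret_name(char):
    # Compact restatement of the mapping above (C0, DEL and C1 ranges).
    n = ord(char)
    if n <= 0x1F:
        return "^" + chr(ord("@") + n)
    if n == 0x7F:
        return "^?"
    if 0x80 <= n <= 0x9F:
        return "Esc+" + chr(ord("@") + n - 0x80)
    raise ValueError("Not a control character.")


def proposed_message(char):
    # Hypothetical helper: formats the wording suggested in this report,
    # pairing the human-readable name with the hex code.
    return f"invisible control character {caret_name(char)} (0x{ord(char):02X})"


print("SyntaxError:", proposed_message("\x17"))
```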
msg406630 - Author: Terry J. Reedy (terry.reedy) (Python committer) Date: 2021-11-19 23:42
I agree.
msg406682 - Author: Pablo Galindo Salgado (pablogsal) (Python committer) Date: 2021-11-20 18:28
New changeset 81f4e116ef7d30ef6e2041c2d6cf29af511a3a02 by Pablo Galindo Salgado in branch 'main':
bpo-45811: Improve error message when source code contains invisible control characters (GH-29654)
https://github.com/python/cpython/commit/81f4e116ef7d30ef6e2041c2d6cf29af511a3a02
History
Date                 User            Action  Args
2022-04-11 14:59:52  admin           set     github: 89969
2021-11-20 18:28:38  pablogsal       set     messages: + msg406682
2021-11-20 18:28:37  pablogsal       set     status: open -> closed
                                             resolution: fixed
                                             stage: patch review -> resolved
2021-11-20 02:32:49  pablogsal       set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request27896
2021-11-19 23:42:14  terry.reedy     set     nosy: + terry.reedy
                                             messages: + msg406630
2021-11-19 23:37:01  terry.reedy     set     nosy: + pablogsal
2021-11-16 00:19:52  aroberge        set     nosy: + aroberge
2021-11-15 23:52:37  steven.daprano  create