classification
Title: Parser aborts on incomplete/incorrect unicode literals in interactive mode
Type: crash
Stage: resolved
Components: Unicode
Versions: Python 3.10
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: erlendaasland, ezio.melotti, lys.nikolaou, pablogsal
Priority: normal
Keywords: patch

Created on 2021-03-22 10:25 by erlendaasland, last changed 2021-03-22 19:20 by erlendaasland. This issue is now closed.

Files
File name    Uploaded
patch.diff   erlendaasland, 2021-03-22 10:25
Pull Requests
URL        Status   Linked
PR 24973   merged   pablogsal, 2021-03-22 15:41
Messages (5)
msg389297 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-03-22 10:25
Incomplete unicode escapes in string literals abort the interpreter in interactive mode instead of raising SyntaxError:

(lldb) target create "./python.exe"
Current executable set to '/Users/erlendaasland/src/cpython.git/python.exe' (x86_64).
(lldb) r
Process 98955 launched: '/Users/erlendaasland/src/cpython.git/python.exe' (x86_64)
Python 3.10.0a6+ (heads/main:9a50ef43e4, Mar 22 2021, 11:18:33) [Clang 12.0.0 (clang-1200.0.32.29)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "\u1f"
Assertion failed: (col_offset >= 0 && (unsigned long)col_offset <= strlen(str)), function byte_offset_to_character_offset, file Parser/pegen.c, line 150.
Process 98955 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = hit program assert
    frame #4: 0x0000000100009bd6 python.exe`byte_offset_to_character_offset(line=0x00000001013f1220, col_offset=7) at pegen.c:150:5
   147 	    if (!str) {
   148 	        return 0;
   149 	    }
-> 150 	    assert(col_offset >= 0 && (unsigned long)col_offset <= strlen(str));
   151 	    PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
   152 	    if (!text) {
   153 	        return 0;
Target 0: (python.exe) stopped.
(lldb) p col_offset
(Py_ssize_t) $0 = 7
(lldb) p str
(const char *) $1 = 0x00000001013f1250 "\"\\u1f\""
(lldb) p (size_t) strlen(str)
(size_t) $2 = 6
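
The violated invariant can be restated outside the debugger. What follows is a minimal Python sketch, not CPython's code: it mirrors only the bounds check and the `PyUnicode_DecodeUTF8(str, col_offset, "replace")` call visible in the listing above, and it assumes that the rest of the C function (not shown in this report) returns the length of the decoded text. The `ValueError` is a stand-in for the C `assert`.

```python
def byte_offset_to_character_offset(line: bytes, col_offset: int) -> int:
    """Python analogue of the helper in Parser/pegen.c (illustrative only)."""
    # Same invariant as the C assert: the offset must lie within the line.
    if not 0 <= col_offset <= len(line):
        raise ValueError(
            f"col_offset {col_offset} is outside the line (length {len(line)})"
        )
    # Decode only the bytes before the offset and count the characters.
    # Assumption: the C function returns the length of the decoded text.
    return len(line[:col_offset].decode("utf-8", errors="replace"))


line = b'"\\u1f"'  # the six bytes lldb printed for `str`
print(len(line))   # corresponds to strlen(str) in the session above

try:
    byte_offset_to_character_offset(line, 7)  # the col_offset lldb printed
except ValueError as exc:
    print("invariant violated:", exc)
```

With `col_offset` equal to 7 and a line of 6 bytes, the offset points one position past the end of the line, which is the condition the assertion rejects.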



Python 3.9 behaviour:
Python 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) 
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "\u1f"
  File "<stdin>", line 1
    "\u1f"
          ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-3: truncated \uXXXX escape
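
For comparison, the expected error can also be checked non-interactively. This is a small sketch under one loud assumption: it goes through the built-in `compile()` rather than the interactive prompt, so it is not expected to reproduce the assertion failure reported above. It only shows the SyntaxError that the prompt should report. The helper name `syntax_error_for` is made up for this example.

```python
def syntax_error_for(source):
    """Compile `source` as an expression; return the SyntaxError raised, or None."""
    try:
        compile(source, "<stdin>", "eval")
    except SyntaxError as exc:
        return exc
    return None


# The same six characters typed at the prompt: "\u1f"
err = syntax_error_for('"\\u1f"')
print(type(err).__name__)
print(err.msg)
```

A well-formed escape such as `"\u001f"` compiles cleanly, so the helper returns `None` for it.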



Git bisect says the regression was introduced by this commit:

commit 08fb8ac99ab03d767aa0f1cfab3573eddf9df018
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Thu Mar 18 01:03:11 2021 +0000

    bpo-42128: Add 'missing :' syntax error message to match statements (GH-24733)


I made a workaround (see attached patch), but I guess that's far from the correct solution :)
msg389298 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-03-22 10:28
Correction, git bisect pointed to _this_ commit (not 08fb8ac99ab03d767aa0f1cfab3573eddf9df018):

commit cd8dcbc851fcc312722cdb5544c2f25cf46b3f8a
Author: Pablo Galindo <Pablogsal@gmail.com>
Date:   Sun Mar 14 04:38:40 2021 +0100

    bpo-43410: Fix crash in the parser when producing syntax errors when reading from stdin (GH-24763)
msg389325 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-22 15:41
Thanks for the report and the patch, Erlend! I have turned it into a PR with attribution: PR 24973.
msg389329 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-03-22 16:24
New changeset 123ff266cda9ad279106f20dca06ba114f6a9b8a by Pablo Galindo in branch 'master':
bpo-43591: Fix error location in interactive mode for errors at the end of the line (GH-24973)
https://github.com/python/cpython/commit/123ff266cda9ad279106f20dca06ba114f6a9b8a
msg389336 - Author: Erlend E. Aasland (erlendaasland) * (Python triager) Date: 2021-03-22 19:20
Thanks, Pablo!
History
Date                 User           Action  Args
2021-03-22 19:20:55  erlendaasland  set     messages: + msg389336
2021-03-22 16:24:56  pablogsal      set     messages: + msg389329
2021-03-22 16:24:51  pablogsal      set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-03-22 15:41:56  pablogsal      set     messages: + msg389325
2021-03-22 15:41:11  pablogsal      set     stage: patch review; pull_requests: + pull_request23732
2021-03-22 10:58:01  vstinner       set     nosy: - vstinner
2021-03-22 10:28:15  erlendaasland  set     messages: + msg389298
2021-03-22 10:25:46  erlendaasland  create