This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

classification
Title: repl segfaults on non utf-8 input
Type: crash Stage: resolved
Components: Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: jooon, miss-islington, pablogsal, xtreak
Priority: normal Keywords: patch

Created on 2022-03-25 10:06 by jooon, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 32129 merged pablogsal, 2022-03-26 15:46
PR 32130 merged pablogsal, 2022-03-26 17:26
Messages (8)
msg415992 - Author: Jon Åslund (jooon) Date: 2022-03-25 10:06
Some non-UTF-8 bytes make the Python REPL segfault in 3.10 and later on Linux. Example:

$ python3.10
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
Segmentation fault (core dumped)

It is handled correctly in Python 3.9 and earlier:

$ python3.9
Python 3.9.12 (main, Mar 24 2022, 14:21:53) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> �
  File "<stdin>", line 0
    
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

How to reproduce:

In GNOME on Ubuntu 20.04 with the Swedish keyboard layout, holding Left Alt and pressing the ö key enters the byte 0xb6 into the terminal.

I have only been able to crash the REPL this way. I can't make the parser itself crash, for instance by trying to eval the byte.
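For reference, the SyntaxError that 3.9 prints comes from UTF-8 decoding: 0xb6 is 10110110 in binary, a continuation-byte pattern (10xxxxxx), so it can never start a UTF-8 sequence. The minimal sketch below reproduces the same failure with the built-in `bytes.decode` and `compile`. The helper names `decode_reason` and `compile_error` are hypothetical, and the exact SyntaxError text varies between Python versions.

```python
def decode_reason(data: bytes) -> str:
    """Return the UTF-8 decoder's reason for rejecting data, or '' if it decodes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason
    return ""


def compile_error(data: bytes) -> str:
    """Compile data as one interactive ('single') statement.

    Returns the SyntaxError message, or '' if the source compiles.
    """
    try:
        compile(data, "<stdin>", "single")
    except SyntaxError as exc:
        return exc.msg
    return ""


if __name__ == "__main__":
    lone_byte = b"\xb6"  # the byte produced by Left Alt + ö in the report
    print(decode_reason(lone_byte))  # invalid start byte
    print(compile_error(lone_byte + b"\n"))
```

Going through `compile()` does not touch the interactive tokenizer path, which is consistent with the report that eval-ing the byte does not crash.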
msg416004 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2022-03-25 14:45
This looks similar to https://bugs.python.org/issue46206
msg416005 - Author: Jon Åslund (jooon) Date: 2022-03-25 14:59
Yes, I think they are the same. I can reproduce the emoji crash, and it is much easier to reproduce because there is no need for a Swedish keyboard layout.

1. Copy _😀
2. Start Python with a non-Unicode locale: LC_ALL=C python3.10
3. Paste in _😀
4. Press Backspace once. It will look like the 2-character-wide emoji is replaced by a 1-character-wide space.
5. Press Return.
6. Crash.
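The "half an emoji" input can be modelled the same way. 😀 (U+1F600) is four bytes in UTF-8 (F0 9F 98 80). Assuming the line editor in the C locale treats each byte as one character, a single Backspace removes only the last byte, which leaves an incomplete sequence that a strict UTF-8 decoder rejects. `backspace_one_byte` is a hypothetical stand-in for the terminal's line editing, not a real API.

```python
from typing import Optional, Tuple


def backspace_one_byte(line: bytes) -> bytes:
    """Hypothetical model of a byte-oriented line editor: Backspace drops one byte."""
    return line[:-1]


def first_decode_error(data: bytes) -> Optional[Tuple[str, int, int]]:
    """Return (reason, start, end) of the first UTF-8 decoding error, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start, exc.end
    return None


if __name__ == "__main__":
    pasted = "_😀".encode("utf-8")       # b'_\xf0\x9f\x98\x80': '_' plus a 4-byte emoji
    edited = backspace_one_byte(pasted)  # b'_\xf0\x9f\x98': the emoji's last byte is gone
    print(first_decode_error(pasted))    # None
    print(first_decode_error(edited))    # ('unexpected end of data', 1, 4)
```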
msg416006 - Author: Jon Åslund (jooon) Date: 2022-03-25 15:07
Very similar backtrace, too:

(gdb) run
Starting program: /home/jon/.pyenv/versions/3.10.4/bin/python3.10 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Python 3.10.4 (main, Mar 24 2022, 14:20:44) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> _ 

Program received signal SIGSEGV, Segmentation fault.
__strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
57	../sysdeps/x86_64/multiarch/strchr-avx2.S: No such file or directory.
(gdb) bt
#0  __strchr_avx2 () at ../sysdeps/x86_64/multiarch/strchr-avx2.S:57
#1  0x00005555557d4a7a in get_error_line (lineno=lineno@entry=0, p=<optimized out>, p=<optimized out>) at Parser/pegen.c:443
#2  0x00005555557d541b in _PyPegen_raise_error_known_location (p=0x7ffff7885ed0, 
    errtype=0x5555558fe420 <_PyExc_SyntaxError>, lineno=0, col_offset=0, end_lineno=0, end_col_offset=-1, 
    errmsg=0x5555558a2dd3 "(%s) %U", va=0x7fffffffd410) at Parser/pegen.c:499
#3  0x00005555557d5646 in _PyPegen_raise_error (p=p@entry=0x7ffff7885ed0, errtype=<optimized out>, 
    errmsg=errmsg@entry=0x5555558a2dd3 "(%s) %U") at Parser/pegen.c:422
#4  0x00005555557d5839 in raise_decode_error (p=p@entry=0x7ffff7885ed0) at Parser/pegen.c:271
#5  0x00005555557d6193 in initialize_token (token_type=60, end=0x0, start=<optimized out>, token=0x7ffff7a55d10, 
    p=0x7ffff7885ed0) at Parser/pegen.c:720
#6  _PyPegen_fill_token (p=p@entry=0x7ffff7885ed0) at Parser/pegen.c:793
#7  0x00005555557fec00 in statement_newline_rule (p=0x7ffff7885ed0) at Parser/parser.c:1080
#8  interactive_rule (p=0x7ffff7885ed0) at Parser/parser.c:1002
#9  _PyPegen_parse (p=p@entry=0x7ffff7885ed0) at Parser/parser.c:34508
#10 0x00005555557d6c60 in _PyPegen_run_parser (p=0x7ffff7885ed0) at Parser/pegen.c:1342
#11 0x00005555557d718f in _PyPegen_run_parser_from_file_pointer (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, 
    start_rule=start_rule@entry=256, filename_ob=filename_ob@entry=0x7ffff7a85670, enc=enc@entry=0x7ffff7a7c1a0 "utf-8", 
    ps1=<optimized out>, ps1@entry=0x1e000000160 <error: Cannot access memory at address 0x1e000000160>, 
    ps2=ps2@entry=0xe0000001a0 <error: Cannot access memory at address 0xe0000001a0>, flags=0x7fffffffd7f8, 
    errcode=0x7fffffffd724, arena=0x7ffff792cc70) at Parser/pegen.c:1448
#12 0x000055555575661c in _PyParser_ASTFromFile (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, 
    filename_ob=filename_ob@entry=0x7ffff7a85670, enc=enc@entry=0x7ffff7a7c1a0 "utf-8", mode=mode@entry=256, 
    ps1=0x1e000000160 <error: Cannot access memory at address 0x1e000000160>, ps1@entry=0x7ffff7acf960 ">>> ", 
    ps2=0xe0000001a0 <error: Cannot access memory at address 0xe0000001a0>, ps2@entry=0x7ffff7af02e0 "... ", 
    flags=<optimized out>, errcode=<optimized out>, arena=<optimized out>) at Parser/peg_api.c:26
#13 0x00005555556cad97 in PyRun_InteractiveOneObjectEx (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7a85670, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:257
#14 0x00005555556cba26 in _PyRun_InteractiveLoopObject (fp=fp@entry=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x7ffff7a85670, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:148
#15 0x00005555556cc5ce in _PyRun_AnyFileObject (flags=<optimized out>, closeit=<optimized out>, filename=0x7ffff7a85670, fp=<optimized out>) at Python/pythonrun.c:84
#16 PyRun_AnyFileExFlags (fp=0x7ffff7e29980 <_IO_2_1_stdin_>, filename=filename@entry=0x555555802103 "<stdin>", closeit=closeit@entry=0, flags=flags@entry=0x7fffffffd7f8) at Python/pythonrun.c:116
#17 0x00005555555bb5c7 in pymain_run_stdin (config=0x555555932ce0) at Modules/main.c:502
#18 pymain_run_python (exitcode=exitcode@entry=0x7fffffffd930) at Modules/main.c:590
#19 0x00005555555bba1f in Py_RunMain () at Modules/main.c:666
#20 pymain_main (args=0x7fffffffd8f0) at Modules/main.c:696
#21 Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:720
#22 0x00007ffff7c610b3 in __libc_start_main (main=0x5555555aedb0 <main>, argc=1, argv=0x7fffffffda58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffda48)
    at ../csu/libc-start.c:308
#23 0x00005555555ba57e in _start () at ./Include/internal/pycore_pyerrors.h:14
msg416070 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-03-26 15:54
Ah yes, we have been defeated by half an emoji :)
msg416072 - Author: miss-islington (miss-islington) Date: 2022-03-26 16:29
New changeset 26cca8067bf5306e372c0e90036d832c5021fd90 by Pablo Galindo Salgado in branch 'main':
bpo-47117: Don't crash if we fail to decode characters when the tokenizer buffers are uninitialized (GH-32129)
https://github.com/python/cpython/commit/26cca8067bf5306e372c0e90036d832c5021fd90
msg416079 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-03-26 18:26
New changeset 27ee43183437c473725eba00def0ea7647688926 by Pablo Galindo Salgado in branch '3.10':
[3.10] bpo-47117: Don't crash if we fail to decode characters when the tokenizer buffers are uninitialized (GH-32129) (GH-32130)
https://github.com/python/cpython/commit/27ee43183437c473725eba00def0ea7647688926
msg416080 - Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2022-03-26 18:26
Thanks for the report, Jon!
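A rough way to check whether a given interpreter includes the fix is to pipe both inputs into the basic interactive loop. With `-i`, Python runs interactively even when stdin is not a terminal, which should go through the same interactive-loop path seen in the backtrace above. This is a sketch, not the regression test added in GH-32129: `feed_repl` is a hypothetical helper, and piping input may not reproduce every terminal-specific behaviour.

```python
import subprocess
import sys


def feed_repl(data: bytes, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Pipe data into a fresh interpreter's basic interactive loop.

    -i forces interactive mode even though stdin is a pipe; -I (isolated mode)
    ignores PYTHON* environment variables such as PYTHONSTARTUP.
    """
    return subprocess.run(
        [sys.executable, "-I", "-i"],
        input=data,
        capture_output=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    payloads = {
        "lone 0xb6 byte": b"\xb6\n",
        "half an emoji": "_😀".encode("utf-8")[:-1] + b"\n",
    }
    for name, payload in payloads.items():
        result = feed_repl(payload)
        # On POSIX, a negative return code means the child died from a signal
        # (-11 is SIGSEGV); a fixed interpreter reports a SyntaxError instead.
        crashed = result.returncode < 0
        reported = b"SyntaxError" in result.stderr
        print(f"{name}: crashed={crashed} syntax_error_reported={reported}")
```

Interpreters with the fix should print `crashed=False` for both payloads, while an affected 3.10 build would be expected to die with SIGSEGV.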
History
Date                 User            Action  Args
2022-04-11 14:59:57  admin           set     github: 91273
2022-03-26 18:26:31  pablogsal       set     status: open -> closed
                                             resolution: fixed
                                             messages: + msg416080
                                             stage: patch review -> resolved
2022-03-26 18:26:12  pablogsal       set     messages: + msg416079
2022-03-26 17:26:44  pablogsal       set     pull_requests: + pull_request30210
2022-03-26 16:29:16  miss-islington  set     nosy: + miss-islington
                                             messages: + msg416072
2022-03-26 15:54:44  pablogsal       set     messages: + msg416070
2022-03-26 15:46:31  pablogsal       set     keywords: + patch
                                             stage: patch review
                                             pull_requests: + pull_request30209
2022-03-25 15:07:15  jooon           set     messages: + msg416006
2022-03-25 14:59:12  jooon           set     messages: + msg416005
2022-03-25 14:45:24  xtreak          set     nosy: + pablogsal, xtreak
                                             messages: + msg416004
2022-03-25 10:06:05  jooon           create