This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in Python's Developer Guide.

classification
Title: ASAN/UBSAN: heap-buffer-overflow in pegen.c
Type: security
Stage: resolved
Components: Interpreter Core
Versions: Python 3.10, Python 3.9

process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: christian.heimes, lys.nikolaou, miss-islington, pablogsal
Priority: high
Keywords: patch

Created on 2020-06-12 10:42 by christian.heimes, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
PR  Status  Author, Date
PR 20842 closed pablogsal, 2020-06-12 21:18
PR 20875 merged pablogsal, 2020-06-15 00:30
PR 20919 merged miss-islington, 2020-06-16 15:49
PR 20968 closed lys.nikolaou, 2020-06-18 21:19
PR 20970 merged lys.nikolaou, 2020-06-18 21:45
PR 21001 merged miss-islington, 2020-06-20 12:57
Messages (37)
msg371351 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-06-12 10:42
ASAN/UBSAN has detected a heap-buffer-overflow in pegen.c

==1625693==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000026b71 at pc 0x00000073574d bp 0x7fff297284f0 sp 0x7fff297284e0
READ of size 1 at 0x606000026b71 thread T0
    #0 0x73574c in ascii_decode Objects/unicodeobject.c:4941
    #1 0x82bd0f in unicode_decode_utf8 Objects/unicodeobject.c:4999
    #2 0xf35859 in byte_offset_to_character_offset Parser/pegen.c:148
    #3 0xf35859 in _PyPegen_raise_error_known_location Parser/pegen.c:412
    #4 0xf36482 in _PyPegen_raise_error Parser/pegen.c:373
    #5 0xf39e1d in tokenizer_error Parser/pegen.c:321
    #6 0xf39e1d in _PyPegen_fill_token Parser/pegen.c:638
    #7 0xf3ca0f in _PyPegen_expect_token Parser/pegen.c:753
    #8 0xf4cc7a in _tmp_15_rule Parser/parser.c:16184
    #9 0xf3c799 in _PyPegen_lookahead (/home/heimes/dev/python/cpython/python+0xf3c799)
    #10 0xfafb4a in compound_stmt_rule Parser/parser.c:1860
    #11 0xfb7fc2 in statement_rule Parser/parser.c:1224
    #12 0xfb7fc2 in _loop1_11_rule Parser/parser.c:15954
    #13 0xfb7fc2 in statements_rule Parser/parser.c:1183
    #14 0xfbbce7 in file_rule Parser/parser.c:716
    #15 0xfbbce7 in _PyPegen_parse Parser/parser.c:24401
    #16 0xf3f868 in _PyPegen_run_parser Parser/pegen.c:1077
    #17 0xf4044f in _PyPegen_run_parser_from_file_pointer Parser/pegen.c:1137
    #18 0xa27f36 in PyRun_FileExFlags Python/pythonrun.c:1057
    #19 0xa2826a in PyRun_SimpleFileExFlags Python/pythonrun.c:400
    #20 0x479b1b in pymain_run_file Modules/main.c:369
    #21 0x479b1b in pymain_run_python Modules/main.c:553
    #22 0x47bd59 in Py_RunMain Modules/main.c:632
    #23 0x47bd59 in pymain_main Modules/main.c:662
    #24 0x47bd59 in Py_BytesMain Modules/main.c:686
    #25 0x7f59aa5cd041 in __libc_start_main (/lib64/libc.so.6+0x27041)
    #26 0x47643d in _start (/home/heimes/dev/python/cpython/python+0x47643d)

0x606000026b71 is located 0 bytes to the right of 49-byte region [0x606000026b40,0x606000026b71)
allocated by thread T0 here:
    #0 0x7f59ab303667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
    #1 0x749c7d in PyUnicode_New Objects/unicodeobject.c:1437
    #2 0x872f15 in _PyUnicode_Init Objects/unicodeobject.c:15535
    #3 0x9fe0ab in pycore_init_types Python/pylifecycle.c:599
    #4 0x9fe0ab in pycore_interp_init Python/pylifecycle.c:724
    #5 0xa07c69 in pyinit_config Python/pylifecycle.c:765
    #6 0xa07c69 in pyinit_core Python/pylifecycle.c:926
    #7 0xa09b17 in Py_InitializeFromConfig Python/pylifecycle.c:1136
    #8 0x4766c2 in pymain_init Modules/main.c:66
    #9 0x47bd12 in pymain_main Modules/main.c:653
    #10 0x47bd12 in Py_BytesMain Modules/main.c:686
    #11 0x7f59aa5cd041 in __libc_start_main (/lib64/libc.so.6+0x27041)

SUMMARY: AddressSanitizer: heap-buffer-overflow Objects/unicodeobject.c:4941 in ascii_decode
Shadow bytes around the buggy address:
  0x0c0c7fffcd10: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd20: 00 00 00 00 00 00 00 07 fa fa fa fa 00 00 00 00
  0x0c0c7fffcd30: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 05
  0x0c0c7fffcd40: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd50: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
=>0x0c0c7fffcd60: 00 00 00 01 fa fa fa fa 00 00 00 00 00 00[01]fa
  0x0c0c7fffcd70: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd80: 00 00 00 00 00 00 05 fa fa fa fa fa 00 00 00 00
  0x0c0c7fffcd90: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fffcda0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fffcdb0: fd fd fd fd fd fd fd fd fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1625693==ABORTING
msg371358 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 12:34
Lysandros, could you take a look?
msg371359 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 12:38
Christian, what test are you running to get the error?
msg371360 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 12:51
> Lysandros, could you take a look?

Yup, I'm on it.
msg371361 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 12:56
Note that although we could just exit early if the length of the line is smaller than the column offset before calling https://github.com/python/cpython/blob/master/Parser/pegen.c#L148 (I assume that is the problem), it is more important to understand how we are reaching that situation.
msg371362 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 13:04
Agreed. For that I'd probably need the input Christian is getting the error with. I'm currently trying things out, but haven't gotten the error yet.
msg371363 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 13:05
Seems that just this (at top level):

>> yield from

is able to produce a situation where the column offset is bigger than the line length:

Col offset is 11 and line is 10
msg371365 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 13:08
Correcting my previous post (deleted):

You can get an assert failure without ASAN with this patch:

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..65fa44921f 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -145,6 +145,9 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
     if (!str) {
         return 0;
     }
+    assert(col_offset <= PyUnicode_GET_LENGTH(line));
     PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
     if (!text) {
         return 0;
msg371366 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 13:12
I theorize that this happens when the error points to the end of the line. I assume that the difference between col_offset and PyUnicode_GET_LENGTH(line) may still be bigger than 1 if the line contains some Unicode characters, as with:

>> "Ṕýţĥòñ" +

but I think that would be the same situation.
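A minimal standalone sketch of why that gap can be larger than 1 (my own illustration, not code from the report, and assuming, as Lysandros confirms further down, that col_offset is a byte offset into the UTF-8 line): PyUnicode_GET_LENGTH() counts code points, while the offset counts UTF-8 bytes, so the two diverge by several positions on a non-ASCII line. Compile against CPython's headers (e.g. the flags from python3-config).

#include <Python.h>
#include <stdio.h>

int main(void)
{
    Py_Initialize();
    /* The non-ASCII example line from the message above. */
    PyObject *line = PyUnicode_FromString("\"Ṕýţĥòñ\" +");
    Py_ssize_t nbytes = 0;
    (void)PyUnicode_AsUTF8AndSize(line, &nbytes);          /* UTF-8 length: 17 bytes */
    Py_ssize_t ncodepoints = PyUnicode_GET_LENGTH(line);   /* code points: 10 */
    printf("bytes=%zd code points=%zd\n", nbytes, ncodepoints);
    Py_DECREF(line);
    Py_FinalizeEx();
    return 0;
}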
msg371368 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-06-12 13:53
I'm using an ASAN UBSAN build of Python:

$ make clean
$ ./configure --with-address-sanitizer --with-undefined-behavior-sanitizer
$ cat > asan-suppression.txt << EOF
# Python/initconfig.c
leak:_PyWideStringList_Copy
EOF
$ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0"  make
$ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof
msg371369 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:03
I cannot reproduce this on my Ubuntu 20.04:

➜  cpython git:(master) ✗ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test -v test_eof
Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
== CPython 3.10.0a0 (heads/master:e2fb8a2c42, Jun 12 2020, 16:59:27) [GCC 9.3.0]
== Linux-5.4.0-33-generic-x86_64-with-glibc2.31 little-endian
== cwd: /home/lysnikolaou/repos/cpython/build/test_python_71258
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 2.11 Run tests sequentially
0:00:00 load avg: 2.11 [1/1] test_eof
test_EOFC (test.test_eof.EOFTestCase) ... ok
test_EOFS (test.test_eof.EOFTestCase) ... ok
test_eof_with_line_continuation (test.test_eof.EOFTestCase) ... ok
test_line_continuation_EOF (test.test_eof.EOFTestCase)
A continuation at the end of input must be an error; bpo2180. ... ok
test_line_continuation_EOF_from_file_bpo2180 (test.test_eof.EOFTestCase)
Ensure tok_nextc() does not add too many ending newlines. ... Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
ok

----------------------------------------------------------------------

Ran 5 tests in 0.413s

OK

== Tests result: SUCCESS ==

1 test OK.

Total duration: 654 ms
Tests result: SUCCESS
msg371370 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:05
Running Pablo's examples in the REPL does not reproduce the errors either:

➜  cpython git:(master) ✗ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python                    
Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
Python 3.10.0a0 (heads/master:e2fb8a2c42, Jun 12 2020, 16:59:27) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "Ṕýţĥòñ" +
  File "<stdin>", line 1
    "Ṕýţĥòñ" +
              ^
SyntaxError: invalid syntax
>>> yield from
  File "<stdin>", line 1
    yield from
              ^
SyntaxError: invalid syntax
>>> ^D
➜  cpython git:(master) ✗
msg371371 - (view) Author: Christian Heimes (christian.heimes) * (Python committer) Date: 2020-06-12 14:07
Did you clean and compile with asan/ubsan?

Older compilers may not detect the issue. My system has GCC 10:
libubsan-10.1.1-1.fc32.x86_64
gcc-10.1.1-1.fc32.x86_64
libasan-10.1.1-1.fc32.x86_64
msg371372 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 14:12
Lysandros, are you using a debug build? The assert() won't trigger if you haven't run configure with --with-pydebug. I would recommend:

make distclean && ./configure --with-pydebug && make
msg371373 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:15
Let me try that. I'm currently using gcc 9.3.
msg371374 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:17
I now configured Python with ./configure --with-pydebug --with-address-sanitizer --with-undefined-behavior-sanitizer.
msg371375 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:18
> Lysandros, are you using a debug build? The assert() won't trigger if you haven't run configure with --with-pydebug.

BTW I'm not talking about the assert not triggering. I'm only saying that ASAN/UBSAN do not report an error when running `yield from` in the REPL.
msg371376 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 14:32
Oh, I was confused by this:

> Running Pablo's examples in the REPL does not reproduce the errors either:


I thought you meant that you could not reproduce the crash with my patch either. For ASAN you may need the newer gcc and related libraries. I was not able to reproduce this on my work macOS machine, but I will try tonight on my Arch Linux box, which has the same versions as Christian's.
msg371378 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 14:36
Is the only way to get gcc-10.1 to build it from source? Because currently my internet connection is too slow to download all of gcc, and `apt install gcc-10` installs 10.0.1, which I'm getting totally unrelated errors with.
msg371379 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 14:41
> Is the only way to get gcc-10.1 to build it from source? Because currently my internet connection is too slow to download all of gcc, and `apt install gcc-10` installs 10.0.1, which I'm getting totally unrelated errors with.

If your package manager does not have the package and you cannot find a PPA, you would need to build it from source, or maybe use Docker with a distribution that has GCC 10, like Arch.
msg371381 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 15:04
> Note that although we could just exit if the length of the line is smaller than the column offset before calling https://github.com/python/cpython/blob/master/Parser/pegen.c#L148 (I assume that is the problem) is more important to understand how are we reaching that situation.

That's because of https://github.com/python/cpython/blob/e2fb8a2c42ee60c72a40d93da69e9efc4e359023/Parser/pegen.c#L404, which decreases the size of the string to be decoded by 1 if the last character is a newline (or, equivalently, if the error offset points past the end of the line). Thus, PyUnicode_DecodeUTF8 returns an object that is one character shorter than the col_offset and that's how we get to the situation you mentioned.
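To make the off-by-one concrete, here is a small self-contained sketch (my reconstruction, not code from the thread) of the interaction described above: the tokenizer buffer for `yield from` ends in '\n', the newline is trimmed before decoding, and the reported col_offset of 11 then points past the 10-character error_line.

#include <Python.h>
#include <string.h>

int main(void)
{
    Py_Initialize();
    const char *buf = "yield from\n";            /* tokenizer buffer, trailing newline included */
    Py_ssize_t size = (Py_ssize_t)strlen(buf);   /* 11 */
    if (size && buf[size - 1] == '\n') {
        size--;                                   /* 10: the trim done in _PyPegen_raise_error_known_location */
    }
    PyObject *error_line = PyUnicode_DecodeUTF8(buf, size, "replace");
    Py_ssize_t col_offset = 11;                   /* the offset reported for "yield from" */
    printf("col_offset=%zd, error_line length=%zd\n",
           col_offset, PyUnicode_GET_LENGTH(error_line));   /* 11 vs 10 */
    Py_DECREF(error_line);
    Py_FinalizeEx();
    return 0;
}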
msg371382 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 15:15
>  returns an object that is one character shorter than the col_offset and that's how we get to the situation you mentioned.

What happens with:

>>> "Ṕýţĥòñ" +
msg371384 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 15:29
> What happens with:

>>> "Ṕýţĥòñ" +

Exact same thing. The offset is 16 at the start and gets decreased to 15 in the line I linked to in my previous post. And then col_offset gets converted to col_number which is 9, which seems correct.

Although it is correct, it feels correct by accident, since the if-branch on https://github.com/python/cpython/blob/e2fb8a2c42ee60c72a40d93da69e9efc4e359023/Parser/pegen.c#L154 only evaluates to false due to the decrease of size and I don't know if that was intended.
msg371388 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 15:47
Well, I think the solution here is to clip the col_offset to the line length, and maybe add an assert that the difference is not bigger than 1.
msg371389 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 16:01
I'm guessing that some parts of the conversion code were only there to circumvent issues in displaying the error messages that weren't pegen's fault. These were fixed by Guido in GH-20072, so I think we can delete some of them. For example, this patch works just fine for me.

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..2c348178fb 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -150,10 +150,6 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
         return 0;
     }
     Py_ssize_t size = PyUnicode_GET_LENGTH(text);
-    str = PyUnicode_AsUTF8(text);
-    if (str != NULL && (int)strlen(str) == col_offset) {
-        size = strlen(str);
-    }
     Py_DECREF(text);
     return size;
 }
@@ -400,9 +396,6 @@ _PyPegen_raise_error_known_location(Parser *p, PyObject *errtype,

     if (!error_line) {
         Py_ssize_t size = p->tok->inp - p->tok->buf;
-        if (size && p->tok->buf[size-1] == '\n') {
-            size--;
-        }
         error_line = PyUnicode_DecodeUTF8(p->tok->buf, size, "replace");
         if (!error_line) {
             goto error;
msg371391 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 16:14
Can you run the pyflakes test suite before and after the patch to double-check that we don't introduce any regression? If everything looks fine, open a PR :)
msg371393 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 16:28
> Can you run the pyflakes test suite before and after the patch to double-check that we don't introduce any regression? If everything looks fine, open a PR :)

The exact same errors before and after! I'll wait on the PR, though, until someone has checked whether this indeed solves the heap-buffer-overflow. I'm not at all confident that it does, and I can't test it myself.
msg371395 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 16:41
Ok, I was able to reproduce:

❯ gcc --version
gcc (GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make distclean
./configure --with-address-sanitizer --with-undefined-behavior-sanitizer
LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0"  make -j

❯ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof
0:00:00 load avg: 1.82 Run tests sequentially
0:00:00 load avg: 1.82 [1/1] test_eof
test test_eof failed -- Traceback (most recent call last):
  File "/home/pablogsal/github/python/master/Lib/test/test_eof.py", line 54, in test_line_continuation_EOF_from_file_bpo2180
    self.assertIn(b'unexpected EOF while parsing', err)
AssertionError: b'unexpected EOF while parsing' not found in b'Parser/tokenizer.c:978:50: runtime error: pointer index expression with base 0x625000016900 overflowed to 0xbebebebebebee6be\n=====================
============================================\n==27549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000027c51 at pc 0x5612210ca7d4 bp 0x7fffe6e9ff70 sp 0x7fffe6e9ff60\nREAD of size 1 at 0x60600
0027c51 thread T0\n    #0 0x5612210ca7d3 in ascii_decode Objects/unicodeobject.c:4941\n    #1 0x5612211c9f4a in unicode_decode_utf8 Objects/unicodeobject.c:4999\n    #2 0x5612219201bd in byte_offset_to_characte
r_offset Parser/pegen.c:148\n    #3 0x5612219201bd in _PyPegen_raise_error_known_location Parser/pegen.c:412\n    #4 0x561221920e4d in _PyPegen_raise_error Parser/pegen.c:373\n    #5 0x561221924981 in tokenizer
_error Parser/pegen.c:321\n    #6 0x561221924981 in _PyPegen_fill_token Parser/pegen.c:638\n    #7 0x56122192777f in _PyPegen_expect_token Parser/pegen.c:753\n    #8 0x56122193817a in _tmp_15_rule Parser/parser
.c:16184\n    #9 0x5612219274f9 in _PyPegen_lookahead (/home/pablogsal/github/python/master/python+0x1c344f9)\n    #10 0x56122199de2c in compound_stmt_rule Parser/parser.c:1860\n    #11 0x5612219a65c2 in statem
ent_rule Parser/parser.c:1224\n    #12 0x5612219a65c2 in _loop1_11_rule Parser/parser.c:15954\n    #13 0x5612219a65c2 in statements_rule Parser/parser.c:1183\n    #14 0x5612219aa4b7 in file_rule Parser/parser.c
:716\n    #15 0x5612219aa4b7 in _PyPegen_parse Parser/parser.c:24401\n    #16 0x56122192a768 in _PyPegen_run_parser Parser/pegen.c:1077\n    #17 0x56122192b3ef in _PyPegen_run_parser_from_file_pointer Parser/pe
gen.c:1137\n    #18 0x5612213da3c6 in PyRun_FileExFlags Python/pythonrun.c:1057\n    #19 0x5612213da72c in PyRun_SimpleFileExFlags Python/pythonrun.c:400\n    #20 0x561220df0dbb in pymain_run_file Modules/main.
c:369\n    #21 0x561220df0dbb in pymain_run_python Modules/main.c:553\n    #22 0x561220df3154 in Py_RunMain Modules/main.c:632\n    #23 0x561220df3154 in pymain_main Modules/main.c:662\n    #24 0x561220df3154 i
n Py_BytesMain Modules/main.c:686\n    #25 0x7f981bf9a001 in __libc_start_main (/usr/lib/libc.so.6+0x27001)\n    #26 0x561220ded48d in _start (/home/pablogsal/github/python/master/python+0x10fa48d)\n\n0x6060000
27c51 is located 0 bytes to the right of 49-byte region [0x606000027c20,0x606000027c51)\nallocated by thread T0 here:\n    #0 0x7f981ccce459 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_mal
loc_linux.cpp:145\n    #1 0x5612210dfa1d in PyUnicode_New Objects/unicodeobject.c:1437\n    #2 0x56122121324b in _PyUnicode_Init Objects/unicodeobject.c:15535\n    #3 0x5612213ae5c3 in pycore_init_types Python/
pylifecycle.c:599\n    #4 0x5612213ae5c3 in pycore_interp_init Python/pylifecycle.c:724\n    #5 0x5612213b8b4b in pyinit_config Python/pylifecycle.c:765\n    #6 0x5612213b8b4b in pyinit_core Python/pylifecycle.
c:926\n    #7 0x5612213bab6c in Py_InitializeFromConfig Python/pylifecycle.c:1136\n    #8 0x561220ded752 in pymain_init Modules/main.c:66\n    #9 0x561220df310a in pymain_main Modules/main.c:653\n    #10 0x5612
20df310a in Py_BytesMain Modules/main.c:686\n    #11 0x7f981bf9a001 in __libc_start_main (/usr/lib/libc.so.6+0x27001)\n\nSUMMARY: AddressSanitizer: heap-buffer-overflow Objects/unicodeobject.c:4941 in ascii_dec
ode\nShadow bytes around the buggy address:\n  0x0c0c7fffcf30: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00\n  0x0c0c7fffcf40: 00 00 00 07 fa fa fa fa 00 00 00 00 00 00 00 00\n  0x0c0c7fffcf50: fa fa fa fa 0
0 00 00 00 00 00 00 05 fa fa fa fa\n  0x0c0c7fffcf60: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00\n  0x0c0c7fffcf70: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 01\n=>0x0c0c7fffcf80: fa fa fa fa 00 00 00 0
0 00 00[01]fa fa fa fa fa\n  0x0c0c7fffcf90: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00\n  0x0c0c7fffcfa0: 00 00 05 fa fa fa fa fa 00 00 00 00 00 00 00 fa\n  0x0c0c7fffcfb0: fa fa fa fa 00 00 00 00 00 00 0
0 00 fa fa fa fa\n  0x0c0c7fffcfc0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd\n  0x0c0c7fffcfd0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 fa\nShadow byte legend (one shadow byte represents 8 applicati
on bytes):\n  Addressable:           00\n  Partially addressable: 01 02 03 04 05 06 07 \n  Heap left redzone:       fa\n  Freed heap region:       fd\n  Stack left redzone:      f1\n  Stack mid redzone:       f
2\n  Stack right redzone:     f3\n  Stack after return:      f5\n  Stack use after scope:   f8\n  Global redzone:          f9\n  Global init order:       f6\n  Poisoned by user:        f7\n  Container overflow:
      fc\n  Array cookie:            ac\n  Intra object redzone:    bb\n  ASan internal:           fe\n  Left alloca redzone:     ca\n  Right alloca redzone:    cb\n  Shadow gap:              cc\n==27549==ABORT
ING\n'

test_eof failed

== Tests result: FAILURE ==

1 test failed:
    test_eof

Total duration: 359 ms
Tests result: FAILURE


----------

With this patch

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..a9f24ca5fa 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -145,15 +145,15 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
     if (!str) {
         return 0;
     }
+    Py_ssize_t linesize = PyUnicode_GET_LENGTH(line);
+    if (col_offset > linesize) {
+        col_offset = (int)linesize;
+    }
     PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
     if (!text) {
         return 0;
     }
     Py_ssize_t size = PyUnicode_GET_LENGTH(text);
-    str = PyUnicode_AsUTF8(text);
-    if (str != NULL && (int)strlen(str) == col_offset) {
-        size = strlen(str);
-    }
     Py_DECREF(text);
     return size;
 }
@@ -400,9 +400,6 @@ _PyPegen_raise_error_known_location(Parser *p, PyObject *errtype,

     if (!error_line) {
         Py_ssize_t size = p->tok->inp - p->tok->buf;
-        if (size && p->tok->buf[size-1] == '\n') {
-            size--;
-        }
         error_line = PyUnicode_DecodeUTF8(p->tok->buf, size, "replace");
         if (!error_line) {
             goto error;


❯ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof
0:00:00 load avg: 1.39 Run tests sequentially
0:00:00 load avg: 1.39 [1/1] test_eof

== Tests result: SUCCESS ==

1 test OK.

Total duration: 500 ms
Tests result: SUCCESS
msg371399 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-12 17:28
> +    Py_ssize_t linesize = PyUnicode_GET_LENGTH(line);

This line is wrong though, since PyUnicode_GET_LENGTH returns the length in code points and PyUnicode_DecodeUTF8 expects the number of bytes. For non-ascii input this would push the caret further to the left. For example:

$ ./python.exe
Python 3.10.0a0 (heads/master-dirty:e2fb8a2c42, Jun 12 2020, 20:22:52)
[Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "Ṕýţĥòñ" +
  File "<stdin>", line 1
    "Ṕýţĥòñ" +
         ^
SyntaxError: invalid syntax
msg371405 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-12 18:50
> This line is wrong though, since PyUnicode_GET_LENGTH returns the length in code points

Whoops! Thanks for pointing that out. In any case, the patch still shows that the ASAN error goes away by limiting the access to the string, so I suppose we could turn it into a proper patch.
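A hedged sketch of what clipping by byte length (rather than code-point length) could look like; the shape follows byte_offset_to_character_offset() from the diffs above, but this is an illustration of the idea, not necessarily the exact change that landed in GH-20875.

#include <Python.h>
#include <string.h>

/* Sketch only: clamp the byte-based col_offset against the UTF-8 byte length
 * of the line (strlen of the encoded form), not PyUnicode_GET_LENGTH(),
 * which counts code points. */
static Py_ssize_t
byte_offset_to_character_offset_sketch(PyObject *line, int col_offset)
{
    const char *str = PyUnicode_AsUTF8(line);
    if (!str) {
        return 0;
    }
    Py_ssize_t len = (Py_ssize_t)strlen(str);   /* byte length, not code points */
    if (col_offset > len) {
        col_offset = (int)len;                  /* clip instead of reading past the buffer */
    }
    PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
    if (!text) {
        return 0;
    }
    Py_ssize_t size = PyUnicode_GET_LENGTH(text);  /* character offset for the caret */
    Py_DECREF(text);
    return size;
}

int main(void)
{
    Py_Initialize();
    PyObject *line = PyUnicode_FromString("\"Ṕýţĥòñ\" +");
    /* col_offset 16 is the byte position of the '+', as noted above. */
    printf("%zd\n", byte_offset_to_character_offset_sketch(line, 16));   /* prints 9 */
    Py_DECREF(line);
    Py_FinalizeEx();
    return 0;
}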
msg371675 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-16 15:49
New changeset 51c5896b6205911d29ac07f167ec7f3cf1cb600d by Pablo Galindo in branch 'master':
bpo-40958: Avoid buffer overflow in the parser when indexing the current line (GH-20875)
https://github.com/python/cpython/commit/51c5896b6205911d29ac07f167ec7f3cf1cb600d
msg371686 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-16 17:37
New changeset 7795ae8f05a5b1134a576a372f64699176cac5fb by Miss Islington (bot) in branch '3.9':
bpo-40958: Avoid buffer overflow in the parser when indexing the current line (GH-20875) (GH-20919)
https://github.com/python/cpython/commit/7795ae8f05a5b1134a576a372f64699176cac5fb
msg371831 - (view) Author: neonene (neonene) * Date: 2020-06-18 18:32
FYI, since PR 20875/20919, MSVC (x64) has been emitting warning C4244 (conversion from 'Py_ssize_t' to 'int', possible loss of data).
parser.c alone gets more than 700 of these warnings.
msg371837 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2020-06-18 21:13
Hummmm, that is because lineno, col_offset, end_lineno and end_col_offset are now Py_ssize_t, so we probably need to adapt the other assignments. Will make a PR soon.
msg371838 - (view) Author: Lysandros Nikolaou (lys.nikolaou) * (Python committer) Date: 2020-06-18 21:16
> Will make a PR soon

No need. Already got something ready.
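A minimal sketch (hypothetical names, not the actual GH-20970 diff) of how C4244 arises once those fields are Py_ssize_t, and the two usual remedies: widen the receiving variable, or cast explicitly where an int really is required.

#include <Python.h>
#include <assert.h>
#include <limits.h>

static int lossy(Py_ssize_t lineno) {
    return lineno;          /* MSVC x64: C4244, 64-bit Py_ssize_t narrowed to 32-bit int */
}

static Py_ssize_t widened(Py_ssize_t lineno) {
    return lineno;          /* preferred fix: keep the receiving type Py_ssize_t */
}

static int narrowed_explicitly(Py_ssize_t lineno) {
    assert(lineno <= INT_MAX);
    return (int)lineno;     /* explicit cast where an int genuinely is required */
}

int main(void) {
    printf("%d %zd %d\n", lossy(3), widened(3), narrowed_explicitly(3));
    return 0;
}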
msg371936 - (view) Author: miss-islington (miss-islington) Date: 2020-06-20 12:57
New changeset 861efc6e8fe7f030b1e193989b13287b31385939 by Lysandros Nikolaou in branch 'master':
bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970)
https://github.com/python/cpython/commit/861efc6e8fe7f030b1e193989b13287b31385939
msg371945 - (view) Author: miss-islington (miss-islington) Date: 2020-06-20 17:35
New changeset c9f83c173b0cc62d6fcdc363e9ab05f6664ff8f3 by Miss Islington (bot) in branch '3.9':
bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970)
https://github.com/python/cpython/commit/c9f83c173b0cc62d6fcdc363e9ab05f6664ff8f3
History
Date User Action Args
2022-04-11 14:59:32  admin  set  github: 85130
2020-06-20 17:35:07  miss-islington  set  messages: + msg371945
2020-06-20 12:57:39  miss-islington  set  pull_requests: + pull_request20176
2020-06-20 12:57:30  miss-islington  set  nosy: + miss-islington; messages: + msg371936
2020-06-19 15:11:05  neonene  set  nosy: + christian.heimes, - neonene
2020-06-18 21:45:53  lys.nikolaou  set  pull_requests: + pull_request20148
2020-06-18 21:19:09  lys.nikolaou  set  pull_requests: + pull_request20146
2020-06-18 21:16:33  lys.nikolaou  set  messages: + msg371838
2020-06-18 21:13:56  pablogsal  set  messages: + msg371837
2020-06-18 18:32:08  neonene  set  nosy: + neonene, - christian.heimes, miss-islington; messages: + msg371831
2020-06-16 17:37:25  pablogsal  set  messages: + msg371686
2020-06-16 15:51:14  pablogsal  set  status: open -> closed; resolution: fixed; stage: patch review -> resolved
2020-06-16 15:49:57  miss-islington  set  nosy: + miss-islington; pull_requests: + pull_request20098
2020-06-16 15:49:51  pablogsal  set  messages: + msg371675
2020-06-15 00:30:46  pablogsal  set  pull_requests: + pull_request20063
2020-06-12 21:18:12  pablogsal  set  keywords: + patch; stage: patch review; pull_requests: + pull_request20035
2020-06-12 18:50:14  pablogsal  set  messages: + msg371405
2020-06-12 17:28:31  lys.nikolaou  set  messages: + msg371399
2020-06-12 16:41:10  pablogsal  set  messages: + msg371395
2020-06-12 16:28:22  lys.nikolaou  set  messages: + msg371393
2020-06-12 16:14:55  pablogsal  set  messages: + msg371391
2020-06-12 16:01:55  lys.nikolaou  set  messages: + msg371389
2020-06-12 15:47:44  pablogsal  set  messages: + msg371388
2020-06-12 15:29:33  lys.nikolaou  set  messages: + msg371384
2020-06-12 15:15:38  pablogsal  set  messages: + msg371382
2020-06-12 15:04:54  lys.nikolaou  set  messages: + msg371381
2020-06-12 14:41:32  pablogsal  set  messages: + msg371379
2020-06-12 14:36:48  lys.nikolaou  set  messages: + msg371378
2020-06-12 14:32:04  pablogsal  set  messages: + msg371376
2020-06-12 14:18:14  lys.nikolaou  set  messages: + msg371375
2020-06-12 14:17:10  lys.nikolaou  set  messages: + msg371374
2020-06-12 14:15:51  lys.nikolaou  set  messages: + msg371373
2020-06-12 14:12:32  pablogsal  set  messages: + msg371372
2020-06-12 14:07:04  christian.heimes  set  messages: + msg371371
2020-06-12 14:05:52  lys.nikolaou  set  messages: + msg371370
2020-06-12 14:03:39  lys.nikolaou  set  messages: + msg371369
2020-06-12 13:53:26  christian.heimes  set  messages: + msg371368
2020-06-12 13:12:57  pablogsal  set  messages: + msg371366
2020-06-12 13:08:51  pablogsal  set  messages: + msg371365
2020-06-12 13:08:36  pablogsal  set  messages: - msg371364
2020-06-12 13:08:21  pablogsal  set  messages: + msg371364
2020-06-12 13:05:00  pablogsal  set  messages: + msg371363
2020-06-12 13:04:14  lys.nikolaou  set  messages: + msg371362
2020-06-12 12:56:09  pablogsal  set  messages: + msg371361
2020-06-12 12:51:59  lys.nikolaou  set  messages: + msg371360
2020-06-12 12:38:15  pablogsal  set  messages: + msg371359
2020-06-12 12:34:49  pablogsal  set  messages: + msg371358
2020-06-12 12:31:01  pablogsal  set  nosy: + lys.nikolaou
2020-06-12 10:42:31  christian.heimes  create