Issue40958
This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.
Created on 2020-06-12 10:42 by christian.heimes, last changed 2022-04-11 14:59 by admin. This issue is now closed.
Pull Requests
URL | Status | Linked | Edit
---|---|---|---
PR 20842 | closed | pablogsal, 2020-06-12 21:18 | |
PR 20875 | merged | pablogsal, 2020-06-15 00:30 | |
PR 20919 | merged | miss-islington, 2020-06-16 15:49 | |
PR 20968 | closed | lys.nikolaou, 2020-06-18 21:19 | |
PR 20970 | merged | lys.nikolaou, 2020-06-18 21:45 | |
PR 21001 | merged | miss-islington, 2020-06-20 12:57 | |
Messages (37)
msg371351 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2020-06-12 10:42 | |
ASAN/UBSAN has detected a heap-buffer-overflow in pegen.c:

==1625693==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000026b71 at pc 0x00000073574d bp 0x7fff297284f0 sp 0x7fff297284e0
READ of size 1 at 0x606000026b71 thread T0
    #0 0x73574c in ascii_decode Objects/unicodeobject.c:4941
    #1 0x82bd0f in unicode_decode_utf8 Objects/unicodeobject.c:4999
    #2 0xf35859 in byte_offset_to_character_offset Parser/pegen.c:148
    #3 0xf35859 in _PyPegen_raise_error_known_location Parser/pegen.c:412
    #4 0xf36482 in _PyPegen_raise_error Parser/pegen.c:373
    #5 0xf39e1d in tokenizer_error Parser/pegen.c:321
    #6 0xf39e1d in _PyPegen_fill_token Parser/pegen.c:638
    #7 0xf3ca0f in _PyPegen_expect_token Parser/pegen.c:753
    #8 0xf4cc7a in _tmp_15_rule Parser/parser.c:16184
    #9 0xf3c799 in _PyPegen_lookahead (/home/heimes/dev/python/cpython/python+0xf3c799)
    #10 0xfafb4a in compound_stmt_rule Parser/parser.c:1860
    #11 0xfb7fc2 in statement_rule Parser/parser.c:1224
    #12 0xfb7fc2 in _loop1_11_rule Parser/parser.c:15954
    #13 0xfb7fc2 in statements_rule Parser/parser.c:1183
    #14 0xfbbce7 in file_rule Parser/parser.c:716
    #15 0xfbbce7 in _PyPegen_parse Parser/parser.c:24401
    #16 0xf3f868 in _PyPegen_run_parser Parser/pegen.c:1077
    #17 0xf4044f in _PyPegen_run_parser_from_file_pointer Parser/pegen.c:1137
    #18 0xa27f36 in PyRun_FileExFlags Python/pythonrun.c:1057
    #19 0xa2826a in PyRun_SimpleFileExFlags Python/pythonrun.c:400
    #20 0x479b1b in pymain_run_file Modules/main.c:369
    #21 0x479b1b in pymain_run_python Modules/main.c:553
    #22 0x47bd59 in Py_RunMain Modules/main.c:632
    #23 0x47bd59 in pymain_main Modules/main.c:662
    #24 0x47bd59 in Py_BytesMain Modules/main.c:686
    #25 0x7f59aa5cd041 in __libc_start_main (/lib64/libc.so.6+0x27041)
    #26 0x47643d in _start (/home/heimes/dev/python/cpython/python+0x47643d)

0x606000026b71 is located 0 bytes to the right of 49-byte region [0x606000026b40,0x606000026b71)
allocated by thread T0 here:
    #0 0x7f59ab303667 in __interceptor_malloc (/lib64/libasan.so.6+0xb0667)
    #1 0x749c7d in PyUnicode_New Objects/unicodeobject.c:1437
    #2 0x872f15 in _PyUnicode_Init Objects/unicodeobject.c:15535
    #3 0x9fe0ab in pycore_init_types Python/pylifecycle.c:599
    #4 0x9fe0ab in pycore_interp_init Python/pylifecycle.c:724
    #5 0xa07c69 in pyinit_config Python/pylifecycle.c:765
    #6 0xa07c69 in pyinit_core Python/pylifecycle.c:926
    #7 0xa09b17 in Py_InitializeFromConfig Python/pylifecycle.c:1136
    #8 0x4766c2 in pymain_init Modules/main.c:66
    #9 0x47bd12 in pymain_main Modules/main.c:653
    #10 0x47bd12 in Py_BytesMain Modules/main.c:686
    #11 0x7f59aa5cd041 in __libc_start_main (/lib64/libc.so.6+0x27041)

SUMMARY: AddressSanitizer: heap-buffer-overflow Objects/unicodeobject.c:4941 in ascii_decode
Shadow bytes around the buggy address:
  0x0c0c7fffcd10: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd20: 00 00 00 00 00 00 00 07 fa fa fa fa 00 00 00 00
  0x0c0c7fffcd30: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 05
  0x0c0c7fffcd40: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd50: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
=>0x0c0c7fffcd60: 00 00 00 01 fa fa fa fa 00 00 00 00 00 00[01]fa
  0x0c0c7fffcd70: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcd80: 00 00 00 00 00 00 05 fa fa fa fa fa 00 00 00 00
  0x0c0c7fffcd90: 00 00 00 fa fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fffcda0: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fffcdb0: fd fd fd fd fd fd fd fd fa fa fa fa 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1625693==ABORTING |
|||
msg371358 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 12:34 | |
Lysandros, could you take a look? |
|||
msg371359 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 12:38 | |
Christian, what test are you running to get the error? |
|||
msg371360 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 12:51 | |
> Lysandros, could you take a look?

Yup, I'm on it. |
|||
msg371361 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 12:56 | |
Note that although we could just exit if the length of the line is smaller than the column offset before calling https://github.com/python/cpython/blob/master/Parser/pegen.c#L148 (I assume that is the problem), it is more important to understand how we are reaching that situation. |
|||
msg371362 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 13:04 | |
Agreed. For that I'd probably need the input Christian is getting the error with. I'm currently trying things out, but haven't gotten the error yet. |
|||
msg371363 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 13:05 | |
Seems that just this (at top level):

>> yield from

is able to produce a situation where the column offset is bigger than the line length: col offset is 11 and the line is 10. |
|||
msg371365 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 13:08 | |
Correcting my previous post (deleted): You can get an assert failure without ASAN with this patch:

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..65fa44921f 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -145,6 +145,9 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
     if (!str) {
         return 0;
     }
+    assert(col_offset <= PyUnicode_GET_LENGTH(line));
     PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
     if (!text) {
         return 0; |
|||
msg371366 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 13:12 | |
I theorize that this happens when the error points to the end of the line. I assume that the difference between col_offset and PyUnicode_GET_LENGTH(line) may still be bigger than 1 if the line contains some non-ASCII characters, as with:

>> "Ṕýţĥòñ" +

but I think that would be the same situation. |
|||
msg371368 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2020-06-12 13:53 | |
I'm using an ASAN/UBSAN build of Python:

$ make clean
$ ./configure --with-address-sanitizer --with-undefined-behavior-sanitizer
$ cat > asan-suppression.txt << EOF
# Python/initconfig.c
leak:_PyWideStringList_Copy
EOF
$ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" make
$ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof |
|||
msg371369 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:03 | |
I cannot reproduce this on my Ubuntu 20.04:

➜ cpython git:(master) ✗ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test -v test_eof
Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
== CPython 3.10.0a0 (heads/master:e2fb8a2c42, Jun 12 2020, 16:59:27) [GCC 9.3.0]
== Linux-5.4.0-33-generic-x86_64-with-glibc2.31 little-endian
== cwd: /home/lysnikolaou/repos/cpython/build/test_python_71258
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 2.11 Run tests sequentially
0:00:00 load avg: 2.11 [1/1] test_eof
test_EOFC (test.test_eof.EOFTestCase) ... ok
test_EOFS (test.test_eof.EOFTestCase) ... ok
test_eof_with_line_continuation (test.test_eof.EOFTestCase) ... ok
test_line_continuation_EOF (test.test_eof.EOFTestCase)
A continuation at the end of input must be an error; bpo2180. ... ok
test_line_continuation_EOF_from_file_bpo2180 (test.test_eof.EOFTestCase)
Ensure tok_nextc() does not add too many ending newlines. ...
Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
ok

----------------------------------------------------------------------
Ran 5 tests in 0.413s

OK

== Tests result: SUCCESS ==
1 test OK.
Total duration: 654 ms
Tests result: SUCCESS |
|||
msg371370 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:05 | |
Running Pablo's examples in the REPL does not reproduce the errors either:

➜ cpython git:(master) ✗ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python
Modules/posixmodule.c:14682:9: runtime error: left shift of 34 by 26 places cannot be represented in type 'int'
Python 3.10.0a0 (heads/master:e2fb8a2c42, Jun 12 2020, 16:59:27) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> "Ṕýţĥòñ" +
  File "<stdin>", line 1
    "Ṕýţĥòñ" +
             ^
SyntaxError: invalid syntax
>>> yield from
  File "<stdin>", line 1
    yield from
             ^
SyntaxError: invalid syntax
>>> ^D
➜ cpython git:(master) ✗ |
|||
msg371371 - (view) | Author: Christian Heimes (christian.heimes) * | Date: 2020-06-12 14:07 | |
Did you clean and compile with asan/ubsan? Older compilers may not detect the issue. My system has GCC 10:

libubsan-10.1.1-1.fc32.x86_64
gcc-10.1.1-1.fc32.x86_64
libasan-10.1.1-1.fc32.x86_64 |
|||
msg371372 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 14:12 | |
Lysandros, are you using a debug build? The assert() won't trigger if you haven't run configure with --with-pydebug. I would recommend:

make distclean && ./configure --with-pydebug && make |
|||
msg371373 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:15 | |
Let me try that. I'm currently using gcc 9.3. |
|||
msg371374 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:17 | |
I now configured Python with ./configure --with-pydebug --with-address-sanitizer --with-undefined-behavior-sanitizer. |
|||
msg371375 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:18 | |
> Lysandros, are you using a debug build? The assert() won't trigger if you haven't run configure with --with-pydebug.

BTW I'm not talking about the assert not triggering. I'm only saying that ASAN/UBSAN do not report an error when running `yield from` in the REPL. |
|||
msg371376 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 14:32 | |
Oh, I was confused by this:

> Running Pablo's examples in the REPL does not reproduce the errors either:

I thought you meant that you could not reproduce the crash with my patch either. For the ASAN report, you may need the newer gcc and other stuff. I was not able to reproduce this on my work macOS, but I will try tonight on my Arch Linux box, which has the same versions as Christian's. |
|||
msg371378 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 14:36 | |
Is the only way to get gcc-10.1 to build from source? Currently my internet connection is too slow to download all of gcc, and `apt install gcc-10` installs 10.0.1, which I'm getting totally unrelated errors with. |
|||
msg371379 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 14:41 | |
> Is the only way to get gcc-10.1 to build from source?

If your package manager does not have the package and you cannot find a PPA, you would need to build it from source, or maybe use Docker with a distribution that has GCC 10, like Arch. |
|||
msg371381 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 15:04 | |
> Note that although we could just exit if the length of the line is smaller than the column offset before calling https://github.com/python/cpython/blob/master/Parser/pegen.c#L148 (I assume that is the problem) is more important to understand how are we reaching that situation.

That's because of https://github.com/python/cpython/blob/e2fb8a2c42ee60c72a40d93da69e9efc4e359023/Parser/pegen.c#L404, which decreases the size of the string to be decoded by 1 if the last character is a newline (or, equivalently, if the error offset points past the end of the line). Thus, PyUnicode_DecodeUTF8 returns an object that is one character shorter than the col_offset, and that's how we get to the situation you mentioned. |
|||
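The mechanism described above can be modelled in a few lines of Python. This is a hypothetical sketch of the C logic in Parser/pegen.c (the real code operates on the tokenizer's C buffer), using the `yield from` reproducer from earlier in the thread:

```python
# Model of the pre-fix behavior: the tokenizer buffer ends in a newline,
# and the error's col_offset counts every byte of the buffer.
line_bytes = b"yield from\n"
col_offset = len(line_bytes)  # the error points just past the input: 11

# _PyPegen_raise_error_known_location dropped a trailing newline before
# decoding, so the decoded error line comes out one character shorter.
size = len(line_bytes)
if size and line_bytes[size - 1:size] == b"\n":
    size -= 1  # mirrors the size-- in pegen.c
error_line = line_bytes[:size].decode("utf-8", "replace")

# col_offset (11) now exceeds the decoded line's length (10); indexing
# the line buffer with it is the out-of-bounds read that ASAN flags
# in ascii_decode.
print(col_offset, len(error_line))  # 11 10
```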
msg371382 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 15:15 | |
> returns an object that is one character shorter than the col_offset and that's how we get to the situation you mentioned.

What happens with:

>>> "Ṕýţĥòñ" + |
|||
msg371384 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 15:29 | |
> What happens with: >>> "Ṕýţĥòñ" +

Exact same thing. The offset is 16 at the start and gets decreased to 15 in the line I linked to in my previous post. Then col_offset gets converted to a column number of 9, which seems correct. Although it is correct, it feels correct by accident, since the if-branch on https://github.com/python/cpython/blob/e2fb8a2c42ee60c72a40d93da69e9efc4e359023/Parser/pegen.c#L154 only evaluates to false because of the size decrease, and I don't know if that was intended. |
|||
msg371388 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 15:47 | |
Well, I think the solution here is to clip col_offset to the line length, and maybe add an assert that the difference is not bigger than 1. |
|||
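The clipping idea can be sketched in Python. This is a hypothetical model of byte_offset_to_character_offset in Parser/pegen.c, not the actual C helper; note it clips against the byte length of the line, since col_offset is a byte offset:

```python
def byte_offset_to_character_offset(line: str, col_offset: int) -> int:
    """Translate a UTF-8 byte offset into a code-point offset,
    clipping the offset instead of reading past the buffer."""
    data = line.encode("utf-8")
    # Clip to the line's byte length; per the discussion above, the
    # difference should never be bigger than 1.
    col_offset = min(col_offset, len(data))
    # Decoding the prefix and taking its length converts a byte count
    # into a code-point count, as the C code does via DecodeUTF8.
    return len(data[:col_offset].decode("utf-8", "replace"))

# The "yield from" case: byte offset 11 against a 10-byte line is
# clipped rather than overflowing.
print(byte_offset_to_character_offset("yield from", 11))  # 10
```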
msg371389 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 16:01 | |
I'm guessing that some parts of the conversion code were only there to circumvent issues in displaying the error messages that weren't pegen's fault. These were fixed by Guido in GH-20072, so I think we can delete some of them. For example, this patch works just fine for me.

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..2c348178fb 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -150,10 +150,6 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
         return 0;
     }
     Py_ssize_t size = PyUnicode_GET_LENGTH(text);
-    str = PyUnicode_AsUTF8(text);
-    if (str != NULL && (int)strlen(str) == col_offset) {
-        size = strlen(str);
-    }
     Py_DECREF(text);
     return size;
 }
@@ -400,9 +396,6 @@ _PyPegen_raise_error_known_location(Parser *p, PyObject *errtype,
     if (!error_line) {
         Py_ssize_t size = p->tok->inp - p->tok->buf;
-        if (size && p->tok->buf[size-1] == '\n') {
-            size--;
-        }
         error_line = PyUnicode_DecodeUTF8(p->tok->buf, size, "replace");
         if (!error_line) {
             goto error; |
|||
msg371391 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 16:14 | |
Can you run the pyflakes test suite before and after, to double-check that we don't introduce any regression? If everything looks fine, open a PR :) |
|||
msg371393 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 16:28 | |
> Can you run the pyflakes test suite before and after, to double-check that we don't introduce any regression? If everything looks fine, open a PR :)

The exact same errors before and after! I'll wait on the PR, though, until someone has checked whether this indeed solves the heap-buffer-overflow. I'm not at all confident that it does, and I can't test. |
|||
msg371395 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 16:41 | |
Ok, I was able to reproduce:

❯ gcc --version
gcc (GCC) 10.1.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

make distclean
./configure --with-address-sanitizer --with-undefined-behavior-sanitizer
LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" make -j

❯ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof
0:00:00 load avg: 1.82 Run tests sequentially
0:00:00 load avg: 1.82 [1/1] test_eof
test test_eof failed -- Traceback (most recent call last):
  File "/home/pablogsal/github/python/master/Lib/test/test_eof.py", line 54, in test_line_continuation_EOF_from_file_bpo2180
    self.assertIn(b'unexpected EOF while parsing', err)
AssertionError: b'unexpected EOF while parsing' not found in the captured stderr, which contains (decoded):

Parser/tokenizer.c:978:50: runtime error: pointer index expression with base 0x625000016900 overflowed to 0xbebebebebebee6be
=================================================================
==27549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x606000027c51 at pc 0x5612210ca7d4 bp 0x7fffe6e9ff70 sp 0x7fffe6e9ff60
READ of size 1 at 0x606000027c51 thread T0
    #0 0x5612210ca7d3 in ascii_decode Objects/unicodeobject.c:4941
    #1 0x5612211c9f4a in unicode_decode_utf8 Objects/unicodeobject.c:4999
    #2 0x5612219201bd in byte_offset_to_character_offset Parser/pegen.c:148
    #3 0x5612219201bd in _PyPegen_raise_error_known_location Parser/pegen.c:412
    #4 0x561221920e4d in _PyPegen_raise_error Parser/pegen.c:373
    #5 0x561221924981 in tokenizer_error Parser/pegen.c:321
    #6 0x561221924981 in _PyPegen_fill_token Parser/pegen.c:638
    #7 0x56122192777f in _PyPegen_expect_token Parser/pegen.c:753
    #8 0x56122193817a in _tmp_15_rule Parser/parser.c:16184
    #9 0x5612219274f9 in _PyPegen_lookahead (/home/pablogsal/github/python/master/python+0x1c344f9)
    #10 0x56122199de2c in compound_stmt_rule Parser/parser.c:1860
    #11 0x5612219a65c2 in statement_rule Parser/parser.c:1224
    #12 0x5612219a65c2 in _loop1_11_rule Parser/parser.c:15954
    #13 0x5612219a65c2 in statements_rule Parser/parser.c:1183
    #14 0x5612219aa4b7 in file_rule Parser/parser.c:716
    #15 0x5612219aa4b7 in _PyPegen_parse Parser/parser.c:24401
    #16 0x56122192a768 in _PyPegen_run_parser Parser/pegen.c:1077
    #17 0x56122192b3ef in _PyPegen_run_parser_from_file_pointer Parser/pegen.c:1137
    #18 0x5612213da3c6 in PyRun_FileExFlags Python/pythonrun.c:1057
    #19 0x5612213da72c in PyRun_SimpleFileExFlags Python/pythonrun.c:400
    #20 0x561220df0dbb in pymain_run_file Modules/main.c:369
    #21 0x561220df0dbb in pymain_run_python Modules/main.c:553
    #22 0x561220df3154 in Py_RunMain Modules/main.c:632
    #23 0x561220df3154 in pymain_main Modules/main.c:662
    #24 0x561220df3154 in Py_BytesMain Modules/main.c:686
    #25 0x7f981bf9a001 in __libc_start_main (/usr/lib/libc.so.6+0x27001)
    #26 0x561220ded48d in _start (/home/pablogsal/github/python/master/python+0x10fa48d)

0x606000027c51 is located 0 bytes to the right of 49-byte region [0x606000027c20,0x606000027c51)
allocated by thread T0 here:
    #0 0x7f981ccce459 in __interceptor_malloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:145
    #1 0x5612210dfa1d in PyUnicode_New Objects/unicodeobject.c:1437
    #2 0x56122121324b in _PyUnicode_Init Objects/unicodeobject.c:15535
    #3 0x5612213ae5c3 in pycore_init_types Python/pylifecycle.c:599
    #4 0x5612213ae5c3 in pycore_interp_init Python/pylifecycle.c:724
    #5 0x5612213b8b4b in pyinit_config Python/pylifecycle.c:765
    #6 0x5612213b8b4b in pyinit_core Python/pylifecycle.c:926
    #7 0x5612213bab6c in Py_InitializeFromConfig Python/pylifecycle.c:1136
    #8 0x561220ded752 in pymain_init Modules/main.c:66
    #9 0x561220df310a in pymain_main Modules/main.c:653
    #10 0x561220df310a in Py_BytesMain Modules/main.c:686
    #11 0x7f981bf9a001 in __libc_start_main (/usr/lib/libc.so.6+0x27001)

SUMMARY: AddressSanitizer: heap-buffer-overflow Objects/unicodeobject.c:4941 in ascii_decode
Shadow bytes around the buggy address:
  0x0c0c7fffcf30: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
  0x0c0c7fffcf40: 00 00 00 07 fa fa fa fa 00 00 00 00 00 00 00 00
  0x0c0c7fffcf50: fa fa fa fa 00 00 00 00 00 00 00 05 fa fa fa fa
  0x0c0c7fffcf60: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
  0x0c0c7fffcf70: 00 00 00 00 fa fa fa fa 00 00 00 00 00 00 00 01
=>0x0c0c7fffcf80: fa fa fa fa 00 00 00 00 00 00[01]fa fa fa fa fa
  0x0c0c7fffcf90: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
  0x0c0c7fffcfa0: 00 00 05 fa fa fa fa fa 00 00 00 00 00 00 00 fa
  0x0c0c7fffcfb0: fa fa fa fa 00 00 00 00 00 00 00 00 fa fa fa fa
  0x0c0c7fffcfc0: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fffcfd0: fd fd fd fd fa fa fa fa 00 00 00 00 00 00 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==27549==ABORTING

test_eof failed

== Tests result: FAILURE ==
1 test failed: test_eof
Total duration: 359 ms
Tests result: FAILURE

----------

With this patch:

diff --git a/Parser/pegen.c b/Parser/pegen.c
index e29910bf86..a9f24ca5fa 100644
--- a/Parser/pegen.c
+++ b/Parser/pegen.c
@@ -145,15 +145,15 @@ byte_offset_to_character_offset(PyObject *line, int col_offset)
     if (!str) {
         return 0;
     }
+    Py_ssize_t linesize = PyUnicode_GET_LENGTH(line);
+    if (col_offset > linesize) {
+        col_offset = (int)linesize;
+    }
     PyObject *text = PyUnicode_DecodeUTF8(str, col_offset, "replace");
     if (!text) {
         return 0;
     }
     Py_ssize_t size = PyUnicode_GET_LENGTH(text);
-    str = PyUnicode_AsUTF8(text);
-    if (str != NULL && (int)strlen(str) == col_offset) {
-        size = strlen(str);
-    }
     Py_DECREF(text);
     return size;
 }
@@ -400,9 +400,6 @@ _PyPegen_raise_error_known_location(Parser *p, PyObject *errtype,
     if (!error_line) {
         Py_ssize_t size = p->tok->inp - p->tok->buf;
-        if (size && p->tok->buf[size-1] == '\n') {
-            size--;
-        }
         error_line = PyUnicode_DecodeUTF8(p->tok->buf, size, "replace");
         if (!error_line) {
             goto error;

❯ LSAN_OPTIONS="suppressions=asan-suppression.txt,print_suppressions=0" ./python -m test test_eof
0:00:00 load avg: 1.39 Run tests sequentially
0:00:00 load avg: 1.39 [1/1] test_eof
== Tests result: SUCCESS ==
1 test OK.
Total duration: 500 ms
Tests result: SUCCESS |
|||
msg371399 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-12 17:28 | |
> + Py_ssize_t linesize = PyUnicode_GET_LENGTH(line);

This line is wrong though, since PyUnicode_GET_LENGTH returns the length in code points and PyUnicode_DecodeUTF8 expects the number of bytes. For non-ASCII input this would push the caret further to the left. For example:

$ ./python.exe
Python 3.10.0a0 (heads/master-dirty:e2fb8a2c42, Jun 12 2020, 20:22:52) [Clang 11.0.0 (clang-1100.0.33.8)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "Ṕýţĥòñ" +
  File "<stdin>", line 1
    "Ṕýţĥòñ" +
        ^
SyntaxError: invalid syntax |
|||
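The byte/code-point mismatch behind the misplaced caret is easy to measure in plain Python, using the same line as the REPL session above (illustrative only; the parser itself does this accounting in C):

```python
line = '"Ṕýţĥòñ" +'

# PyUnicode_GET_LENGTH corresponds to the code-point count:
print(len(line))                   # 10 code points

# PyUnicode_DecodeUTF8 consumes a byte count, which is larger here
# because each accented letter encodes to 2-3 UTF-8 bytes:
print(len(line.encode("utf-8")))   # 17 bytes

# Clipping a byte offset against the code-point length would therefore
# truncate it too aggressively and shift the caret to the left.
```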
msg371405 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-12 18:50 | |
> This line is wrong though, since PyUnicode_GET_LENGTH returns the length in code points

Whoops! Thanks for pointing that out. In any case, the patch still shows that the ASAN error goes away by limiting the access to the string, so I suppose we can turn it into a proper patch. |
|||
msg371675 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-16 15:49 | |
New changeset 51c5896b6205911d29ac07f167ec7f3cf1cb600d by Pablo Galindo in branch 'master': bpo-40958: Avoid buffer overflow in the parser when indexing the current line (GH-20875) https://github.com/python/cpython/commit/51c5896b6205911d29ac07f167ec7f3cf1cb600d |
|||
msg371686 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-16 17:37 | |
New changeset 7795ae8f05a5b1134a576a372f64699176cac5fb by Miss Islington (bot) in branch '3.9': bpo-40958: Avoid buffer overflow in the parser when indexing the current line (GH-20875) (GH-20919) https://github.com/python/cpython/commit/7795ae8f05a5b1134a576a372f64699176cac5fb |
|||
msg371831 - (view) | Author: neonene (neonene) * | Date: 2020-06-18 18:32 | |
FYI, since PR 20875/20919, MSVC (x64) has warned C4244 (conversion from 'Py_ssize_t' to 'int', possible loss of data). parser.c especially gets more than 700 of these warnings. |
|||
msg371837 - (view) | Author: Pablo Galindo Salgado (pablogsal) * | Date: 2020-06-18 21:13 | |
Hummmm, that is because lineno, col_offset, end_lineno and end_col_offset are now Py_ssize_t, so we probably need to adapt the other assignments. Will make a PR soon. |
|||
msg371838 - (view) | Author: Lysandros Nikolaou (lys.nikolaou) * | Date: 2020-06-18 21:16 | |
> Will make a PR soon

No need. Already got something ready. |
|||
msg371936 - (view) | Author: miss-islington (miss-islington) | Date: 2020-06-20 12:57 | |
New changeset 861efc6e8fe7f030b1e193989b13287b31385939 by Lysandros Nikolaou in branch 'master': bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970) https://github.com/python/cpython/commit/861efc6e8fe7f030b1e193989b13287b31385939 |
|||
msg371945 - (view) | Author: miss-islington (miss-islington) | Date: 2020-06-20 17:35 | |
New changeset c9f83c173b0cc62d6fcdc363e9ab05f6664ff8f3 by Miss Islington (bot) in branch '3.9': bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970) https://github.com/python/cpython/commit/c9f83c173b0cc62d6fcdc363e9ab05f6664ff8f3 |
History
Date | User | Action | Args
---|---|---|---
2022-04-11 14:59:32 | admin | set | github: 85130 |
2020-06-20 17:35:07 | miss-islington | set | messages: + msg371945 |
2020-06-20 12:57:39 | miss-islington | set | pull_requests: + pull_request20176 |
2020-06-20 12:57:30 | miss-islington | set | nosy: + miss-islington; messages: + msg371936 |
2020-06-19 15:11:05 | neonene | set | nosy: + christian.heimes, - neonene |
2020-06-18 21:45:53 | lys.nikolaou | set | pull_requests: + pull_request20148 |
2020-06-18 21:19:09 | lys.nikolaou | set | pull_requests: + pull_request20146 |
2020-06-18 21:16:33 | lys.nikolaou | set | messages: + msg371838 |
2020-06-18 21:13:56 | pablogsal | set | messages: + msg371837 |
2020-06-18 18:32:08 | neonene | set | nosy: + neonene, - christian.heimes, miss-islington; messages: + msg371831 |
2020-06-16 17:37:25 | pablogsal | set | messages: + msg371686 |
2020-06-16 15:51:14 | pablogsal | set | status: open -> closed resolution: fixed stage: patch review -> resolved |
2020-06-16 15:49:57 | miss-islington | set | nosy: + miss-islington; pull_requests: + pull_request20098 |
2020-06-16 15:49:51 | pablogsal | set | messages: + msg371675 |
2020-06-15 00:30:46 | pablogsal | set | pull_requests: + pull_request20063 |
2020-06-12 21:18:12 | pablogsal | set | keywords: + patch; stage: patch review; pull_requests: + pull_request20035 |
2020-06-12 18:50:14 | pablogsal | set | messages: + msg371405 |
2020-06-12 17:28:31 | lys.nikolaou | set | messages: + msg371399 |
2020-06-12 16:41:10 | pablogsal | set | messages: + msg371395 |
2020-06-12 16:28:22 | lys.nikolaou | set | messages: + msg371393 |
2020-06-12 16:14:55 | pablogsal | set | messages: + msg371391 |
2020-06-12 16:01:55 | lys.nikolaou | set | messages: + msg371389 |
2020-06-12 15:47:44 | pablogsal | set | messages: + msg371388 |
2020-06-12 15:29:33 | lys.nikolaou | set | messages: + msg371384 |
2020-06-12 15:15:38 | pablogsal | set | messages: + msg371382 |
2020-06-12 15:04:54 | lys.nikolaou | set | messages: + msg371381 |
2020-06-12 14:41:32 | pablogsal | set | messages: + msg371379 |
2020-06-12 14:36:48 | lys.nikolaou | set | messages: + msg371378 |
2020-06-12 14:32:04 | pablogsal | set | messages: + msg371376 |
2020-06-12 14:18:14 | lys.nikolaou | set | messages: + msg371375 |
2020-06-12 14:17:10 | lys.nikolaou | set | messages: + msg371374 |
2020-06-12 14:15:51 | lys.nikolaou | set | messages: + msg371373 |
2020-06-12 14:12:32 | pablogsal | set | messages: + msg371372 |
2020-06-12 14:07:04 | christian.heimes | set | messages: + msg371371 |
2020-06-12 14:05:52 | lys.nikolaou | set | messages: + msg371370 |
2020-06-12 14:03:39 | lys.nikolaou | set | messages: + msg371369 |
2020-06-12 13:53:26 | christian.heimes | set | messages: + msg371368 |
2020-06-12 13:12:57 | pablogsal | set | messages: + msg371366 |
2020-06-12 13:08:51 | pablogsal | set | messages: + msg371365 |
2020-06-12 13:08:36 | pablogsal | set | messages: - msg371364 |
2020-06-12 13:08:21 | pablogsal | set | messages: + msg371364 |
2020-06-12 13:05:00 | pablogsal | set | messages: + msg371363 |
2020-06-12 13:04:14 | lys.nikolaou | set | messages: + msg371362 |
2020-06-12 12:56:09 | pablogsal | set | messages: + msg371361 |
2020-06-12 12:51:59 | lys.nikolaou | set | messages: + msg371360 |
2020-06-12 12:38:15 | pablogsal | set | messages: + msg371359 |
2020-06-12 12:34:49 | pablogsal | set | messages: + msg371358 |
2020-06-12 12:31:01 | pablogsal | set | nosy: + lys.nikolaou |
2020-06-12 10:42:31 | christian.heimes | create |