classification
Title: [fuzzer] Parser null deref with continuation characters and generator parenthesis error
Type: crash
Stage: resolved
Components: Parser
Versions: Python 3.11, Python 3.10, Python 3.9
process
Status: closed
Resolution: fixed
Dependencies:
Superseder:
Assigned To:
Nosy List: ammar2, gregory.p.smith, lukasz.langa, lys.nikolaou, miss-islington, pablogsal
Priority: high
Keywords: patch

Created on 2021-10-16 14:24 by ammar2, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 28993 merged pablogsal, 2021-10-16 16:31
PR 29070 merged lukasz.langa, 2021-10-19 20:05
PR 29071 merged lukasz.langa, 2021-10-19 20:19
PR 29108 merged pablogsal, 2021-10-20 17:09
PR 29672 merged miss-islington, 2021-11-20 17:41
Messages (11)
msg404082 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2021-10-16 14:24
Another parser crash found by the fuzzer:

"\
"(1for c in I,\
\

Recreator:

>>> import ast
>>> ast.literal_eval('"\\\n"(1for c in I,\\\n\\')
[1]    17916 segmentation fault  ./python

>>> import ast
>>> ast.literal_eval(r'''
... "\
... "(1for c in I,\
... \ ''')
[1]    17935 segmentation fault  ./python
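
Since the reproducer above takes down the whole interpreter on affected builds, one cautious way to try it is from a child process. What follows is only a minimal sketch: the helper name `run_reproducer` and the "crashed"/"raised" labels are illustrative and do not come from the report. The input string is the one from the first recreator.

```python
import subprocess
import sys

# The fuzzer input, written exactly as in the first recreator above.
SOURCE = '"\\\n"(1for c in I,\\\n\\'

# Child program: read the source from stdin and print the name of
# whatever exception ast.literal_eval raises for it.
CHILD = """
import ast, sys
try:
    ast.literal_eval(sys.stdin.read())
except Exception as exc:
    print(type(exc).__name__)
else:
    print("no error")
"""


def run_reproducer():
    """Run the reproducer in a separate interpreter (hypothetical helper).

    Returns ("crashed", returncode) if the child died abnormally,
    otherwise ("raised", name of the exception the child reported).
    """
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=SOURCE,
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        # A segfault or failed assertion ends the child with a non-zero status.
        return ("crashed", str(proc.returncode))
    return ("raised", proc.stdout.strip())


if __name__ == "__main__":
    print(run_reproducer())
```

On an interpreter that includes the fix, the child is expected to report a SyntaxError rather than crash; the parent process survives either way.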


-------------------
Raw ASAN stacktrace
-------------------

==1668==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f4157e5e08c bp 0x7fffbd48b300 sp 0x7fffbd48aab8 T0)
==1668==The signal is caused by a READ memory access.
==1668==Hint: address points to the zero page.
    #0 0x7f4157e5e08c in strchr-avx2.S:57 /build/glibc-eX1tMB/glibc-2.31/sysdeps/x86_64/multiarch/strchr-avx2.S:57
    #1 0x4d7a88 in strchr /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:0
    #2 0x9fa6f5 in get_error_line cpython3/Parser/pegen.c:406:25
    #3 0x9fa6f5 in _PyPegen_raise_error_known_location cpython3/Parser/pegen.c:497:26
    #4 0xa18a92 in RAISE_ERROR_KNOWN_LOCATION cpython3/Parser/pegen.h:169:5
    #5 0xa331d5 in invalid_arguments_rule cpython3/Parser/parser.c:17831:20
    #6 0xa21a87 in arguments_rule cpython3/Parser/parser.c:15462:38
    #7 0xa2056b in primary_raw cpython3/Parser/parser.c:12867:18
    #8 0xa2056b in primary_rule cpython3/Parser/parser.c:12745:22
    #9 0xa1f9cd in await_primary_rule cpython3/Parser/parser.c:12700:28
    #10 0xa1f119 in power_rule cpython3/Parser/parser.c:12578:18
    #11 0xa1eabc in factor_rule cpython3/Parser/parser.c:12530:26
    #12 0xa1dc04 in term_raw cpython3/Parser/parser.c:12373:27
    #13 0xa1dc04 in term_rule cpython3/Parser/parser.c:12138:22
    #14 0xa1c899 in sum_raw cpython3/Parser/parser.c:12093:25
    #15 0xa1c899 in sum_rule cpython3/Parser/parser.c:11975:22
    #16 0xa1bb99 in shift_expr_raw cpython3/Parser/parser.c:11936:24
    #17 0xa1bb99 in shift_expr_rule cpython3/Parser/parser.c:11818:22
    #18 0xa1af2c in bitwise_and_raw cpython3/Parser/parser.c:11779:31
    #19 0xa1af2c in bitwise_and_rule cpython3/Parser/parser.c:11700:22
    #20 0xa1a49c in bitwise_xor_raw cpython3/Parser/parser.c:11661:32
    #21 0xa1a49c in bitwise_xor_rule cpython3/Parser/parser.c:11582:22
    #22 0xa1917c in bitwise_or_raw cpython3/Parser/parser.c:11543:32
    #23 0xa1917c in bitwise_or_rule cpython3/Parser/parser.c:11464:22
    #24 0xa2cd39 in comparison_rule cpython3/Parser/parser.c:10727:18
    #25 0xa2c912 in inversion_rule cpython3/Parser/parser.c:10680:31
    #26 0xa2b951 in conjunction_rule cpython3/Parser/parser.c:10559:18
    #27 0xa258e1 in disjunction_rule cpython3/Parser/parser.c:10473:18
    #28 0xa17cb1 in invalid_expression_rule cpython3/Parser/parser.c:18253:18
    #29 0xa17cb1 in expression_rule cpython3/Parser/parser.c:9754:39
    #30 0xa56979 in expressions_rule cpython3/Parser/parser.c:9628:18
    #31 0xa0acf5 in eval_rule cpython3/Parser/parser.c:1035:18
    #32 0xa0acf5 in _PyPegen_parse cpython3/Parser/parser.c:33076:18
    #33 0xa001a5 in _PyPegen_run_parser cpython3/Parser/pegen.c:1350:9
    #34 0xa01fa5 in _PyPegen_run_parser_from_string cpython3/Parser/pegen.c:1482:14
    #35 0xa80fc9 in _PyParser_ASTFromString cpython3/Parser/peg_api.c:14:21
    #36 0x8611ca in Py_CompileStringObject cpython3/Python/pythonrun.c:1371:11
    #37 0xc04a8f in builtin_compile_impl cpython3/Python/bltinmodule.c:842:14
    #38 0xc04a8f in builtin_compile cpython3/Python/clinic/bltinmodule.c.h:249:20
    #39 0xb78ade in cfunction_vectorcall_FASTCALL_KEYWORDS cpython3/Objects/methodobject.c:446:24
    #40 0x57c0ec in _PyObject_VectorcallTstate cpython3/Include/internal/pycore_call.h:89:11
    #41 0x57c0ec in PyObject_Vectorcall cpython3/Objects/call.c:298:12
    #42 0x766191 in call_function cpython3/Python/ceval.c:6619:13
    #43 0x748137 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4734:19
    #44 0x741ae4 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:48:16
    #45 0x741ae4 in _PyEval_Vector cpython3/Python/ceval.c:5810:24
    #46 0x57cb50 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #47 0x57c0ec in _PyObject_VectorcallTstate cpython3/Include/internal/pycore_call.h:89:11
    #48 0x57c0ec in PyObject_Vectorcall cpython3/Objects/call.c:298:12
    #49 0x766191 in call_function cpython3/Python/ceval.c:6619:13
    #50 0x748137 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4734:19
    #51 0x741ae4 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:48:16
    #52 0x741ae4 in _PyEval_Vector cpython3/Python/ceval.c:5810:24
    #53 0x57cb50 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #54 0x57c920 in _PyObject_VectorcallTstate cpython3/Include/internal/pycore_call.h:89:11
    #55 0x57c920 in PyObject_CallOneArg cpython3/Objects/call.c:375:12
    #56 0x579d18 in fuzz_ast_literal_eval cpython3/Modules/_xxtestfuzz/fuzzer.c:425:25
    #57 0x579d18 in _run_fuzz cpython3/Modules/_xxtestfuzz/fuzzer.c:443:14
    #58 0x579d18 in LLVMFuzzerTestOneInput cpython3/Modules/_xxtestfuzz/fuzzer.c:565:11
    #59 0x472623 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) cxa_noexception.cpp:0
    #60 0x45ded2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #61 0x463985 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) cxa_noexception.cpp:0
    #62 0x48c672 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #63 0x7f4157cfa0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/libc-start.c:308:16
    #64 0x43b16d in _start
msg404099 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-10-16 16:31
Presto!! PR 28993
msg404117 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-10-16 21:46
I confirmed that 3.9 does NOT seem to have the problem:

Python 3.9.5 (default, May 19 2021, 11:32:47) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> x = r'''
... "\
... "(1for c in I,\
... \ '''
>>> import ast
>>> ast.literal_eval(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.9/ast.py", line 62, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/lib/python3.9/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 3
    "\
      ^
SyntaxError: Generator expression must be parenthesized
msg404119 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-10-16 22:54
> I confirmed that 3.9 does NOT seem to have the problem:


It does; it's just that it is not a crash. The location that the error message points to is totally wrong.
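
The location being discussed is carried on the exception itself, in the `lineno`, `offset` and `text` attributes of `SyntaxError`. As a rough sketch of how to inspect it, the snippet below deliberately uses a benign single-line input (my own example, not the fuzzer input) that raises the same "Generator expression must be parenthesized" error without touching the continuation-character path. The helper name `error_location` is hypothetical.

```python
def error_location(source):
    """Compile `source` in eval mode and return (msg, lineno, offset, text)
    from the resulting SyntaxError, or None if it compiles cleanly."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError as exc:
        return exc.msg, exc.lineno, exc.offset, exc.text
    return None


# An unparenthesized generator expression followed by another argument.
info = error_location("f(1 for c in I, 2)")
print(info)
```

Comparing `lineno` and `text` against the source is a quick way to see whether the reported position makes sense; the exact `offset` differs between Python versions, so it is not worth hard-coding.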
msg404341 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-19 19:24
New changeset a106343f632a99c8ebb0136fa140cf189b4a6a57 by Pablo Galindo Salgado in branch 'main':
bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993)
https://github.com/python/cpython/commit/a106343f632a99c8ebb0136fa140cf189b4a6a57
msg404349 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-19 20:31
New changeset 5c9cab595e56aeb118bff77ece784dbac30b4338 by Łukasz Langa in branch '3.10':
[3.10] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) (GH-29070)
https://github.com/python/cpython/commit/5c9cab595e56aeb118bff77ece784dbac30b4338
msg404359 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-19 21:39
Note: this *does* fail on 3.9, too. Even if it doesn't crash the production build, it does fail an assertion in a pydebug build:


test_error_offset_continuation_characters (test.test_exceptions.ExceptionTests) ... Assertion failed: (!_PyErr_Occurred(tstate)), function _PyObject_Call, file Objects/call.c, line 261.
Fatal Python error: Aborted

Current thread 0x00000001184d1dc0 (most recent call first):
  File "/private/tmp/cpy/Lib/test/test_exceptions.py", line 187 in check
  File "/private/tmp/cpy/Lib/test/test_exceptions.py", line 198 in test_error_offset_continuation_characters
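
Because the assertion only fires in a pydebug build, it can help to confirm which kind of build is running before drawing conclusions from a test run. A small sketch, assuming CPython; the helper name `is_debug_build` is made up for illustration.

```python
import sys
import sysconfig


def is_debug_build():
    """Best-effort check for a CPython debug (--with-pydebug) build."""
    # sys.gettotalrefcount exists only when CPython is compiled with
    # Py_REF_DEBUG, which a Py_DEBUG build turns on.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Fallback: the build-time configuration variable; it may be missing
    # (None) on some platforms, hence the bool() conversion.
    return bool(sysconfig.get_config_var("Py_DEBUG"))


print(is_debug_build())
```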
msg404494 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-20 16:51
New changeset 88f4ec88e282bf861f0af2d237e9fe28fbc8deac by Łukasz Langa in branch '3.9':
[3.9] bpo-45494: Fix parser crash when reporting errors involving invalid continuation characters (GH-28993) (#29071)
https://github.com/python/cpython/commit/88f4ec88e282bf861f0af2d237e9fe28fbc8deac
msg404497 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-20 16:53
Thanks for the fix, Pablo! ✨ 🍰 ✨
msg406677 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-11-20 17:41
New changeset 79ff0d1687e3f823fb121a19f0297ad052871b1b by Pablo Galindo Salgado in branch 'main':
bpo-45494: Fix error location in EOF tokenizer errors (GH-29108)
https://github.com/python/cpython/commit/79ff0d1687e3f823fb121a19f0297ad052871b1b
msg406678 - (view) Author: miss-islington (miss-islington) Date: 2021-11-20 17:59
New changeset a427eb862f11888fa69fee520eb8a20bd396fcdb by Miss Islington (bot) in branch '3.10':
bpo-45494: Fix error location in EOF tokenizer errors (GH-29108)
https://github.com/python/cpython/commit/a427eb862f11888fa69fee520eb8a20bd396fcdb
History
Date User Action Args
2022-04-11 14:59:51 admin set github: 89657
2021-11-20 17:59:41 miss-islington set messages: + msg406678
2021-11-20 17:41:20 miss-islington set nosy: + miss-islington; pull_requests: + pull_request27911
2021-11-20 17:41:03 pablogsal set messages: + msg406677
2021-10-20 17:09:33 pablogsal set pull_requests: + pull_request27376
2021-10-20 16:53:29 lukasz.langa set status: open -> closed; resolution: fixed; messages: + msg404497; stage: patch review -> resolved
2021-10-20 16:51:18 lukasz.langa set messages: + msg404494
2021-10-19 21:39:29 lukasz.langa set messages: + msg404359
2021-10-19 20:31:22 lukasz.langa set messages: + msg404349
2021-10-19 20:19:42 lukasz.langa set pull_requests: + pull_request27339
2021-10-19 20:05:33 lukasz.langa set pull_requests: + pull_request27338
2021-10-19 19:24:20 lukasz.langa set nosy: + lukasz.langa; messages: + msg404341
2021-10-17 03:19:27 gregory.p.smith set priority: normal -> high; versions: + Python 3.9
2021-10-16 22:54:51 pablogsal set messages: + msg404119
2021-10-16 21:46:33 gregory.p.smith set nosy: + gregory.p.smith; messages: + msg404117; versions: + Python 3.10
2021-10-16 16:31:46 pablogsal set messages: + msg404099
2021-10-16 16:31:13 pablogsal set keywords: + patch; stage: patch review; pull_requests: + pull_request27276
2021-10-16 14:24:17 ammar2 create