classification
Title: [fuzzer] Weird input with continuation and newlines causes null deref in parser
Type: crash Stage: resolved
Components: Parser Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, ammar2, gregory.p.smith, lukasz.langa, lys.nikolaou, pablogsal
Priority: release blocker Keywords: patch

Created on 2021-10-07 18:00 by ammar2, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Pull Requests
URL Status Linked
PR 28812 merged pablogsal, 2021-10-07 20:27
PR 28813 merged pablogsal, 2021-10-07 22:18
Messages (6)
msg403427 - (view) Author: Ammar Askar (ammar2) * (Python committer) Date: 2021-10-07 18:00
From the newly added ast.literal_eval(x) fuzzer: the following string, when passed to ast.literal_eval, causes a null pointer dereference in get_error_line:

\
(\
\

This can be recreated with:

❯ ./python          
Python 3.11.0a1+ (heads/fuzz_ast-dirty:6c942a86a4, Oct  6 2021, 16:27:52) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> ast.literal_eval(r'''\
... \
... (\
... \ ''')
[1]    15464 segmentation fault  ./python
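
Because the reproduction above kills the interpreter it runs in, it can be more convenient to run it in a child process and inspect the exit status. The following is a minimal sketch, not part of the original report: the helper name probe_literal_eval and the exit code 3 are assumptions made for illustration, and the negative return code convention for signals applies to POSIX systems only.

```python
import subprocess
import sys

# The input from the report: backslash continuations around an unclosed
# parenthesis, ending in a backslash followed by a space.
CRASH_INPUT = "\\\n\\\n(\\\n\\ "


def probe_literal_eval(source, python=sys.executable):
    """Run ast.literal_eval(source) in a child interpreter and classify the result.

    Hypothetical helper: the child exits with code 3 (an arbitrary choice)
    when ast.literal_eval raises SyntaxError.
    """
    child_code = (
        "import ast, sys\n"
        "try:\n"
        "    ast.literal_eval(sys.argv[1])\n"
        "except SyntaxError:\n"
        "    sys.exit(3)\n"
    )
    proc = subprocess.run([python, "-c", child_code, source], capture_output=True)
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return f"killed by signal {-proc.returncode}"
    if proc.returncode == 3:
        return "SyntaxError"
    return f"exit code {proc.returncode}"


if __name__ == "__main__":
    print(probe_literal_eval(CRASH_INPUT))
```

On an affected build the helper would be expected to report that the child was killed by a signal (SIGSEGV is 11 on Linux); on a build that handles the input correctly it should report SyntaxError. Passing a different interpreter path as the second argument allows comparing several builds from one script.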


---------------
Raw ASAN report
---------------

==85015==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f987730e08c bp 0x7fff7f8e8080 sp 0x7fff7f8e7838 T0)
==85015==The signal is caused by a READ memory access.
==85015==Hint: address points to the zero page.
    #0 0x7f987730e08c in strchr-avx2.S:57 /build/glibc-eX1tMB/glibc-2.31/sysdeps/x86_64/multiarch/strchr-avx2.S:57
    #1 0x4d7a58 in strchr /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:0
    #2 0x9f9d95 in get_error_line cpython3/Parser/pegen.c:406:25
    #3 0x9f9d95 in _PyPegen_raise_error_known_location cpython3/Parser/pegen.c:497:26
    #4 0x9fd492 in RAISE_ERROR_KNOWN_LOCATION cpython3/Parser/pegen.h:169:5
    #5 0xa00528 in raise_unclosed_parentheses_error cpython3/Parser/pegen.c:267:8
    #6 0xa00528 in _PyPegen_check_tokenizer_errors cpython3/Parser/pegen.c:1314:25
    #7 0x9ff9e3 in _PyPegen_run_parser cpython3/Parser/pegen.c:1352:17
    #8 0xa015c5 in _PyPegen_run_parser_from_string cpython3/Parser/pegen.c:1479:14
    #9 0xa805e9 in _PyParser_ASTFromString cpython3/Parser/peg_api.c:14:21
    #10 0x85f01a in Py_CompileStringObject cpython3/Python/pythonrun.c:1371:11
    #11 0xc0785f in builtin_compile_impl cpython3/Python/bltinmodule.c:841:14
    #12 0xc0785f in builtin_compile cpython3/Python/clinic/bltinmodule.c.h:249:20
    #13 0xb7b28e in cfunction_vectorcall_FASTCALL_KEYWORDS cpython3/Objects/methodobject.c:446:24
    #14 0x764f22 in call_function cpython3/Python/ceval.c:0
    #15 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
    #16 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
    #17 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
    #18 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #19 0x764f22 in call_function cpython3/Python/ceval.c:0
    #20 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
    #21 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
    #22 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
    #23 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #24 0x579def in _PyObject_VectorcallTstate /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:114:11
    #25 0x579def in PyObject_CallOneArg /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:184:12
    #26 0x579def in fuzz_ast_literal_eval cpython3/Modules/_xxtestfuzz/fuzzer.c:425:25
    #27 0x579def in _run_fuzz cpython3/Modules/_xxtestfuzz/fuzzer.c:443:14
    #28 0x579def in LLVMFuzzerTestOneInput cpython3/Modules/_xxtestfuzz/fuzzer.c:565:11
    #29 0x4725e3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) cxa_noexception.cpp:0
    #30 0x45deb2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #31 0x463965 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) cxa_noexception.cpp:0
    #32 0x48c6b2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #33 0x7f98771aa0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/libc-start.c:308:16
    #34 0x43b16d in _start
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x18b08c)
==85015==ABORTING
msg403429 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-10-07 18:31
(unable to reproduce on 3.9)
msg403439 - (view) Author: Łukasz Langa (lukasz.langa) * (Python committer) Date: 2021-10-07 19:33
Confirmed in 3.10 and 3.11:

>>> ast.literal_eval(r'''\
... \
... (\
... \ ''')
fish: Job 1, 'python' terminated by signal SIGSEGV (Address boundary error)

3.9 raises SyntaxError:

>>> ast.literal_eval(r'''
... \
... (\
... \ ''')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ast.py", line 62, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 4
    \
    ^
SyntaxError: unexpected character after line continuation character
msg403441 - (view) Author: Gregory P. Smith (gregory.p.smith) * (Python committer) Date: 2021-10-07 20:01
Marking as a release blocker: a crash is bad for a function that is documented as safe for use on untrusted input, so long as the input isn't large enough to overflow the stack.

https://docs.python.org/3/library/ast.html#ast.literal_eval
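
To illustrate why a crash matters here, this is a rough sketch of how calling code typically guards ast.literal_eval on untrusted input. The wrapper name parse_untrusted_literal and the max_len limit are assumptions for illustration only; the exception types caught are the ones the ast documentation lists for malformed input.

```python
import ast


def parse_untrusted_literal(text, max_len=10_000):
    """Evaluate a Python literal from untrusted text, or raise ValueError.

    Hypothetical wrapper: max_len is an arbitrary illustrative cap meant to
    keep very large or deeply nested inputs away from the parser.
    """
    if len(text) > max_len:
        raise ValueError("input too long")
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError) as exc:
        # Every documented failure mode is an ordinary exception that the
        # caller can handle.
        raise ValueError(f"not a valid literal: {exc!r}") from exc


if __name__ == "__main__":
    print(parse_untrusted_literal("[1, 2, {'a': None}]"))
```

A wrapper like this can only handle failures that surface as Python exceptions. A segmentation fault inside the parser terminates the whole process before any except clause runs, which is why the input in this report could not be defended against from Python code.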
msg403444 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-10-07 21:33
New changeset 0219017df7ec41839fd0d56a3076b5f09c58d313 by Pablo Galindo Salgado in branch 'main':
bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812)
https://github.com/python/cpython/commit/0219017df7ec41839fd0d56a3076b5f09c58d313
msg403449 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-10-07 23:50
New changeset 4ce55a2353e07962280181df40af0135aef1cf51 by Pablo Galindo Salgado in branch '3.10':
[3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812). (GH-28813)
https://github.com/python/cpython/commit/4ce55a2353e07962280181df40af0135aef1cf51
History
Date                 User             Action  Args
2022-04-11 14:59:50  admin            set     github: 89571
2021-10-07 23:50:22  pablogsal        set     status: open -> closed; resolution: fixed; stage: patch review -> resolved
2021-10-07 23:50:18  pablogsal        set     messages: + msg403449
2021-10-07 22:18:13  pablogsal        set     pull_requests: + pull_request27132
2021-10-07 21:33:14  pablogsal        set     messages: + msg403444
2021-10-07 20:27:23  pablogsal        set     keywords: + patch; stage: patch review; pull_requests: + pull_request27131
2021-10-07 20:01:36  gregory.p.smith  set     priority: normal -> release blocker; messages: + msg403441
2021-10-07 19:33:37  lukasz.langa     set     nosy: + lukasz.langa; messages: + msg403439
2021-10-07 19:16:46  BTaskaya         set     nosy: + BTaskaya
2021-10-07 18:31:18  gregory.p.smith  set     nosy: + gregory.p.smith; messages: + msg403429; versions: - Python 3.9
2021-10-07 18:02:09  gregory.p.smith  set     versions: + Python 3.9, Python 3.10
2021-10-07 18:00:19  ammar2           create