[fuzzer] Weird input with continuation and newlines causes null deref in parser #89571

ammaraskar · 2021-10-07T18:00:19Z

BPO	45408
Nosy	@gpshead, @ambv, @ammaraskar, @lysnikolaou, @pablogsal, @isidentical
PRs	bpo-45408: Don't override previous tokenizer errors in the second parser pass #28812; [3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass #28813

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2021-10-07.23:50:22.415>
created_at = <Date 2021-10-07.18:00:19.433>
labels = ['interpreter-core', 'release-blocker', '3.10', 'type-crash', '3.11']
title = '[fuzzer] Weird input with continuation and newlines causes null deref in parser'
updated_at = <Date 2021-10-07.23:50:22.415>
user = 'https://github.com/ammaraskar'

bugs.python.org fields:

activity = <Date 2021-10-07.23:50:22.415>
actor = 'pablogsal'
assignee = 'none'
closed = True
closed_date = <Date 2021-10-07.23:50:22.415>
closer = 'pablogsal'
components = ['Parser']
creation = <Date 2021-10-07.18:00:19.433>
creator = 'ammar2'
dependencies = []
files = []
hgrepos = []
issue_num = 45408
keywords = ['patch']
message_count = 6.0
messages = ['403427', '403429', '403439', '403441', '403444', '403449']
nosy_count = 6.0
nosy_names = ['gregory.p.smith', 'lukasz.langa', 'ammar2', 'lys.nikolaou', 'pablogsal', 'BTaskaya']
pr_nums = ['28812', '28813']
priority = 'release blocker'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'crash'
url = 'https://bugs.python.org/issue45408'
versions = ['Python 3.10', 'Python 3.11']

ammaraskar · 2021-10-07T18:00:19Z

From the newly added ast.literal_eval(x) fuzzer: the following string, when fed to ast.literal_eval, causes a null pointer dereference in get_error_line:

\
(\
\

This can be recreated with:

❯ ./python          
Python 3.11.0a1+ (heads/fuzz_ast-dirty:6c942a86a4, Oct  6 2021, 16:27:52) [GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ast
>>> ast.literal_eval(r'''\
... \
... (\
... \ ''')
[1]    15464 segmentation fault  ./python
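Because the reproduction above kills the interpreter that runs it, it can be handy to probe the input from a child process instead. The following is a minimal sketch under stated assumptions, not part of the original report: the names `PAYLOAD`, `CHILD_CODE`, and `probe` are hypothetical, and reading a negative return code as "killed by a signal" relies on POSIX behavior of `subprocess`.

```python
import subprocess
import sys

# The input from the report: continuation backslashes, an opening
# parenthesis, and a final backslash followed by a space.
PAYLOAD = "\\\n\\\n(\\\n\\ "

# Code executed by the child interpreter. It prints one word describing
# how ast.literal_eval handled the string passed as its first argument.
CHILD_CODE = """
import ast, sys
try:
    ast.literal_eval(sys.argv[1])
except SyntaxError:
    print("syntax-error")
except Exception:
    print("other-error")
else:
    print("ok")
"""


def probe(source: str) -> str:
    """Run ast.literal_eval(source) in a child interpreter and classify the result."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, source],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the signal
        # that terminated the child (for example SIGSEGV).
        return f"killed by signal {-proc.returncode}"
    return proc.stdout.strip()


if __name__ == "__main__":
    print(probe(PAYLOAD))
```

On an affected build the child would be expected to die from a signal, while the parent process keeps running and can report the outcome.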

Raw ASAN report
---------------

==85015==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f987730e08c bp 0x7fff7f8e8080 sp 0x7fff7f8e7838 T0)

==85015==The signal is caused by a READ memory access.
==85015==Hint: address points to the zero page.
#0 0x7f987730e08c in strchr-avx2.S:57 /build/glibc-eX1tMB/glibc-2.31/sysdeps/x86_64/multiarch/strchr-avx2.S:57
#1 0x4d7a58 in strchr /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:0
#2 0x9f9d95 in get_error_line cpython3/Parser/pegen.c:406:25
#3 0x9f9d95 in _PyPegen_raise_error_known_location cpython3/Parser/pegen.c:497:26
#4 0x9fd492 in RAISE_ERROR_KNOWN_LOCATION cpython3/Parser/pegen.h:169:5
#5 0xa00528 in raise_unclosed_parentheses_error cpython3/Parser/pegen.c:267:8
#6 0xa00528 in _PyPegen_check_tokenizer_errors cpython3/Parser/pegen.c:1314:25
#7 0x9ff9e3 in _PyPegen_run_parser cpython3/Parser/pegen.c:1352:17
#8 0xa015c5 in _PyPegen_run_parser_from_string cpython3/Parser/pegen.c:1479:14
#9 0xa805e9 in _PyParser_ASTFromString cpython3/Parser/peg_api.c:14:21
#10 0x85f01a in Py_CompileStringObject cpython3/Python/pythonrun.c:1371:11
#11 0xc0785f in builtin_compile_impl cpython3/Python/bltinmodule.c:841:14
#12 0xc0785f in builtin_compile cpython3/Python/clinic/bltinmodule.c.h:249:20
#13 0xb7b28e in cfunction_vectorcall_FASTCALL_KEYWORDS cpython3/Objects/methodobject.c:446:24
#14 0x764f22 in call_function cpython3/Python/ceval.c:0
#15 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
#16 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
#17 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
#18 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
#19 0x764f22 in call_function cpython3/Python/ceval.c:0
#20 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
#21 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
#22 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
#23 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
#24 0x579def in _PyObject_VectorcallTstate /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:114:11
#25 0x579def in PyObject_CallOneArg /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:184:12
#26 0x579def in fuzz_ast_literal_eval cpython3/Modules/_xxtestfuzz/fuzzer.c:425:25
#27 0x579def in _run_fuzz cpython3/Modules/_xxtestfuzz/fuzzer.c:443:14
#28 0x579def in LLVMFuzzerTestOneInput cpython3/Modules/_xxtestfuzz/fuzzer.c:565:11
#29 0x4725e3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) cxa_noexception.cpp:0
#30 0x45deb2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
#31 0x463965 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) cxa_noexception.cpp:0
#32 0x48c6b2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
#33 0x7f98771aa0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/libc-start.c:308:16
#34 0x43b16d in _start
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x18b08c)
==85015==ABORTING

gpshead · 2021-10-07T18:31:18Z

(unable to reproduce on 3.9)

ambv · 2021-10-07T19:33:37Z

Confirmed in 3.10 and 3.11:

>>> ast.literal_eval(r'''\
... \
... (\
... \ ''')
fish: Job 1, 'python' terminated by signal SIGSEGV (Address boundary error)

3.9 raises SyntaxError:

>>> ast.literal_eval(r'''
... \
... (\
... \ ''')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "ast.py", line 62, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
  File "<unknown>", line 4
    \
    ^
SyntaxError: unexpected character after line continuation character

gpshead · 2021-10-07T20:01:36Z

Marking as a release blocker: a crash is bad for a function that is documented as safe for use on untrusted input, so long as the input isn't large enough to overflow the stack.

https://docs.python.org/3/library/ast.html#ast.literal_eval
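For context on what "untrusted input" handling usually looks like on the caller's side, here is a minimal sketch, assuming a hypothetical wrapper `parse_literal` and an arbitrary size limit `_MAX_CHARS` (neither comes from this issue). It catches the exception types that the `ast.literal_eval` documentation lists for malformed input. A wrapper like this cannot guard against a hard crash such as the one reported here, which is why the crash matters.

```python
import ast

# Hypothetical cap on input size, chosen only for illustration.
_MAX_CHARS = 10_000


def parse_literal(text, default=None):
    """Evaluate a Python literal from untrusted text, returning default on failure."""
    if len(text) > _MAX_CHARS:
        # Reject oversized input up front instead of risking deep recursion.
        return default
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        # These are the exceptions the ast documentation names for
        # malformed input; anything else is allowed to propagate.
        return default


if __name__ == "__main__":
    print(parse_literal("{'a': [1, 2]}"))
    print(parse_literal("("))
```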

pablogsal · 2021-10-07T21:33:15Z

New changeset 0219017 by Pablo Galindo Salgado in branch 'main':
bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812)
0219017

pablogsal · 2021-10-07T23:50:18Z

New changeset 4ce55a2 by Pablo Galindo Salgado in branch '3.10':
[3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812). (GH-28813)
4ce55a2

ammaraskar added the 3.11 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs), and type-crash (a hard crash of the interpreter, possibly with a core dump) labels on Oct 7, 2021

gpshead added the 3.9 (only security fixes) and 3.10 (only security fixes) labels on Oct 7, 2021

gpshead removed the 3.9 (only security fixes) label on Oct 7, 2021

gpshead added the release-blocker label on Oct 7, 2021

pablogsal closed this as completed Oct 7, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

vincedani mentioned this issue Feb 13, 2023

[fuzzer] Parser null deref with continuation characters and generator parenthesis error #89657

Closed
