[fuzzer] Weird input with continuation and newlines causes null deref in parser #89571

Closed
ammaraskar opened this issue Oct 7, 2021 · 6 comments
Labels
3.10 only security fixes 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) release-blocker type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@ammaraskar
Member

BPO 45408
Nosy @gpshead, @ambv, @ammaraskar, @lysnikolaou, @pablogsal, @isidentical
PRs
  • bpo-45408: Don't override previous tokenizer errors in the second parser pass #28812
  • [3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass #28813
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-10-07.23:50:22.415>
    created_at = <Date 2021-10-07.18:00:19.433>
    labels = ['interpreter-core', 'release-blocker', '3.10', 'type-crash', '3.11']
    title = '[fuzzer] Weird input with continuation and newlines causes null deref in parser'
    updated_at = <Date 2021-10-07.23:50:22.415>
    user = 'https://github.com/ammaraskar'

    bugs.python.org fields:

    activity = <Date 2021-10-07.23:50:22.415>
    actor = 'pablogsal'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-10-07.23:50:22.415>
    closer = 'pablogsal'
    components = ['Parser']
    creation = <Date 2021-10-07.18:00:19.433>
    creator = 'ammar2'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 45408
    keywords = ['patch']
    message_count = 6.0
    messages = ['403427', '403429', '403439', '403441', '403444', '403449']
    nosy_count = 6.0
    nosy_names = ['gregory.p.smith', 'lukasz.langa', 'ammar2', 'lys.nikolaou', 'pablogsal', 'BTaskaya']
    pr_nums = ['28812', '28813']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue45408'
    versions = ['Python 3.10', 'Python 3.11']

    @ammaraskar
    Member Author

    Found by the newly added ast.literal_eval(x) fuzzer: feeding the following string to ast.literal_eval causes a null pointer dereference in get_error_line:

    \
    (\
    \

    This can be recreated with:

    ❯ ./python          
    Python 3.11.0a1+ (heads/fuzz_ast-dirty:6c942a86a4, Oct  6 2021, 16:27:52) [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import ast
    >>> ast.literal_eval(r'''\
    ... \
    ... (\
    ... \ ''')
    [1]    15464 segmentation fault  ./python
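Because the reproduction above kills the interpreter that runs it, it can be convenient to try it in a child process instead. The following is a minimal sketch, not part of the original report: the `probe` helper, the `CHILD` script, and the exit code 3 are all hypothetical names and conventions chosen for illustration. It relies on `subprocess.run` reporting a negative `returncode` when the child dies from a signal, which is POSIX behaviour; on Windows a crash shows up differently.

```python
import subprocess
import sys

# Input from the report: two backslash continuations, an unclosed "(" with a
# continuation, then a backslash followed by a space.
CRASHER = "\\\n\\\n(\\\n\\ "

# Hypothetical child script: exits 0 if literal_eval succeeds and 3 if it
# raises SyntaxError. Any other uncaught exception exits with status 1.
CHILD = """
import ast, sys
try:
    ast.literal_eval(sys.stdin.read())
except SyntaxError:
    sys.exit(3)
"""


def probe(source, timeout=30):
    """Run ast.literal_eval(source) in a fresh interpreter and classify the outcome."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=source,
        text=True,
        capture_output=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # POSIX only: the child was terminated by a signal (e.g. SIGSEGV).
        return "crash"
    if proc.returncode == 3:
        return "SyntaxError"
    return "ok" if proc.returncode == 0 else "other"


if __name__ == "__main__":
    print(probe(CRASHER))
```

On an affected build the expectation is `crash`; on a build that includes the fix the parser should reject the input with a normal exception instead.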

    Raw ASAN report
    ---------------

    ==85015==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f987730e08c bp 0x7fff7f8e8080 sp 0x7fff7f8e7838 T0)
    

    ==85015==The signal is caused by a READ memory access.
    ==85015==Hint: address points to the zero page.
    #0 0x7f987730e08c in strchr-avx2.S:57 /build/glibc-eX1tMB/glibc-2.31/sysdeps/x86_64/multiarch/strchr-avx2.S:57
    #1 0x4d7a58 in strchr /src/llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_common_interceptors.inc:0
    #2 0x9f9d95 in get_error_line cpython3/Parser/pegen.c:406:25
    #3 0x9f9d95 in _PyPegen_raise_error_known_location cpython3/Parser/pegen.c:497:26
    #4 0x9fd492 in RAISE_ERROR_KNOWN_LOCATION cpython3/Parser/pegen.h:169:5
    #5 0xa00528 in raise_unclosed_parentheses_error cpython3/Parser/pegen.c:267:8
    #6 0xa00528 in _PyPegen_check_tokenizer_errors cpython3/Parser/pegen.c:1314:25
    #7 0x9ff9e3 in _PyPegen_run_parser cpython3/Parser/pegen.c:1352:17
    #8 0xa015c5 in _PyPegen_run_parser_from_string cpython3/Parser/pegen.c:1479:14
    #9 0xa805e9 in _PyParser_ASTFromString cpython3/Parser/peg_api.c:14:21
    #10 0x85f01a in Py_CompileStringObject cpython3/Python/pythonrun.c:1371:11
    #11 0xc0785f in builtin_compile_impl cpython3/Python/bltinmodule.c:841:14
    #12 0xc0785f in builtin_compile cpython3/Python/clinic/bltinmodule.c.h:249:20
    #13 0xb7b28e in cfunction_vectorcall_FASTCALL_KEYWORDS cpython3/Objects/methodobject.c:446:24
    #14 0x764f22 in call_function cpython3/Python/ceval.c:0
    #15 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
    #16 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
    #17 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
    #18 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #19 0x764f22 in call_function cpython3/Python/ceval.c:0
    #20 0x7482e6 in _PyEval_EvalFrameDefault cpython3/Python/ceval.c:4614:19
    #21 0x741225 in _PyEval_EvalFrame cpython3/Include/internal/pycore_ceval.h:46:12
    #22 0x741225 in _PyEval_Vector cpython3/Python/ceval.c:5636:24
    #23 0x57c510 in _PyFunction_Vectorcall cpython3/Objects/call.c:0
    #24 0x579def in _PyObject_VectorcallTstate /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:114:11
    #25 0x579def in PyObject_CallOneArg /workspace/out/libfuzzer-address-x86_64/include/python3.11/cpython/abstract.h:184:12
    #26 0x579def in fuzz_ast_literal_eval cpython3/Modules/_xxtestfuzz/fuzzer.c:425:25
    #27 0x579def in _run_fuzz cpython3/Modules/_xxtestfuzz/fuzzer.c:443:14
    #28 0x579def in LLVMFuzzerTestOneInput cpython3/Modules/_xxtestfuzz/fuzzer.c:565:11
    #29 0x4725e3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) cxa_noexception.cpp:0
    #30 0x45deb2 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:324:6
    #31 0x463965 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) cxa_noexception.cpp:0
    #32 0x48c6b2 in main /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerMain.cpp:20:10
    #33 0x7f98771aa0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/libc-start.c:308:16
    #34 0x43b16d in _start
    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/lib/x86_64-linux-gnu/libc.so.6+0x18b08c)
    ==85015==ABORTING

    @ammaraskar ammaraskar added 3.11 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Oct 7, 2021
    @gpshead gpshead added 3.9 only security fixes 3.10 only security fixes labels Oct 7, 2021
    @gpshead
    Member

    gpshead commented Oct 7, 2021

    (unable to reproduce on 3.9)

    @gpshead gpshead removed 3.9 only security fixes labels Oct 7, 2021
    @ambv
    Contributor

    ambv commented Oct 7, 2021

    Confirmed in 3.10 and 3.11:

    >>> ast.literal_eval(r'''\
    ... \
    ... (\
    ... \ ''')
    fish: Job 1, 'python' terminated by signal SIGSEGV (Address boundary error)

    3.9 raises SyntaxError:

    >>> ast.literal_eval(r'''
    ... \
    ... (\
    ... \ ''')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "ast.py", line 62, in literal_eval
        node_or_string = parse(node_or_string, mode='eval')
      File "ast.py", line 50, in parse
        return compile(source, filename, mode, flags,
      File "<unknown>", line 4
        \
        ^
    SyntaxError: unexpected character after line continuation character
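Note that the two sessions above use slightly different inputs: the 3.10/3.11 one starts with a backslash continuation, while the 3.9 one starts with a plain newline. A small sketch for comparing both on an interpreter that is known to contain the fix (on an affected 3.10/3.11 build the first input would crash the process). The `VARIANTS` mapping and the `describe` helper are hypothetical names used only for this illustration.

```python
import ast

# Both inputs from the comment above, written as ordinary string literals.
VARIANTS = {
    "leading backslash (3.10/3.11 session)": "\\\n\\\n(\\\n\\ ",
    "leading newline (3.9 session)": "\n\\\n(\\\n\\ ",
}


def describe(source):
    """Return a one-line summary of how ast.literal_eval treats source."""
    try:
        ast.literal_eval(source)
    except SyntaxError as exc:
        # lineno and msg are standard SyntaxError attributes.
        return f"SyntaxError at line {exc.lineno}: {exc.msg}"
    return "no error"


if __name__ == "__main__":
    for label, source in VARIANTS.items():
        print(f"{label}: {describe(source)}")
```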

    @gpshead
    Member

    gpshead commented Oct 7, 2021

    Marking this as a release blocker: a crash is bad for a function that is documented as safe for use on untrusted input, so long as the input isn't large enough to overflow the stack.

    https://docs.python.org/3/library/ast.html#ast.literal_eval
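To illustrate why a hard crash matters here: callers that feed untrusted text to `ast.literal_eval` typically guard it with a size limit and an `except` clause, roughly as sketched below. The `parse_literal` helper, its `max_len` default, and the `default` return value are assumptions made for this example, not anything defined in the ast module. The exception list follows the ones the `ast.literal_eval` documentation says malformed input can raise. No such guard can intercept a segmentation fault, which is why the parser itself had to be fixed.

```python
import ast


def parse_literal(text, max_len=10_000, default=None):
    """Best-effort parse of an untrusted Python literal (hypothetical helper)."""
    # Reject oversized input up front; very large or deeply nested input can
    # exhaust memory or the C stack.
    if len(text) > max_len:
        return default
    try:
        return ast.literal_eval(text)
    except (ValueError, TypeError, SyntaxError, MemoryError, RecursionError):
        # Ordinary Python exceptions are recoverable; an interpreter crash is not.
        return default


if __name__ == "__main__":
    print(parse_literal("{'a': [1, 2]}"))
    print(parse_literal("__import__('os')"))
```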

    @pablogsal
    Member

    New changeset 0219017 by Pablo Galindo Salgado in branch 'main':
    bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812)
    0219017

    @pablogsal
    Member

    New changeset 4ce55a2 by Pablo Galindo Salgado in branch '3.10':
    [3.10] bpo-45408: Don't override previous tokenizer errors in the second parser pass (GH-28812). (GH-28813)
    4ce55a2
