
Python heap corruption issue #68210

Closed
benjaminp opened this issue Apr 21, 2015 · 5 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@benjaminp
Contributor

BPO 24022
Nosy @benjaminp, @serhiy-storchaka
Files
  • f69354561u44075.zip
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2016-09-19.06:44:28.053>
    created_at = <Date 2015-04-21.16:04:01.869>
    labels = ['interpreter-core', 'type-crash']
    title = 'Python heap corruption issue'
    updated_at = <Date 2016-09-19.06:44:28.051>
    user = 'https://github.com/benjaminp'

    bugs.python.org fields:

    activity = <Date 2016-09-19.06:44:28.051>
    actor = 'python-dev'
    assignee = 'none'
    closed = True
    closed_date = <Date 2016-09-19.06:44:28.053>
    closer = 'python-dev'
    components = ['Interpreter Core']
    creation = <Date 2015-04-21.16:04:01.869>
    creator = 'benjamin.peterson'
    dependencies = []
    files = ['39160']
    hgrepos = []
    issue_num = 24022
    keywords = []
    message_count = 4.0
    messages = ['241720', '241721', '241722', '276949']
    nosy_count = 3.0
    nosy_names = ['benjamin.peterson', 'python-dev', 'serhiy.storchaka']
    pr_nums = []
    priority = 'high'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'crash'
    url = 'https://bugs.python.org/issue24022'
    versions = ['Python 2.7', 'Python 3.4', 'Python 3.5']

    @benjaminp
    Contributor Author

    Reported by "Hug Bounter" to security@

    Hello,

    I would like to report a heap corruption issue in Python's Parser/tokenizer.c:922, affecting the latest Python 3.4.3 (from python.org) and also 2.7 (tested 2.7.9-r1 on Gentoo). The latest available version, 3.5.0a3, is also affected. It doesn't seem to affect the 3.3 branch (tested with 3.3.5-r1 on Gentoo).
    The issue occurs when a malformed Python script is executed by the python binary, which results in an out-of-bounds heap read and therefore a segmentation fault.
    I couldn't confirm or deny its exploitability; to my knowledge this would be more of an infoleak, if anything. Nevertheless, as Google Project Zero has proved many times, no heap corruption issue should be treated lightly. :-) Hence I'm reporting it to security@python.org.

    I tried to dig into the details of the bug and I have to admit defeat: the Python parser is quite a complex beast...
    What I was able to determine is that, given the malformed script (attached), the infinite 'for' loop defined at tokenizer.c:900 never reaches any of its exit conditions. This causes tok->cur to be incremented indefinitely, reading the heap character by character until the heap segment boundary is reached and a segmentation fault occurs.
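
    To make the shape of the problem concrete, here is a minimal sketch of the pattern (illustrative only, not the actual CPython tokenizer code): a cursor read is only safe while an explicit end-of-data check holds, and if that exit condition is never reached, *cur++ simply keeps dereferencing whatever follows the heap allocation.

    /* Minimal sketch of the pattern described above; not CPython code.
     * The bounds check is what keeps the cursor inside the allocation;
     * the claim above is that the tokenizer's equivalent exit conditions
     * are never reached for the malformed script, so the read walks off
     * the end of the buffer. */
    #include <stddef.h>

    #define SKETCH_EOF (-1)

    struct tok_sketch {
        const char *buf;   /* start of the heap-allocated input buffer */
        const char *inp;   /* one past the last valid byte             */
        const char *cur;   /* current read position                    */
    };

    static int tok_nextc_sketch(struct tok_sketch *tok)
    {
        if (tok->cur < tok->inp)                 /* exit condition: data left  */
            return (unsigned char)*tok->cur++;   /* analogue of the crash site */
        return SKETCH_EOF;                       /* otherwise report EOF       */
    }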

    There seems to be a race condition involved as well: the malformed script does not always result in a crash, sometimes producing the error below instead:

    ./python ~/Fuzz/crashes/python_stuff/heap_pattern.py
    File "/home/user/Fuzz/crashes/python_stuff/heap_pattern.py", line 44
    SyntaxError: Non-UTF-8 code starting with '\x9e' in file /home/user/Fuzz/crashes/python_stuff/heap_pattern.py on line 45, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

    I acknowledge that the attack scenario is somewhat limited, because one has to be in a position to provide their own script for execution. Nevertheless, at the very least, a malicious user could crash the Python interpreter.

    Depending on the particular script, ASAN detects this either as a 'heap-use-after-free' or a 'heap-buffer-overflow'.

    HEAP-BUFFER-OVERFLOW according to asan:

    $ ./python ~/heap3.py

    =================================================================
    ==23461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62500001e0ff at pc 0xc90075 bp 0x7ffe53018fd0 sp 0x7ffe53018fc0
    READ of size 1 at 0x62500001e0ff thread T0
    #0 0xc90074 in tok_nextc Parser/tokenizer.c:1021
    #1 0xc9a6ef in tok_get Parser/tokenizer.c:1341
    #2 0xca0640 in PyTokenizer_Get Parser/tokenizer.c:1738
    #3 0xc81109 in parsetok Parser/parsetok.c:208
    #4 0xa0c449 in PyParser_ASTFromFileObject Python/pythonrun.c:2356
    #5 0xa0c449 in PyRun_FileExFlags Python/pythonrun.c:2126
    #6 0xa15f0b in PyRun_SimpleFileExFlags Python/pythonrun.c:1606
    #7 0x43a1aa in run_file Modules/main.c:319
    #8 0x43a1aa in Py_Main Modules/main.c:751
    #9 0x4234d3 in main Modules/python.c:69
    #10 0x7efcd1cf1f9f in __libc_start_main (/lib64/libc.so.6+0x1ff9f)
    #11 0x426a7c (/home/user/Fuzz/targets/Python-3.4.3_ASAN/python+0x426a7c)

    0x62500001e0ff is located 1 bytes to the left of 8192-byte region [0x62500001e100,0x625000020100)
    allocated by thread T0 here:
    #0 0x7efcd29eb7c7 in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x577c7)
    #1 0xc9997a in PyTokenizer_FromFile Parser/tokenizer.c:852

    SUMMARY: AddressSanitizer: heap-buffer-overflow Parser/tokenizer.c:1021 tok_nextc
    Shadow bytes around the buggy address:
    0x0c4a7fffbbc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbbd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbbe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbbf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbc00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x0c4a7fffbc10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
    0x0c4a7fffbc20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c4a7fffbc30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c4a7fffbc40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c4a7fffbc50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c4a7fffbc60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable: 00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone: fa
    Heap right redzone: fb
    Freed heap region: fd
    Stack left redzone: f1
    Stack mid redzone: f2
    Stack right redzone: f3
    Stack partial redzone: f4
    Stack after return: f5
    Stack use after scope: f8
    Global redzone: f9
    Global init order: f6
    Poisoned by user: f7
    Contiguous container OOB:fc
    ASan internal: fe
    ==23461==ABORTING

    Below is an example of ASAN detecting a 'use-after-free':

    ./python ~/heap4_asan.py
    =================================================================
    ==23465==ERROR: AddressSanitizer: heap-use-after-free on address 0x62500001e101 at pc 0xc8f7c4 bp 0x7ffc35552000 sp 0x7ffc35551ff0
    READ of size 1 at 0x62500001e101 thread T0
    #0 0xc8f7c3 in tok_nextc Parser/tokenizer.c:902
    #1 0xc9a96f in tok_get Parser/tokenizer.c:1429
    #2 0xca0640 in PyTokenizer_Get Parser/tokenizer.c:1738
    #3 0xc81109 in parsetok Parser/parsetok.c:208
    #4 0xa0c449 in PyParser_ASTFromFileObject Python/pythonrun.c:2356
    #5 0xa0c449 in PyRun_FileExFlags Python/pythonrun.c:2126
    #6 0xa15f0b in PyRun_SimpleFileExFlags Python/pythonrun.c:1606
    #7 0x43a1aa in run_file Modules/main.c:319
    #8 0x43a1aa in Py_Main Modules/main.c:751
    #9 0x4234d3 in main Modules/python.c:69
    #10 0x7f71d129ef9f in __libc_start_main (/lib64/libc.so.6+0x1ff9f)
    #11 0x426a7c (/home/user/Fuzz/targets/Python-3.4.3_ASAN/python+0x426a7c)

    0x62500001e101 is located 1 bytes inside of 8192-byte region [0x62500001e100,0x625000020100)
    freed by thread T0 here:
    #0 0x7f71d1f98aa6 in __interceptor_realloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x57aa6)
    #1 0xc8edb1 in tok_nextc Parser/tokenizer.c:1041

    previously allocated by thread T0 here:
    #0 0x7f71d1f987c7 in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x577c7)
    #1 0xc9997a in PyTokenizer_FromFile Parser/tokenizer.c:852

    SUMMARY: AddressSanitizer: heap-use-after-free Parser/tokenizer.c:902 tok_nextc
    Shadow bytes around the buggy address:
    0x0c4a7fffbbd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbbe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbbf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbc00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c4a7fffbc10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    =>0x0c4a7fffbc20:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4a7fffbc30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4a7fffbc40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4a7fffbc50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4a7fffbc60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4a7fffbc70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable: 00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone: fa
    Heap right redzone: fb
    Freed heap region: fd
    Stack left redzone: f1
    Stack mid redzone: f2
    Stack right redzone: f3
    Stack partial redzone: f4
    Stack after return: f5
    Stack use after scope: f8
    Global redzone: f9
    Global init order: f6
    Poisoned by user: f7
    Contiguous container OOB:fc
    ASan internal: fe
    ==23465==ABORTING
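
    The "freed by thread T0 here: ... realloc" frame in the trace above is the classic signature of a buffer being grown with realloc while another pointer still references the old block. A generic sketch of the safe pattern (illustrative only, not the actual tokenizer code or the eventual patch): every cursor pointing into the buffer has to be rebased after a successful realloc.

    /* Generic sketch of the hazard the use-after-free trace points at;
     * not CPython code.  realloc() may move the block, so any cursor that
     * pointed into the old buffer must be rebased onto the new one. */
    #include <stdlib.h>

    struct buf_sketch {
        char   *start;  /* heap buffer        */
        char   *cur;    /* cursor into buffer */
        size_t  size;   /* current capacity   */
    };

    static int grow_buffer_sketch(struct buf_sketch *b, size_t newsize)
    {
        size_t offset = (size_t)(b->cur - b->start);  /* remember cursor offset */
        char *p = realloc(b->start, newsize);
        if (p == NULL)
            return -1;            /* old block is still valid; caller decides */
        b->start = p;
        b->cur   = p + offset;    /* rebase; a stale cur here is exactly the
                                     "heap-use-after-free" pattern above     */
        b->size  = newsize;
        return 0;
    }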

    Without AddressSanitizer, this particular script does not crash, but causes one of two errors:

    File "/home/user/heap4_asan.py", line 5
    SyntaxError: Non-UTF-8 code starting with '\x9e' in file /home/user/heap4_asan.py on line 6, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

    or:

    File "/home/user/heap4_asan.py", line 5
    SyntaxError: unknown decode error

    In all cases, the crash always occurs in Parser/tokenizer.c at line 922, where tok->cur is incremented and dereferenced, regardless of where it currently points. Eventually it reaches the heap segment boundary and the *tok->cur++ dereference causes Python to crash.

    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000573657 in tok_nextc (tok=tok@entry=0x8fb250) at Parser/tokenizer.c:922
    922 return Py_CHARMASK(*tok->cur++);
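
    For reference, Py_CHARMASK itself does no pointer work; in CPython it is defined, in essence, as the byte mask below (see Include/pyport.h), so the fault comes entirely from dereferencing tok->cur.

    /* Essentially how CPython defines the macro; quoted only to show that
     * the macro itself does nothing beyond masking the byte. */
    #define Py_CHARMASK(c)  ((unsigned char)((c) & 0xff))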

    Sample GDB session can be found below:

    $ gdb --args ./python ~/heap1.py
    GNU gdb (Gentoo 7.9 vanilla) 7.9
    Copyright (C) 2015 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-pc-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://bugs.gentoo.org/>.
    Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.
    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from ./python...done.
    warning: File "/home/user/Fuzz/targets/Python-3.4.3/python-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
    To enable execution of this file add
        add-auto-load-safe-path /home/user/Fuzz/targets/Python-3.4.3/python-gdb.py
    line to your configuration file "/home/user/.gdbinit".
    To completely disable this security protection add
        set auto-load safe-path /
    line to your configuration file "/home/user/.gdbinit".
    For more information about this security protection see the
    "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
    gdb-peda$ r
    Starting program: /home/user/Fuzz/targets/Python-3.4.3/python /home/user/heap1.py
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".

    Program received signal SIGSEGV, Segmentation fault.
    [----------------------------------registers-----------------------------------]
    RAX: 0x995001
    RBX: 0x963c40 --> 0x0
    RCX: 0x0
    RDX: 0x27 ("'")
    RSI: 0x0
    RDI: 0x963c40 --> 0x0
    RBP: 0x0
    RSP: 0x7fffffffdf40 --> 0x7ffff6f14660 --> 0x0
    RIP: 0x573657 (<tok_nextc+1367>: movzx eax,BYTE PTR [r12])
    R8 : 0x1bdf0
    R9 : 0x1bde0
    R10: 0x1bdd0
    R11: 0x4
    R12: 0x995000
    R13: 0x0
    R14: 0x7fffffffe010 --> 0x0
    R15: 0x0
    EFLAGS: 0x10216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
    [-------------------------------------code-------------------------------------]
    0x57364a <tok_nextc+1354>: mov QWORD PTR [rbx+0x10],rax
    0x57364e <tok_nextc+1358>: lea rax,[r12+0x1]
    0x573653 <tok_nextc+1363>: mov QWORD PTR [rbx+0x8],rax
    => 0x573657 <tok_nextc+1367>: movzx eax,BYTE PTR [r12]
    0x57365c <tok_nextc+1372>: add rsp,0x18
    0x573660 <tok_nextc+1376>: pop rbx
    0x573661 <tok_nextc+1377>: pop rbp
    0x573662 <tok_nextc+1378>: pop r12
    [------------------------------------stack-------------------------------------]
    0000| 0x7fffffffdf40 --> 0x7ffff6f14660 --> 0x0
    0008| 0x7fffffffdf48 --> 0x57107e (<PyNode_AddChild+318>: mov rsi,rax)
    0016| 0x7fffffffdf50 --> 0x7ffff6f14660 --> 0x0
    0024| 0x7fffffffdf58 --> 0x27 ("'")
    0032| 0x7fffffffdf60 --> 0x963c40 --> 0x0
    0040| 0x7fffffffdf68 --> 0x3
    0048| 0x7fffffffdf70 --> 0x0
    0056| 0x7fffffffdf78 --> 0x7fffffffe010 --> 0x0
    [------------------------------------------------------------------------------]
    Legend: code, data, rodata, value
    Stopped reason: SIGSEGV
    0x0000000000573657 in tok_nextc (tok=tok@entry=0x963c40)
    at Parser/tokenizer.c:922
    922 return Py_CHARMASK(*tok->cur++);

    Thank you for reading this.
    Please let me know if you need more information.

    @benjaminp benjaminp added interpreter-core (Objects, Python, Grammar, and Parser dirs) type-crash A hard crash of the interpreter, possibly with a core dump labels Apr 21, 2015
    @python-dev
    Mannequin

    python-dev mannequin commented Apr 21, 2015

    New changeset 414e08c478f4 by Benjamin Peterson in branch '3.4':
    do not call into python api if an exception is set (bpo-24022)
    https://hg.python.org/cpython/rev/414e08c478f4

    New changeset 03b2259c6cd3 by Benjamin Peterson in branch 'default':
    merge 3.4 (bpo-24022)
    https://hg.python.org/cpython/rev/03b2259c6cd3
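
    The changeset above describes a guard rule: do not call into the Python C API while an exception is already set. A minimal sketch of that rule follows (not the actual patch; decode_line_sketch is a made-up name used only for illustration).

    /* Sketch of the "don't call into the Python API with an exception set"
     * rule from the changeset above; not the actual patch, and
     * decode_line_sketch() is a made-up illustration. */
    #include <Python.h>

    static PyObject *decode_line_sketch(const char *line, Py_ssize_t len)
    {
        if (PyErr_Occurred())
            return NULL;          /* an error is already pending: propagate it
                                     instead of making further API calls      */
        return PyUnicode_DecodeUTF8(line, len, "replace");
    }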

    @serhiy-storchaka
    Member

    Where are the test scripts?

    @python-dev
    Mannequin

    python-dev mannequin commented Sep 19, 2016

    New changeset ccfea26e6582 by Benjamin Peterson in branch '3.4':
    properly handle the single null-byte file (closes bpo-24022)
    https://hg.python.org/cpython/rev/ccfea26e6582

    New changeset c6438a3df7a4 by Benjamin Peterson in branch '2.7':
    properly handle the single null-byte file (closes bpo-24022)
    https://hg.python.org/cpython/rev/c6438a3df7a4

    New changeset d2f86d9c53b9 by Benjamin Peterson in branch '3.6':
    merge 3.5 (bpo-24022)
    https://hg.python.org/cpython/rev/d2f86d9c53b9
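
    The "single null-byte file" mentioned above hints at a sentinel problem: a tokenizer that also uses '\0' internally as an end-of-buffer marker cannot tell an embedded NUL byte apart from end of input unless it tracks an explicit length. A generic sketch of that distinction (not the actual CPython change):

    /* Generic sketch of distinguishing a real NUL byte in the input from
     * end-of-input by tracking an explicit length; not the actual change.
     * A file whose only content is a single NUL byte then yields exactly
     * one character followed by EOF. */
    #include <stddef.h>

    static int next_char_sketch(const char *buf, size_t len, size_t *pos)
    {
        if (*pos >= len)
            return -1;                        /* genuine end of input  */
        return (unsigned char)buf[(*pos)++];  /* may legitimately be 0 */
    }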

    @python-dev python-dev mannequin closed this as completed Sep 19, 2016
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @vincedani

    Hello @benjaminp, it looks like you are (or were) fuzzing this repository, and you’ve found some interesting bugs. 🥇

    I would like to create a Python-based test case reduction suite containing fuzzer-generated outputs, and benchmark how automatic test case reducers perform on Python inputs. It looks to me like you opened this issue with an already-reduced input that caused the malfunction. Is it possible that you still have the original fuzzer output, free of any reduction?

    I'm also interested in this issue:

    with the same motivation.

    Thanks in advance,
    Daniel
