
Author benjamin.peterson
Recipients benjamin.peterson
Date 2015-04-21.16:03:59
Message-id <1429632241.9.0.734206220056.issue24022@psf.upfronthosting.co.za>
In-reply-to
Content
Reported by "Hug Bounter" to security@

Hello,

I would like to report a heap corruption issue in Python/Parser/tokenizer.c:922, affecting the latest Python 3.4.3 (from python.org) and also 2.7 (tested with 2.7.9-r1 on Gentoo). The latest version available, 3.5.0a3, is also affected. It doesn't seem to affect the 3.3 branch (tested with 3.3.5-r1 on Gentoo).
The issue occurs when a malformed Python script is executed by the python binary, which results in an out-of-bounds read of the heap and therefore a segmentation fault.
I could neither confirm nor rule out its exploitability; to my knowledge this would be more of an infoleak, if anything. Nevertheless, as Google Project Zero has proved many times, no heap corruption issue should be treated lightly. :-) Hence I'm reporting it to security@python.org.

I tried to dig into the details of the bug and I have to admit defeat: the Python parser is quite a complex beast...
What I was able to determine is that, given the malformed script (attached), the infinite 'for' loop defined at tokenizer.c:900 never reaches any of its exit conditions, so tok->cur is incremented indefinitely and the heap is read character by character until the heap segment boundary is reached and a segmentation fault occurs.
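
The behaviour described above can be modelled with a short sketch. This is a simplified illustration in Python, not the actual tokenizer.c logic: the function names, the 4-byte buffer, the neighbouring "secret" bytes and the newline terminator are all hypothetical. An IndexError stands in for the segmentation fault.

```python
def scan_unchecked(heap, start, terminator=0x0A):
    """Advance a cursor until `terminator`, with no check against the buffer end."""
    cur = start
    while heap[cur] != terminator:  # IndexError here stands in for the SIGSEGV
        cur += 1
    return cur


def scan_checked(heap, start, end, terminator=0x0A):
    """Same scan, but stop at `end` (one past the last valid byte of the buffer)."""
    cur = start
    while cur < end:
        if heap[cur] == terminator:
            return cur
        cur += 1
    return None  # ran out of buffer without finding the terminator


# Hypothetical layout: a 4-byte token buffer followed by unrelated heap bytes.
buffer_len = 4
heap_with_newline = b"ab\x9ec" + b"secret\n"
heap_without_newline = b"ab\x9ec" + b"secret"

# The unchecked scan walks out of the buffer and stops inside neighbouring data.
leaked_to = scan_unchecked(heap_with_newline, 0)

# With no terminator anywhere, the unchecked scan runs off the end of the heap.
try:
    scan_unchecked(heap_without_newline, 0)
    crashed = False
except IndexError:
    crashed = True

# The bounded scan gives up at the end of the buffer instead.
bounded = scan_checked(heap_with_newline, 0, buffer_len)
```

In this model the outcome of the unchecked scan depends entirely on what happens to follow the buffer: it either stops somewhere in unrelated memory or runs off the end.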

There seems to be a race condition involved as well, as the malformed script does not always result in a crash, sometimes producing the error below instead:

$ ./python ~/Fuzz/crashes/python_stuff/heap_pattern.py
  File "/home/user/Fuzz/crashes/python_stuff/heap_pattern.py", line 44
SyntaxError: Non-UTF-8 code starting with '\x9e' in file /home/user/Fuzz/crashes/python_stuff/heap_pattern.py on line 45, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

I acknowledge that the attack scenario is somewhat limited, because one has to be in a position to provide their own script for execution. Nevertheless, at the very least, a malicious user could crash the Python environment.


Depending on the particular script, ASAN reports the issue as either a 'heap-use-after-free' or a 'heap-buffer-overflow'.

Below is an example of ASAN detecting a 'heap-buffer-overflow':

$ ./python ~/heap3.py
=================================================================
==23461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62500001e0ff at pc 0xc90075 bp 0x7ffe53018fd0 sp 0x7ffe53018fc0
READ of size 1 at 0x62500001e0ff thread T0
    #0 0xc90074 in tok_nextc Parser/tokenizer.c:1021
    #1 0xc9a6ef in tok_get Parser/tokenizer.c:1341
    #2 0xca0640 in PyTokenizer_Get Parser/tokenizer.c:1738
    #3 0xc81109 in parsetok Parser/parsetok.c:208
    #4 0xa0c449 in PyParser_ASTFromFileObject Python/pythonrun.c:2356
    #5 0xa0c449 in PyRun_FileExFlags Python/pythonrun.c:2126
    #6 0xa15f0b in PyRun_SimpleFileExFlags Python/pythonrun.c:1606
    #7 0x43a1aa in run_file Modules/main.c:319
    #8 0x43a1aa in Py_Main Modules/main.c:751
    #9 0x4234d3 in main Modules/python.c:69
    #10 0x7efcd1cf1f9f in __libc_start_main (/lib64/libc.so.6+0x1ff9f)
    #11 0x426a7c (/home/user/Fuzz/targets/Python-3.4.3_ASAN/python+0x426a7c)

0x62500001e0ff is located 1 bytes to the left of 8192-byte region [0x62500001e100,0x625000020100)
allocated by thread T0 here:
    #0 0x7efcd29eb7c7 in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x577c7)
    #1 0xc9997a in PyTokenizer_FromFile Parser/tokenizer.c:852

SUMMARY: AddressSanitizer: heap-buffer-overflow Parser/tokenizer.c:1021 tok_nextc
Shadow bytes around the buggy address:
  0x0c4a7fffbbc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbc00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c4a7fffbc10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa[fa]
  0x0c4a7fffbc20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fffbc30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fffbc40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fffbc50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c4a7fffbc60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==23461==ABORTING


Below is an example of ASAN detecting a 'use-after-free':

$ ./python ~/heap4_asan.py
=================================================================
==23465==ERROR: AddressSanitizer: heap-use-after-free on address 0x62500001e101 at pc 0xc8f7c4 bp 0x7ffc35552000 sp 0x7ffc35551ff0
READ of size 1 at 0x62500001e101 thread T0
    #0 0xc8f7c3 in tok_nextc Parser/tokenizer.c:902
    #1 0xc9a96f in tok_get Parser/tokenizer.c:1429
    #2 0xca0640 in PyTokenizer_Get Parser/tokenizer.c:1738
    #3 0xc81109 in parsetok Parser/parsetok.c:208
    #4 0xa0c449 in PyParser_ASTFromFileObject Python/pythonrun.c:2356
    #5 0xa0c449 in PyRun_FileExFlags Python/pythonrun.c:2126
    #6 0xa15f0b in PyRun_SimpleFileExFlags Python/pythonrun.c:1606
    #7 0x43a1aa in run_file Modules/main.c:319
    #8 0x43a1aa in Py_Main Modules/main.c:751
    #9 0x4234d3 in main Modules/python.c:69
    #10 0x7f71d129ef9f in __libc_start_main (/lib64/libc.so.6+0x1ff9f)
    #11 0x426a7c (/home/user/Fuzz/targets/Python-3.4.3_ASAN/python+0x426a7c)

0x62500001e101 is located 1 bytes inside of 8192-byte region [0x62500001e100,0x625000020100)
freed by thread T0 here:
    #0 0x7f71d1f98aa6 in __interceptor_realloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x57aa6)
    #1 0xc8edb1 in tok_nextc Parser/tokenizer.c:1041

previously allocated by thread T0 here:
    #0 0x7f71d1f987c7 in malloc (/usr/lib/gcc/x86_64-pc-linux-gnu/4.9.2/libasan.so.1+0x577c7)
    #1 0xc9997a in PyTokenizer_FromFile Parser/tokenizer.c:852

SUMMARY: AddressSanitizer: heap-use-after-free Parser/tokenizer.c:902 tok_nextc
Shadow bytes around the buggy address:
  0x0c4a7fffbbd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbbf0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbc00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c4a7fffbc10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c4a7fffbc20:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a7fffbc30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a7fffbc40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a7fffbc50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a7fffbc60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4a7fffbc70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==23465==ABORTING
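
As a reading aid for the two reports above, the sketch below maps each faulting address to its shadow byte. It assumes the mapping ASan uses by default on x86-64 Linux, shadow = (address >> 3) + 0x7fff8000; the offset constant is an assumption about this build and is not stated in the reports themselves. The addresses and region bounds are copied from the ASAN output.

```python
SHADOW_SCALE = 3            # one shadow byte covers 2**3 = 8 application bytes
SHADOW_OFFSET = 0x7FFF8000  # assumed: ASan default on x86-64 Linux


def shadow_address(addr):
    """Return the address of the shadow byte describing `addr`."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET


# Values taken from the two ASAN reports.
REGION_START = 0x62500001E100
REGION_SIZE = 8192
FAULTS = (
    ("heap-buffer-overflow", 0x62500001E0FF),
    ("heap-use-after-free", 0x62500001E101),
)

for label, addr in FAULTS:
    shadow = shadow_address(addr)
    row = shadow & ~0xF   # start of the 16-byte row printed by ASan
    col = shadow & 0xF    # position of the bracketed byte within that row
    offset = addr - REGION_START
    print(f"{label}: offset {offset:+d} from region start, "
          f"shadow row {row:#014x}, column {col}")
```

Under that assumption the computed shadow bytes land on the bracketed entries in the dumps: the last byte of row 0x0c4a7fffbc10 (fa, heap left redzone) for the overflow, and the first byte of row 0x0c4a7fffbc20 (fd, freed heap region) for the use-after-free.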

Without AddressSanitizer, this particular script does not crash, but causes one of two errors:

File "/home/user/heap4_asan.py", line 5
SyntaxError: Non-UTF-8 code starting with '\x9e' in file /home/user/heap4_asan.py on line 6, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

or:

File "/home/user/heap4_asan.py", line 5
SyntaxError: unknown decode error
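
For comparison, the sketch below shows how an interpreter reports a stray non-UTF-8 byte in a source file. It is not the attached reproducer: the two-line script and the file name non_utf8.py are made up for illustration, and the exact wording of the message may differ between Python versions.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical minimal input: valid first line, then a lone 0x9e byte.
source = b"x = 1\n\x9e\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "non_utf8.py")
    with open(path, "wb") as f:
        f.write(source)
    # Run the file with the current interpreter and capture its diagnostics.
    result = subprocess.run([sys.executable, path], capture_output=True)

stderr_text = result.stderr.decode("utf-8", errors="replace")
print("exit status:", result.returncode)
print(stderr_text)
```

A non-zero exit status together with a SyntaxError on stderr is the benign outcome; a negative exit status from subprocess would instead mean the child was killed by a signal such as SIGSEGV.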


In all cases, the crash occurs in Parser/tokenizer.c at line 922, where tok->cur is incremented regardless of where it currently points. Eventually it reaches the heap boundary, and the *tok->cur++ dereference causes python to crash.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000573657 in tok_nextc (tok=tok@entry=0x8fb250) at Parser/tokenizer.c:922
922                 return Py_CHARMASK(*tok->cur++);


A sample GDB session can be found below:

$ gdb --args ./python ~/heap1.py
GNU gdb (Gentoo 7.9 vanilla) 7.9
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./python...done.
warning: File "/home/user/Fuzz/targets/Python-3.4.3/python-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
    add-auto-load-safe-path /home/user/Fuzz/targets/Python-3.4.3/python-gdb.py
line to your configuration file "/home/user/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "/home/user/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
gdb-peda$ r
Starting program: /home/user/Fuzz/targets/Python-3.4.3/python /home/user/heap1.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
[----------------------------------registers-----------------------------------]
RAX: 0x995001
RBX: 0x963c40 --> 0x0
RCX: 0x0
RDX: 0x27 ("'")
RSI: 0x0
RDI: 0x963c40 --> 0x0
RBP: 0x0
RSP: 0x7fffffffdf40 --> 0x7ffff6f14660 --> 0x0
RIP: 0x573657 (<tok_nextc+1367>:    movzx  eax,BYTE PTR [r12])
R8 : 0x1bdf0
R9 : 0x1bde0
R10: 0x1bdd0
R11: 0x4
R12: 0x995000
R13: 0x0
R14: 0x7fffffffe010 --> 0x0
R15: 0x0
EFLAGS: 0x10216 (carry PARITY ADJUST zero sign trap INTERRUPT direction overflow)
[-------------------------------------code-------------------------------------]
   0x57364a <tok_nextc+1354>:    mov    QWORD PTR [rbx+0x10],rax
   0x57364e <tok_nextc+1358>:    lea    rax,[r12+0x1]
   0x573653 <tok_nextc+1363>:    mov    QWORD PTR [rbx+0x8],rax
=> 0x573657 <tok_nextc+1367>:    movzx  eax,BYTE PTR [r12]
   0x57365c <tok_nextc+1372>:    add    rsp,0x18
   0x573660 <tok_nextc+1376>:    pop    rbx
   0x573661 <tok_nextc+1377>:    pop    rbp
   0x573662 <tok_nextc+1378>:    pop    r12
[------------------------------------stack-------------------------------------]
0000| 0x7fffffffdf40 --> 0x7ffff6f14660 --> 0x0
0008| 0x7fffffffdf48 --> 0x57107e (<PyNode_AddChild+318>:    mov    rsi,rax)
0016| 0x7fffffffdf50 --> 0x7ffff6f14660 --> 0x0
0024| 0x7fffffffdf58 --> 0x27 ("'")
0032| 0x7fffffffdf60 --> 0x963c40 --> 0x0
0040| 0x7fffffffdf68 --> 0x3
0048| 0x7fffffffdf70 --> 0x0
0056| 0x7fffffffdf78 --> 0x7fffffffe010 --> 0x0
[------------------------------------------------------------------------------]
Legend: code, data, rodata, value
Stopped reason: SIGSEGV
0x0000000000573657 in tok_nextc (tok=tok@entry=0x963c40)
    at Parser/tokenizer.c:922
922                return Py_CHARMASK(*tok->cur++);


Thank you for reading this.
Please let me know if you need more information.
History
Date                 User               Action  Args
2015-04-21 16:04:01  benjamin.peterson  set     recipients: + benjamin.peterson
2015-04-21 16:04:01  benjamin.peterson  set     messageid: <1429632241.9.0.734206220056.issue24022@psf.upfronthosting.co.za>
2015-04-21 16:04:01  benjamin.peterson  link    issue24022 messages
2015-04-21 16:03:59  benjamin.peterson  create