classification
Title: Crash in Tokenizer - Heap-use-after-free
Type: crash
Stage:
Components: Interpreter Core
Versions: Python 3.5
process
Status: open
Resolution:
Dependencies:
Superseder:
Assigned To: serhiy.storchaka
Nosy List: William Bowling, serhiy.storchaka, swgillespie
Priority: high
Keywords: patch

Created on 2016-01-03 13:50 by William Bowling, last changed 2016-02-21 22:27 by swgillespie.

Files
File name, Uploaded, Description
asan2.txt, William Bowling, 2016-01-04 03:02
tokenizer_double_free.patch, swgillespie, 2016-02-21 22:27
Messages (4)
msg257417 - Author: William Bowling (William Bowling) Date: 2016-01-03 13:50
Similar to https://bugs.python.org/issue25388, the following causes a crash on 3.5.1 and the latest 3.5 branch:

./python -c 'with open("vuln.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef")'
./python vuln.py


Python 3.5.1+ (default, Jan  4 2016, 00:05:40) 
=================================================================
==24400==ERROR: AddressSanitizer: heap-use-after-free on address 0xf270f100 at pc 0x080ad09e bp 0xffef5ee8 sp 0xffef5ac0
READ of size 2 at 0xf270f100 thread T0
    #0 0x80ad09d in strncpy (/home/will/python/cpython/python+0x80ad09d)
    #1 0x8589b56 in parsetok /home/will/python/cpython/Parser/parsetok.c:235:13
    #2 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #3 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #4 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #5 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #6 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #7 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #8 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #9 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #10 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)
    #11 0x80715b7 in _start (/home/will/python/cpython/python+0x80715b7)

0xf270f100 is located 0 bytes inside of 8194-byte region [0xf270f100,0xf2711102)
freed by thread T0 here:
    #0 0x810c2a4 in __interceptor_cfree.localalias.1 (/home/will/python/cpython/python+0x810c2a4)
    #1 0x8139560 in _PyMem_RawFree /home/will/python/cpython/Objects/obmalloc.c:90:5
    #2 0x813852b in PyMem_Free /home/will/python/cpython/Objects/obmalloc.c:349:5
    #3 0x8596b05 in error_ret /home/will/python/cpython/Parser/tokenizer.c:198:9
    #4 0x8596b05 in decoding_fgets /home/will/python/cpython/Parser/tokenizer.c:636
    #5 0x8594df0 in tok_nextc /home/will/python/cpython/Parser/tokenizer.c:1016:21
    #6 0x858ebba in tok_get /home/will/python/cpython/Parser/tokenizer.c:1457:13
    #7 0x858fc79 in tok_get /home/will/python/cpython/Parser/tokenizer.c:1524:34
    #8 0x858e1da in PyTokenizer_Get /home/will/python/cpython/Parser/tokenizer.c:1804:18
    #9 0x85899a7 in parsetok /home/will/python/cpython/Parser/parsetok.c:208:16
    #10 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #11 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #12 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #13 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #14 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #15 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #16 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #17 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #18 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)

previously allocated by thread T0 here:
    #0 0x810c784 in realloc (/home/will/python/cpython/python+0x810c784)
    #1 0x8139541 in _PyMem_RawRealloc /home/will/python/cpython/Objects/obmalloc.c:84:12
    #2 0x8138506 in PyMem_Realloc /home/will/python/cpython/Objects/obmalloc.c:343:12
    #3 0x8594f1c in tok_nextc /home/will/python/cpython/Parser/tokenizer.c:1058:31
    #4 0x858e4c9 in tok_get /home/will/python/cpython/Parser/tokenizer.c:1354:17
    #5 0x858e1da in PyTokenizer_Get /home/will/python/cpython/Parser/tokenizer.c:1804:18
    #6 0x85899a7 in parsetok /home/will/python/cpython/Parser/parsetok.c:208:16
    #7 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #8 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #9 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #10 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #11 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #12 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #13 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #14 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #15 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/will/python/cpython/python+0x80ad09d) in strncpy
Shadow bytes around the buggy address:
  0x3e4e1dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1de0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1df0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x3e4e1e20:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==24400==ABORTING
msg257439 - Author: William Bowling (William Bowling) Date: 2016-01-04 03:02
A very similar source file also causes a slightly different crash (heap-buffer-overflow instead of heap-use-after-free):

./python -c 'with open("vuln2.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\x00\x0d\xdd")'
./python vuln2.py

Python 3.5.1+ (default, Jan  4 2016, 00:05:40)

The ASan report is attached (asan2.txt).
msg260583 - Author: Sean Gillespie (swgillespie) Date: 2016-02-21 00:10
Is anyone currently working on this? If not, I'd like to try and fix this. I've debugged this a little and think I have an idea of what's going on.
msg260644 - Author: Sean Gillespie (swgillespie) Date: 2016-02-21 22:27
Went ahead and did it since I had the time. The issue arises when the tokenizer does one token of lookahead to see whether an 'async' at the top level begins an 'async def' function or is an identifier. A shallow copy of the current token is made and given to another call to tok_get, which frees the token's buffer if a decoding error occurs. Since the shallow copy cloned the token's buffer pointer, the still-live token is left holding a dangling pointer to its buffer, and that buffer is freed again later on.

If the copied token's buffer pointer was nulled out, the patch explicitly nulls out the original token's buffer pointer as well, the same way tok_get does. This avoids the double free and presents the correct syntax error:

$ ./python vuln.py 
  File "vuln.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xef' in file vuln.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
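
For readers following along, here is a minimal sketch of the aliasing pattern described above and of the "null out the original when the copy was nulled" fix. It is not the code from tokenizer_double_free.patch: struct tok_sketch, lookahead_fails, peek_one_token and tok_sketch_free are hypothetical names invented for illustration, and the real structures in Parser/tokenizer.c carry many more fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the state that owns the input buffer.
   This is NOT CPython's real structure. */
struct tok_sketch {
    char *buf;
};

/* Simulates a lookahead call that hits a decoding error: it frees the
   buffer and clears the pointer in the state it was given (the copy). */
static int lookahead_fails(struct tok_sketch *t)
{
    free(t->buf);
    t->buf = NULL;
    return -1;
}

/* One token of lookahead performed on a shallow copy.
   Returns the result of the lookahead call. */
static int peek_one_token(struct tok_sketch *tok)
{
    struct tok_sketch ahead;
    int result;

    assert(tok->buf != NULL);
    ahead = *tok;                 /* shallow copy: ahead.buf aliases tok->buf */
    result = lookahead_fails(&ahead);
    if (ahead.buf == NULL) {
        /* The callee released the shared buffer, so clear our alias too.
           Without this, tok->buf would dangle and be freed a second time. */
        tok->buf = NULL;
    }
    return result;
}

/* Final cleanup; free(NULL) is a no-op, so a cleared pointer is safe here. */
static void tok_sketch_free(struct tok_sketch *tok)
{
    free(tok->buf);
    tok->buf = NULL;
}
```

In this sketch, a caller that allocates tok.buf, calls peek_one_token(&tok) and later calls tok_sketch_free(&tok) frees the buffer exactly once. Removing the `if (ahead.buf == NULL)` branch reproduces the shape of the reported bug: the later cleanup would free memory that the lookahead already released.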

William Bowling's second program is also fixed with this change, with one additional wrinkle: if a token contains a null byte as the
first character, an invalid write occurs when we attempt to replace the null character with a newline. This fix checks to make sure
that this is not the case before performing the newline insertion.
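
A rough sketch of this kind of guard follows. It assumes, purely for illustration, that the end of the line is located with strlen and that the newline is written over the terminating null; the helper name ensure_trailing_newline and its signature are hypothetical and do not come from the patch or from tokenizer.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT CPython code. Makes sure the line stored in
   buf (capacity cap bytes) ends with '\n'. Returns 1 if a newline was
   added, 0 if nothing was needed or nothing could safely be done. */
static int ensure_trailing_newline(char *buf, size_t cap)
{
    size_t len = strlen(buf);     /* stops at the first null byte */

    if (len == 0) {
        /* The first byte is a null byte: there is no "last character",
           and buf[len - 1] would index before the start of the buffer. */
        return 0;
    }
    if (buf[len - 1] == '\n') {
        return 0;                 /* already newline-terminated */
    }
    if (len + 1 >= cap) {
        return 0;                 /* no room for '\n' plus the terminator */
    }
    buf[len] = '\n';              /* overwrite the old terminator */
    buf[len + 1] = '\0';
    return 1;
}
```

The len == 0 check is the part that corresponds to the wrinkle described above: a line that begins with a null byte looks empty to string functions, so any code that reasons about "the character before the terminator" has to rule that case out first.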

With this change, both of William Bowling's programs pass valgrind and present the appropriate syntax error. I tried to add this to the coroutine syntax tests, but every way of loading the file other than passing it to ./python itself fails (correctly) because the program contains a null byte.
History
Date User Action Args
2016-02-21 22:27:15 swgillespie set files: + tokenizer_double_free.patch
keywords: + patch
messages: + msg260644
2016-02-21 00:10:43 swgillespie set nosy: + swgillespie
messages: + msg260583
2016-01-04 03:02:20 William Bowling set files: + asan2.txt
messages: + msg257439
2016-01-03 16:32:52 serhiy.storchaka set priority: normal -> high
assignee: serhiy.storchaka
nosy: + serhiy.storchaka
2016-01-03 13:50:57 William Bowling create