This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python Developer's Guide.

classification
Title: Crash in Tokenizer - Heap-use-after-free
Type: crash Stage: resolved
Components: Interpreter Core Versions: Python 3.5
process
Status: closed Resolution: duplicate
Superseder: Crashes with lines of the form "async \" (issue31852)
Assigned To: serhiy.storchaka Nosy List: William Bowling, serhiy.storchaka, swgillespie, xtreak
Priority: high Keywords: patch

Created on 2016-01-03 13:50 by William Bowling, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name                    Uploaded
asan2.txt                    William Bowling, 2016-01-04 03:02
tokenizer_double_free.patch  swgillespie, 2016-02-21 22:27
Messages (8)
msg257417 - Author: William Bowling (William Bowling) Date: 2016-01-03 13:50
Similar to https://bugs.python.org/issue25388, the following causes a crash on 3.5.1 and on the latest 3.5 branch:

./python -c 'with open("vuln.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef")'
./python vuln.py


Python 3.5.1+ (default, Jan  4 2016, 00:05:40) 
=================================================================
==24400==ERROR: AddressSanitizer: heap-use-after-free on address 0xf270f100 at pc 0x080ad09e bp 0xffef5ee8 sp 0xffef5ac0
READ of size 2 at 0xf270f100 thread T0
    #0 0x80ad09d in strncpy (/home/will/python/cpython/python+0x80ad09d)
    #1 0x8589b56 in parsetok /home/will/python/cpython/Parser/parsetok.c:235:13
    #2 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #3 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #4 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #5 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #6 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #7 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #8 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #9 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #10 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)
    #11 0x80715b7 in _start (/home/will/python/cpython/python+0x80715b7)

0xf270f100 is located 0 bytes inside of 8194-byte region [0xf270f100,0xf2711102)
freed by thread T0 here:
    #0 0x810c2a4 in __interceptor_cfree.localalias.1 (/home/will/python/cpython/python+0x810c2a4)
    #1 0x8139560 in _PyMem_RawFree /home/will/python/cpython/Objects/obmalloc.c:90:5
    #2 0x813852b in PyMem_Free /home/will/python/cpython/Objects/obmalloc.c:349:5
    #3 0x8596b05 in error_ret /home/will/python/cpython/Parser/tokenizer.c:198:9
    #4 0x8596b05 in decoding_fgets /home/will/python/cpython/Parser/tokenizer.c:636
    #5 0x8594df0 in tok_nextc /home/will/python/cpython/Parser/tokenizer.c:1016:21
    #6 0x858ebba in tok_get /home/will/python/cpython/Parser/tokenizer.c:1457:13
    #7 0x858fc79 in tok_get /home/will/python/cpython/Parser/tokenizer.c:1524:34
    #8 0x858e1da in PyTokenizer_Get /home/will/python/cpython/Parser/tokenizer.c:1804:18
    #9 0x85899a7 in parsetok /home/will/python/cpython/Parser/parsetok.c:208:16
    #10 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #11 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #12 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #13 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #14 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #15 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #16 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #17 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #18 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)

previously allocated by thread T0 here:
    #0 0x810c784 in realloc (/home/will/python/cpython/python+0x810c784)
    #1 0x8139541 in _PyMem_RawRealloc /home/will/python/cpython/Objects/obmalloc.c:84:12
    #2 0x8138506 in PyMem_Realloc /home/will/python/cpython/Objects/obmalloc.c:343:12
    #3 0x8594f1c in tok_nextc /home/will/python/cpython/Parser/tokenizer.c:1058:31
    #4 0x858e4c9 in tok_get /home/will/python/cpython/Parser/tokenizer.c:1354:17
    #5 0x858e1da in PyTokenizer_Get /home/will/python/cpython/Parser/tokenizer.c:1804:18
    #6 0x85899a7 in parsetok /home/will/python/cpython/Parser/parsetok.c:208:16
    #7 0x858b301 in PyParser_ParseFileObject /home/will/python/cpython/Parser/parsetok.c:134:12
    #8 0x8439e0b in PyParser_ASTFromFileObject /home/will/python/cpython/Python/pythonrun.c:1150:15
    #9 0x843aa37 in PyRun_FileExFlags /home/will/python/cpython/Python/pythonrun.c:916:11
    #10 0x8438a98 in PyRun_SimpleFileExFlags /home/will/python/cpython/Python/pythonrun.c:396:13
    #11 0x84382a6 in PyRun_AnyFileExFlags /home/will/python/cpython/Python/pythonrun.c:80:16
    #12 0x813f194 in run_file /home/will/python/cpython/Modules/main.c:318:11
    #13 0x813f194 in Py_Main /home/will/python/cpython/Modules/main.c:768
    #14 0x8138070 in main /home/will/python/cpython/./Programs/python.c:69:11
    #15 0xf7558496 in __libc_start_main (/usr/lib32/libc.so.6+0x18496)

SUMMARY: AddressSanitizer: heap-use-after-free (/home/will/python/cpython/python+0x80ad09d) in strncpy
Shadow bytes around the buggy address:
  0x3e4e1dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1de0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1df0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x3e4e1e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x3e4e1e20:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e40: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x3e4e1e70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==24400==ABORTING
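You can check the shadow-memory dump above by hand. The short sketch below assumes ASan's default mapping for 32-bit x86 Linux, shadow = (addr >> 3) + 0x20000000. That offset is not printed in the report, but the report's numbers are consistent with it. The sketch maps the faulting address to the shadow byte marked [fd] on the => row, then recomputes the size of the freed region.

```python
# Assumption: ASan's default shadow mapping on 32-bit x86 Linux,
# shadow = (addr >> 3) + 0x20000000, i.e. one shadow byte per 8 application bytes.
SHADOW_SCALE = 3
SHADOW_OFFSET_I386 = 0x20000000


def shadow_addr(addr: int) -> int:
    """Address of the shadow byte that describes application address `addr`."""
    return (addr >> SHADOW_SCALE) + SHADOW_OFFSET_I386


def shadow_bytes_for(size: int) -> int:
    """Shadow bytes covering an 8-byte-aligned region of `size` bytes."""
    granule = 1 << SHADOW_SCALE
    return (size + granule - 1) // granule


# Values copied from the report above.
bad_addr = 0xF270F100
region_start, region_end = 0xF270F100, 0xF2711102

print(hex(shadow_addr(bad_addr)))                   # 0x3e4e1e20, the "=>" row
print(region_end - region_start)                    # 8194, the reported region size
print(shadow_bytes_for(region_end - region_start))  # 1025
```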
msg257439 - Author: William Bowling (William Bowling) Date: 2016-01-04 03:02
A very similar source file also causes a slightly different crash (heap-buffer-overflow instead of heap-use-after-free):

./python -c 'with open("vuln2.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\x00\x0d\xdd")'
./python vuln2.py

Python 3.5.1+ (default, Jan  4 2016, 00:05:40)

I have attached the ASan report (asan2.txt).
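Decoding the two payloads makes their structure easier to see:

- Both start with 'as', a NUL byte, a newline, 'ync' and a backslash-newline continuation.
- They differ only in the tail: a lone 0xEF versus NUL, CR, 0xDD.
- The trailing 0xEF and 0xDD are incomplete UTF-8 lead bytes. This matches the Non-UTF-8 SyntaxError reported once the bug is fixed.

One plausible but unconfirmed reading, consistent with the superseding issue's title, is that the NUL cuts the first line short. The tokenizer would then effectively see a line of the form "async \". The helper names below are illustrative.

```python
# The two payloads from the reports, copied verbatim.
VULN = b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef"
VULN2 = b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\x00\x0d\xdd"


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def is_valid_utf8(data: bytes) -> bool:
    """True if the whole byte string decodes as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Split on LF only; bytes.splitlines() would also split on the \r in VULN2.
print(VULN.split(b"\n"))    # [b'as\x00', b'ync\\', b'\xef']
print(VULN2.split(b"\n"))   # [b'as\x00', b'ync\\', b'\x00\r\xdd']

shared = common_prefix_len(VULN, VULN2)
print(shared, VULN[:shared])                      # 9 b'as\x00\nync\\\n'
print(is_valid_utf8(VULN), is_valid_utf8(VULN2))  # False False
```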
msg260583 - Author: Sean Gillespie (swgillespie) * Date: 2016-02-21 00:10
Is anyone currently working on this? If not, I'd like to try and fix this. I've debugged this a little and think I have an idea of what's going on.
msg260644 - Author: Sean Gillespie (swgillespie) * Date: 2016-02-21 22:27
I went ahead and did it since I had the time. The issue arises when the tokenizer looks one token ahead to decide whether a top-level 'async' begins an 'async def' function or is just an identifier. It makes a shallow copy of the current tokenizer state (tok) and passes it to another call to tok_get, which frees the state's buffer if a decoding error occurs. Because the shallow copy cloned the buffer pointer, the still-live original state keeps a pointer to the freed buffer, and that buffer gets freed again later on.
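To make the aliasing concrete, here is a deliberately tiny Python model. It is hypothetical: ToyHeap, TokState and the function names are illustrative, not CPython code.

- copy.copy plays the role of the shallow struct copy.
- The inner call frees the shared buffer and clears only the copy's pointer.
- The apply_fix flag models propagating the cleared pointer back to the original state.

```python
import copy


class ToyHeap:
    """Hypothetical stand-in for PyMem_Malloc/PyMem_Free that records misuse."""

    def __init__(self):
        self.live = set()
        self.next_id = 1
        self.errors = []

    def alloc(self):
        buf = self.next_id
        self.next_id += 1
        self.live.add(buf)
        return buf

    def read(self, buf):
        if buf is not None and buf not in self.live:
            self.errors.append("use after free of buffer %d" % buf)

    def free(self, buf):
        if buf is None:  # like free(NULL), freeing "no buffer" is a no-op
            return
        if buf not in self.live:
            self.errors.append("double free of buffer %d" % buf)
            return
        self.live.remove(buf)


class TokState:
    """Drastically simplified, hypothetical model of the tokenizer state."""

    def __init__(self, heap):
        self.heap = heap
        self.buf = heap.alloc()


def inner_tok_get_with_decoding_error(tok):
    # Models the nested tok_get() hitting a decoding error: the buffer is
    # freed and the pointer cleared, but only in the state it was handed.
    tok.heap.free(tok.buf)
    tok.buf = None


def lookahead_after_async(tok, apply_fix):
    ahead = copy.copy(tok)  # shallow copy: ahead.buf aliases tok.buf
    inner_tok_get_with_decoding_error(ahead)
    if apply_fix and ahead.buf is None:
        tok.buf = None      # propagate the cleared pointer back to the original


def simulate(apply_fix):
    heap = ToyHeap()
    tok = TokState(heap)
    lookahead_after_async(tok, apply_fix)
    heap.read(tok.buf)  # later, text is copied out of the buffer
    heap.free(tok.buf)  # and the tokenizer eventually frees it
    return heap.errors


print(simulate(apply_fix=False))
# ['use after free of buffer 1', 'double free of buffer 1']
print(simulate(apply_fix=True))
# []
```

Without the fix, the model reports the two symptoms discussed in this issue. The first is a read of freed memory, like the strncpy in parsetok in the ASan trace. The second is a second free of the same buffer.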

Explicitly nulling out the original state's buffer pointer whenever the copy's buffer pointer has been nulled out (as tok_get does) avoids the double free and produces the correct syntax error:

$ ./python vuln.py 
  File "vuln.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xef' in file vuln.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

William Bowling's second program is also fixed by this change, with one additional wrinkle: if a token has a null byte as its first character, an invalid write occurs when the tokenizer tries to replace that null character with a newline. The patch checks for this case before inserting the newline.

With this change, both of William Bowling's programs pass valgrind and present the appropriate syntax error. I tried to add this to the coroutine syntax tests, but every way of loading the file other than passing it to ./python directly fails (correctly) because the program contains a null byte.
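The in-process rejection is easy to observe from Python itself: compile() refuses source containing a NUL byte up front. In current CPython releases this is a ValueError up to 3.11 and a SyntaxError from 3.12 onward, so the sketch accepts either. That version detail is an assumption, not something stated in this thread.

compile() takes a string-based tokenizer path rather than the file-based decoding_fgets() path seen in the ASan trace. A regression test for this crash would therefore most likely have to run the file in a child interpreter.

```python
# Payload from the first report, copied verbatim.
PAYLOAD = b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef"


def in_process_error(source):
    """Return the exception type compile() raises for `source`, or None if it compiles."""
    try:
        compile(source, "vuln.py", "exec")
    except (ValueError, SyntaxError) as exc:  # which one depends on the version (assumption)
        return type(exc)
    return None


print(in_process_error(PAYLOAD))
print(in_process_error(b"x = 1\n"))  # None
```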
msg326160 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-23 15:42
Is this still reproducible? On master (Python 3.8), a debug build raises a SyntaxError. I don't have Python 3.5 installed to check, though.

$ ./python.exe
Python 3.8.0a0 (heads/master:c87d9f406b, Sep 23 2018, 19:48:30)
[Clang 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
➜  cpython git:(master) ./python.exe -c 'with open("vuln.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef")'
➜  cpython git:(master) ✗ ./python.exe vuln.py
  File "vuln.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xef' in file vuln.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
➜  cpython git:(master) ✗ ./python.exe -c 'with open("vuln2.py", "wb") as f: f.write(b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\x00\x0d\xdd")'
➜  cpython git:(master) ✗ ./python.exe vuln2.py
  File "vuln2.py", line 3
SyntaxError: Non-UTF-8 code starting with '\xdd' in file vuln2.py on line 3, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
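The check above can be scripted so that any interpreter build is tested the same way. This is a minimal sketch that makes two assumptions:

- It runs on a POSIX system, where a child killed by a signal (such as the SIGABRT from a failed assertion in a debug build) shows up as a negative return code.
- ASan keeps its default behaviour of exiting with status 1 after printing its report.

```python
import os
import subprocess
import sys
import tempfile

# Payloads from the two reports, copied verbatim.
PAYLOADS = {
    "vuln.py": b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\xef",
    "vuln2.py": b"\x61\x73\x00\x0a\x79\x6e\x63\x5c\x0a\x00\x0d\xdd",
}


def classify(returncode, stderr):
    """Label one run: crash, SyntaxError, or some other exit status."""
    if "AddressSanitizer" in stderr:
        return "crash (ASan report)"              # ASan exits with status 1 by default
    if returncode < 0:
        return "crash (signal %d)" % -returncode  # POSIX: child killed by a signal
    if "SyntaxError" in stderr:
        return "SyntaxError"
    return "exit status %d" % returncode


def check(python=sys.executable):
    """Run every payload under `python` and return {filename: label}."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, payload in PAYLOADS.items():
            path = os.path.join(tmp, name)
            with open(path, "wb") as f:
                f.write(payload)
            proc = subprocess.run([python, path], capture_output=True)
            results[name] = classify(proc.returncode,
                                     proc.stderr.decode("utf-8", "replace"))
    return results


print(check())  # pass e.g. "./python.exe" to check a local build instead
```

On an affected build the labels should start with "crash". On a fixed build, both files should be reported as SyntaxError, or at least with a non-zero exit status. A release build without ASan may not crash visibly at all, so an ASan or debug build is the more reliable target.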


Thanks
msg326199 - Author: William Bowling (William Bowling) Date: 2018-09-24 02:43
> Is this still reproducible? On master (Python 3.8) with a debug build it throws a SyntaxError. I don't have Python 3.5 installed to check this though

It looks like it's fixed in master and 3.6.6, but it still happens in 3.5.6.
msg326204 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-24 07:08
Thanks, William, for the information. I can reproduce this on 3.5.6. I was able to bisect it down to #31852, which deals with similar cases and was fixed in commit 690c36f2f1085145d364a89bfed5944dd2470308.

$ cpython git:(master) git checkout 690c36f2f1085145d364a89bfed5944dd2470308
HEAD is now at 690c36f2f1 [3.6] bpo-31852: Fix segfault caused by using the async soft keyword (GH-4122)
$ cpython git:(690c36f2f1) git clean -xdf && ./configure --with-pydebug && make -s -j4
$ cpython git:(690c36f2f1) ./python.exe ../backups/vuln.py
  File "../backups/vuln.py", line 2
SyntaxError: Non-UTF-8 code starting with '\xef' in file ../backups/vuln.py on line 2, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
$ cpython git:(690c36f2f1) ./python.exe ../backups/vuln2.py
  File "../backups/vuln2.py", line 3
SyntaxError: Non-UTF-8 code starting with '\xdd' in file ../backups/vuln2.py on line 3, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

# Reproduce the crash

➜  cpython git:(690c36f2f1) git checkout 690c36f2f1085145d364a89bfed5944dd2470308~1
Previous HEAD position was 690c36f2f1 [3.6] bpo-31852: Fix segfault caused by using the async soft keyword (GH-4122)
HEAD is now at 2702380870 bpo-31304: Update starmap_async documentation. (GH-4168) (GH-4177)
➜  cpython git:(2702380870) make
➜  cpython git:(2702380870) ./python.exe ../backups/vuln2.py
Assertion failed: (!PyErr_Occurred()), function PyObject_Call, file Objects/abstract.c, line 2247.
[2]    71701 abort      ./python.exe ../backups/vuln2.py
➜  cpython git:(2702380870) ./python.exe ../backups/vuln.py
Assertion failed: (!PyErr_Occurred()), function PyObject_Call, file Objects/abstract.c, line 2247.
[2]    71712 abort      ./python.exe ../backups/vuln.py
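The thread doesn't say how the bisection was run; one way to automate it is a git bisect run helper along these lines. This is a sketch with three assumptions:

- The script name bisect_check.py is hypothetical.
- The tree is already configured, as with ./configure --with-pydebug above.
- It uses git's custom bisect terms, because the search is for the commit that fixed the crash rather than the one that introduced it.

```python
import subprocess
import sys

# Exit codes that `git bisect run` interprets.
OLD = 0     # first term ("crashes"): this commit still crashes
NEW = 1     # second term ("fixed"): the crash is gone
SKIP = 125  # this commit cannot be tested, e.g. the build failed


def verdict(returncode, stderr):
    """Map one run of the repro file to a `git bisect run` exit code."""
    crashed = returncode < 0 or "AddressSanitizer" in stderr
    return OLD if crashed else NEW


def main(repro_path, python="./python"):
    # Assumes an already configured tree; only rebuilds at each step.
    if subprocess.run(["make", "-s", "-j4"]).returncode != 0:
        return SKIP
    proc = subprocess.run([python, repro_path], capture_output=True)
    return verdict(proc.returncode, proc.stderr.decode("utf-8", "replace"))


# Usage: bisect_check.py REPRO_FILE [PYTHON_BINARY]
if __name__ == "__main__" and len(sys.argv) in (2, 3):
    sys.exit(main(*sys.argv[1:]))
```

The steps are:

1. Start with git bisect start --term-old=crashes --term-new=fixed.
2. Mark a commit that still crashes with git bisect crashes <rev>, and one that doesn't with git bisect fixed <rev>.
3. Run git bisect run python3 bisect_check.py ../backups/vuln.py ./python.exe.

Exit status 0 marks a commit as crashes, 1 marks it as fixed, and 125 tells git to skip a commit that fails to build.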

It doesn't affect master, 3.7.0, or v3.6.4+. Since 3.5 is in security-fix-only mode and the fix was not backported to 3.5 in the linked ticket, I propose closing this ticket and opening a separate one with Larry added to it if the fix needs an explicit priority backport to 3.5.6.


Thanks
msg326250 - Author: Karthikeyan Singaravelan (xtreak) * (Python committer) Date: 2018-09-24 15:32
As part of triaging, I am closing this issue as a duplicate and setting issue31852, which has the relevant PR and discussion about the fix, as the superseder. I have also verified the fix, as described in https://bugs.python.org/issue26000#msg326204. If needed, backporting the fix to Python 3.5 can be requested in a separate issue with Larry added, since 3.5 is in security-fix-only mode.

Thanks again everyone for the details.
History
Date                 User              Action  Args
2022-04-11 14:58:25  admin             set     github: 70188
2018-09-24 15:32:03  xtreak            set     status: open -> closed
                                               superseder: Crashes with lines of the form "async \"
                                               messages: + msg326250
                                               resolution: duplicate
                                               stage: resolved
2018-09-24 07:08:21  xtreak            set     messages: + msg326204
2018-09-24 02:43:34  William Bowling   set     messages: + msg326199
2018-09-23 15:42:50  xtreak            set     nosy: + xtreak
                                               messages: + msg326160
2016-02-21 22:27:15  swgillespie       set     files: + tokenizer_double_free.patch
                                               keywords: + patch
                                               messages: + msg260644
2016-02-21 00:10:43  swgillespie       set     nosy: + swgillespie
                                               messages: + msg260583
2016-01-04 03:02:20  William Bowling   set     files: + asan2.txt
                                               messages: + msg257439
2016-01-03 16:32:52  serhiy.storchaka  set     priority: normal -> high
                                               assignee: serhiy.storchaka
                                               nosy: + serhiy.storchaka
2016-01-03 13:50:57  William Bowling   create