classification
Title: pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror()
Type: security Stage: resolved
Components: C API Versions: Python 3.11, Python 3.10
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: Nosy List: BTaskaya, elmanto, gvanrossum, lys.nikolaou, miss-islington, pablogsal
Priority: release blocker Keywords: patch

Created on 2021-06-11 14:39 by elmanto, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Files
File name Uploaded Description
crashes.tgz elmanto, 2021-06-11 14:39 Inputs that result in the asan violation
Pull Requests
URL Status Linked
PR 26676 merged pablogsal, 2021-06-11 17:18
PR 26695 merged miss-islington, 2021-06-12 17:53
Messages (10)
msg395637 - (view) Author: alessandro mantovani (elmanto) Date: 2021-06-11 14:39
Use-after-free in Python 3.11 (commit 2ab27c4af4ddf752)
Steps to reproduce:

1) ./configure --with-address-sanitizer
2) make
3) ./python <input>

I attach some of the inputs that lead to the undefined behavior.

The complete ASan report follows:

==1082579==ERROR: AddressSanitizer: heap-use-after-free on address 0x626000045a40 at pc 0x000000735155 bp 0x7fffffffbed0 sp 0x7fffffffbec8
READ of size 8 at 0x626000045a40 thread T0
    #0 0x735154 in ascii_decode /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28
    #1 0x735154 in unicode_decode_utf8 /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5158:10
    #2 0xc98381 in syntaxerror /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1087:15
    #3 0xc8d616 in tok_get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c
    #4 0xc8696b in PyTokenizer_Get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1884:18
    #5 0xead74c in _PyPegen_check_tokenizer_errors /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1260:17
    #6 0xead74c in _PyPegen_run_parser /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1292:17
    #7 0xeaebca in _PyPegen_run_parser_from_file_pointer /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1377:14
    #8 0xc83a91 in _PyParser_ASTFromFile /home/elmanto/ddg/other_targets/cpython/Parser/peg_api.c:26:12
    #9 0xa0abf1 in pyrun_file /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:1197:11
    #10 0xa0abf1 in _PyRun_SimpleFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:455:13
    #11 0xa09b19 in _PyRun_AnyFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:89:15
    #12 0x4dfe94 in pymain_run_file_obj /home/elmanto/ddg/other_targets/cpython/Modules/main.c:353:15
    #13 0x4dfe94 in pymain_run_file /home/elmanto/ddg/other_targets/cpython/Modules/main.c:372:15
    #14 0x4dfe94 in pymain_run_python /home/elmanto/ddg/other_targets/cpython/Modules/main.c:587:21
    #15 0x4dfe94 in Py_RunMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:666:5
    #16 0x4e154c in pymain_main /home/elmanto/ddg/other_targets/cpython/Modules/main.c:696:12
    #17 0x4e1874 in Py_BytesMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:720:12
    #18 0x7ffff7a2e0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #19 0x43501d in _start (/home/elmanto/ddg/other_targets/cpython/python+0x43501d)

0x626000045a40 is located 2368 bytes inside of 10560-byte region [0x626000045100,0x626000047a40)
freed by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

previously allocated by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

SUMMARY: AddressSanitizer: heap-use-after-free /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28 in ascii_decode
Shadow bytes around the buggy address:
  0x0c4c80000af0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
=>0x0c4c80000b40: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
  0x0c4c80000b50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c4c80000b90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1082579==ABORTING
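
The numbers in the report above can be cross-checked with a little arithmetic. The sketch below is not part of the original report. It assumes the default AddressSanitizer shadow mapping for x86-64 Linux, `shadow = (address >> 3) + 0x7fff8000`, and the helper names are made up for illustration.

```python
# Values copied from the AddressSanitizer report above.
FAULT_ADDRESS = 0x626000045A40
REGION_START = 0x626000045100
REGION_END = 0x626000047A40

# Assumption: default x86-64 Linux shadow mapping (one shadow byte covers
# 8 application bytes, hence the shift by 3).
SHADOW_SCALE = 3
SHADOW_OFFSET = 0x7FFF8000


def shadow_address(address: int) -> int:
    """Return the shadow-memory address that describes `address`."""
    return (address >> SHADOW_SCALE) + SHADOW_OFFSET


def describe_fault(address: int, start: int, end: int) -> dict:
    """Summarize where the faulting address sits inside the freed region."""
    return {
        "offset_in_region": address - start,
        "region_size": end - start,
        "shadow_address": hex(shadow_address(address)),
    }


if __name__ == "__main__":
    summary = describe_fault(FAULT_ADDRESS, REGION_START, REGION_END)
    print(summary["offset_in_region"])  # 2368
    print(summary["region_size"])       # 10560
    print(summary["shadow_address"])    # 0xc4c80000b48
```

The computed shadow address, 0x0c4c80000b48, is the ninth byte of the row marked `=>0x0c4c80000b40`, which is the bracketed `[fd]` (freed heap region) entry in the dump.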
msg395641 - (view) Author: Guido van Rossum (gvanrossum) * (Python committer) Date: 2021-06-11 15:44
Lysandros and Pablo, this *only* occurs when the lexer is reading directly from a file, not when it's reading the same source code from a (bytes) string. All examples are syntax errors (some raise ValueError in the parser).
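
The file-versus-string distinction can be exercised from Python itself. The following is a minimal sketch, not something posted in the thread. It uses the small reproducer from msg395646, and it assumes that `compile()` on a bytes object goes through the string-based tokenizer while running a script by path goes through the file-based one. The helper names are hypothetical. On a fixed interpreter both paths should simply report a syntax error.

```python
import os
import subprocess
import sys
import tempfile

# Reproducer text from msg395646: an unterminated string literal whose
# first line ends in a backslash continuation.
SOURCE = b'x = "ijosdfsd\\\ndef blech():\n    pass\n'


def check_from_bytes(source: bytes) -> str:
    """Compile from an in-memory bytes object (no file is read)."""
    try:
        compile(source, "<bytes>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc.msg}"
    except ValueError as exc:
        return f"ValueError: {exc}"
    return "compiled without error"


def check_from_file(source: bytes) -> int:
    """Run the source as a script so the interpreter reads it from a file.

    Returns the child's exit status; a sanitizer abort or crash would show
    up as an unusual non-zero status rather than a plain syntax error.
    """
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as handle:
        handle.write(source)
        path = handle.name
    try:
        result = subprocess.run([sys.executable, path], capture_output=True)
        return result.returncode
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print("from bytes:", check_from_bytes(SOURCE))
    print("from file, exit status:", check_from_file(SOURCE))
```

To observe the original bug, the script would have to be run with an affected build configured with `--with-address-sanitizer`, as in the steps to reproduce.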
msg395646 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:46
Here is a smaller reproducer:

x = "ijosdfsd\
def blech():
    pass

This seems to be an error with:

commit a698d52c3975c80b45b139b2f08402ec514dce75
Author: Batuhan Taskaya <isidentical@gmail.com>
Date:   Thu Jan 21 00:38:47 2021 +0300

    bpo-40176: Improve error messages for unclosed string literals (GH-19346)



    Automerge-Triggered-By: GH:isidentical

Batuhan, can you take a look?
msg395647 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:54
I think this should fix the issue, but someone should validate this:

diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
index 6002f3e05a..1c28737183 100644
--- a/Parser/tokenizer.c
+++ b/Parser/tokenizer.c
@@ -1084,17 +1084,16 @@ syntaxerror(struct tok_state *tok, const char *format, ...)
         goto error;
     }

-    errtext = PyUnicode_DecodeUTF8(tok->line_start, tok->cur - tok->line_start,
+    errtext = PyUnicode_DecodeUTF8(tok->buf, tok->inp - tok->buf,
                                    "replace");
     if (!errtext) {
         goto error;
     }
     int offset = (int)PyUnicode_GET_LENGTH(errtext);
-    Py_ssize_t line_len = strcspn(tok->line_start, "\n");
-    if (line_len != tok->cur - tok->line_start) {
+    Py_ssize_t line_len = strcspn(tok->buf, "\n");
+    if (line_len != tok->buf - tok->inp) {
         Py_DECREF(errtext);
-        errtext = PyUnicode_DecodeUTF8(tok->line_start, line_len,
-                                       "replace");
+        errtext = PyUnicode_DecodeUTF8(tok->buf, line_len, "replace");
     }
     if (!errtext) {
         goto error;
msg395648 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 16:58
This affects 3.10 as well
msg395651 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-11 17:18
OK, found the problem: we are not resetting the multi-line-start pointer when we reallocate the tokenizer buffers.
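
The failure mode described here can be illustrated with a toy model. This is a sketch only, not CPython's tokenizer code. "Pointers" are modelled as absolute integer addresses, the class and function names are hypothetical, and `multi_line_start` is named after the pointer mentioned in the message. The idea is that every pointer into a buffer must be saved as an offset before the buffer moves and rebuilt afterwards; any pointer that is skipped keeps referring to the freed block.

```python
from dataclasses import dataclass


@dataclass
class ToyTokenizer:
    """Toy model: fields hold absolute addresses into a relocatable buffer."""

    base: int              # address of the current allocation
    size: int              # size of the current allocation in bytes
    cur: int               # address of the current read position
    multi_line_start: int  # address where a multi-line token began

    def points_into_buffer(self, address: int) -> bool:
        """True if `address` lies inside the current allocation."""
        return self.base <= address < self.base + self.size


def grow(tok: ToyTokenizer, new_base: int, new_size: int,
         rebase_multi_line_start: bool) -> None:
    """Simulate a realloc() that moves the buffer to `new_base`."""
    # Offsets stay valid across a move; absolute addresses do not.
    cur_offset = tok.cur - tok.base
    start_offset = tok.multi_line_start - tok.base

    tok.base, tok.size = new_base, new_size
    tok.cur = tok.base + cur_offset
    if rebase_multi_line_start:
        tok.multi_line_start = tok.base + start_offset
    # Otherwise multi_line_start still holds an address in the old block.


if __name__ == "__main__":
    stale = ToyTokenizer(base=1000, size=64, cur=1040, multi_line_start=1004)
    grow(stale, new_base=5000, new_size=128, rebase_multi_line_start=False)
    print(stale.points_into_buffer(stale.multi_line_start))  # False

    fixed = ToyTokenizer(base=1000, size=64, cur=1040, multi_line_start=1004)
    grow(fixed, new_base=5000, new_size=128, rebase_multi_line_start=True)
    print(fixed.points_into_buffer(fixed.multi_line_start))  # True
```

In the stale case, reading through `multi_line_start` corresponds to the heap-use-after-free that ASan reported in `syntaxerror()`.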
msg395694 - (view) Author: miss-islington (miss-islington) Date: 2021-06-12 17:53
New changeset a342cc5891dbd8a08d40e9444f2e2c9e93258721 by Pablo Galindo in branch 'main':
bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676)
https://github.com/python/cpython/commit/a342cc5891dbd8a08d40e9444f2e2c9e93258721
msg395695 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-12 17:57
Alessandro Mantovani, one question: how did you generate the crash scripts?
msg395705 - (view) Author: Pablo Galindo Salgado (pablogsal) * (Python committer) Date: 2021-06-12 20:27
New changeset d03f342a8389f1ea9100efb0d1a205601e607254 by Miss Islington (bot) in branch '3.10':
bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) (GH-26695)
https://github.com/python/cpython/commit/d03f342a8389f1ea9100efb0d1a205601e607254
msg395726 - (view) Author: alessandro mantovani (elmanto) Date: 2021-06-13 03:43
With experimental fuzzing techniques at first, but I then observed that the same behavior also occurs with vanilla afl++. As a starting queue I used the *.py files that I found in the repo under 'test' or so.

Best,

Alessandro Mantovani

History
Date User Action Args
2022-04-11 14:59:46 admin set github: 88562
2021-06-13 03:43:40 elmanto set messages: + msg395726
2021-06-12 20:27:49 pablogsal set status: open -> closed
    resolution: fixed
    stage: patch review -> resolved
2021-06-12 20:27:10 pablogsal set messages: + msg395705
2021-06-12 17:57:28 pablogsal set messages: + msg395695
2021-06-12 17:53:58 miss-islington set stage: patch review
    pull_requests: + pull_request25280
2021-06-12 17:53:57 miss-islington set nosy: + miss-islington
    messages: + msg395694
2021-06-11 17:18:30 pablogsal set messages: + msg395651
    stage: patch review -> (no value)
2021-06-11 17:18:12 pablogsal set keywords: + patch
    stage: patch review
    pull_requests: + pull_request25262
2021-06-11 16:58:59 pablogsal set priority: normal -> release blocker
2021-06-11 16:58:32 pablogsal set messages: + msg395648
    versions: + Python 3.10
2021-06-11 16:54:50 pablogsal set messages: + msg395647
2021-06-11 16:46:26 pablogsal set nosy: + BTaskaya
    messages: + msg395646
2021-06-11 15:44:02 gvanrossum set messages: + msg395641
2021-06-11 15:10:17 vstinner set nosy: + gvanrossum, lys.nikolaou, pablogsal
    title: Use-After-Free -> pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror()
2021-06-11 14:39:45 elmanto create