Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror() #88562

Closed
elManto mannequin opened this issue Jun 11, 2021 · 10 comments
Closed

pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror() #88562

elManto mannequin opened this issue Jun 11, 2021 · 10 comments
Labels
3.10 only security fixes 3.11 only security fixes release-blocker topic-C-API type-security A security issue

Comments

@elManto
Copy link
Mannequin

elManto mannequin commented Jun 11, 2021

BPO 44396
Nosy @gvanrossum, @lysnikolaou, @pablogsal, @miss-islington, @isidentical, @elManto
PRs
  • bpo-44396: Update multi-line-start location when reallocating tokenizer buffers #26676
  • [3.10] bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) #26695
  • Files
  • crashes.tgz: Inputs that result in the asan violation
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2021-06-12.20:27:49.008>
    created_at = <Date 2021-06-11.14:39:45.033>
    labels = ['type-security', 'expert-C-API', 'release-blocker', '3.10', '3.11']
    title = 'pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror()'
    updated_at = <Date 2021-06-13.03:43:40.144>
    user = 'https://github.com/elManto'

    bugs.python.org fields:

    activity = <Date 2021-06-13.03:43:40.144>
    actor = 'elmanto'
    assignee = 'none'
    closed = True
    closed_date = <Date 2021-06-12.20:27:49.008>
    closer = 'pablogsal'
    components = ['C API']
    creation = <Date 2021-06-11.14:39:45.033>
    creator = 'elmanto'
    dependencies = []
    files = ['50105']
    hgrepos = []
    issue_num = 44396
    keywords = ['patch']
    message_count = 10.0
    messages = ['395637', '395641', '395646', '395647', '395648', '395651', '395694', '395695', '395705', '395726']
    nosy_count = 6.0
    nosy_names = ['gvanrossum', 'lys.nikolaou', 'pablogsal', 'miss-islington', 'BTaskaya', 'elmanto']
    pr_nums = ['26676', '26695']
    priority = 'release blocker'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'security'
    url = 'https://bugs.python.org/issue44396'
    versions = ['Python 3.10', 'Python 3.11']

    @elManto
    Copy link
    Mannequin Author

    elManto mannequin commented Jun 11, 2021

    Use After Free in python3.11 (commit 2ab27c4)
    Steps to reproduce:

    1. ./configure --with-address-sanitizer
    2. make
    3. ./python <input>

    I attach some of the input that lead to the undefined behavior

    For the complete description you can find the asan report here:

    ==1082579==ERROR: AddressSanitizer: heap-use-after-free on address 0x626000045a40 at pc 0x000000735155 bp 0x7fffffffbed0 sp 0x7fffffffbec8
    READ of size 8 at 0x626000045a40 thread T0
    #0 0x735154 in ascii_decode /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28
    #1 0x735154 in unicode_decode_utf8 /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5158:10
    #2 0xc98381 in syntaxerror /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1087:15
    #3 0xc8d616 in tok_get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c
    #4 0xc8696b in PyTokenizer_Get /home/elmanto/ddg/other_targets/cpython/Parser/tokenizer.c:1884:18
    #5 0xead74c in _PyPegen_check_tokenizer_errors /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1260:17
    #6 0xead74c in _PyPegen_run_parser /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1292:17
    #7 0xeaebca in _PyPegen_run_parser_from_file_pointer /home/elmanto/ddg/other_targets/cpython/Parser/pegen.c:1377:14
    #8 0xc83a91 in _PyParser_ASTFromFile /home/elmanto/ddg/other_targets/cpython/Parser/peg_api.c:26:12
    #9 0xa0abf1 in pyrun_file /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:1197:11
    #10 0xa0abf1 in _PyRun_SimpleFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:455:13
    #11 0xa09b19 in _PyRun_AnyFileObject /home/elmanto/ddg/other_targets/cpython/Python/pythonrun.c:89:15
    #12 0x4dfe94 in pymain_run_file_obj /home/elmanto/ddg/other_targets/cpython/Modules/main.c:353:15
    #13 0x4dfe94 in pymain_run_file /home/elmanto/ddg/other_targets/cpython/Modules/main.c:372:15
    #14 0x4dfe94 in pymain_run_python /home/elmanto/ddg/other_targets/cpython/Modules/main.c:587:21
    #15 0x4dfe94 in Py_RunMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:666:5
    #16 0x4e154c in pymain_main /home/elmanto/ddg/other_targets/cpython/Modules/main.c:696:12
    #17 0x4e1874 in Py_BytesMain /home/elmanto/ddg/other_targets/cpython/Modules/main.c:720:12
    #18 0x7ffff7a2e0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
    #19 0x43501d in _start (/home/elmanto/ddg/other_targets/cpython/python+0x43501d)

    0x626000045a40 is located 2368 bytes inside of 10560-byte region [0x626000045100,0x626000047a40)
    freed by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

    previously allocated by thread T0 here:
    #0 0x4ada79 in realloc (/home/elmanto/ddg/other_targets/cpython/python+0x4ada79)
    #1 0x638e61 in PyMem_RawRealloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:602:12
    #2 0x638e61 in _PyObject_Realloc /home/elmanto/ddg/other_targets/cpython/Objects/obmalloc.c:2339:12

    SUMMARY: AddressSanitizer: heap-use-after-free /home/elmanto/ddg/other_targets/cpython/Objects/unicodeobject.c:5091:28 in ascii_decode
    Shadow bytes around the buggy address:
    0x0c4c80000af0: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b10: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b30: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    =>0x0c4c80000b40: fd fd fd fd fd fd fd fd[fd]fd fd fd fd fd fd fd
    0x0c4c80000b50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b70: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    0x0c4c80000b90: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
    Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable: 00
    Partially addressable: 01 02 03 04 05 06 07
    Heap left redzone: fa
    Freed heap region: fd
    Stack left redzone: f1
    Stack mid redzone: f2
    Stack right redzone: f3
    Stack after return: f5
    Stack use after scope: f8
    Global redzone: f9
    Global init order: f6
    Poisoned by user: f7
    Container overflow: fc
    Array cookie: ac
    Intra object redzone: bb
    ASan internal: fe
    Left alloca redzone: ca
    Right alloca redzone: cb
    Shadow gap: cc
    ==1082579==ABORTING

    @elManto elManto mannequin added 3.11 only security fixes topic-C-API type-security A security issue labels Jun 11, 2021
    @vstinner vstinner changed the title Use-After-Free pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror() Jun 11, 2021
    @vstinner vstinner changed the title Use-After-Free pegen _PyParser_ASTFromFile(): Use-After-Free in syntaxerror() Jun 11, 2021
    @gvanrossum
    Copy link
    Member

    Lysandros and Pablo, this *only* occurs when the lexer is reading directly from a file, not when it's reading the same source code from a (bytes) string. All examples are syntax errors (some raise ValueError in the parser).

    @pablogsal
    Copy link
    Member

    Here is a smaller reproducer:

    x = "ijosdfsd\
    def blech():
        pass

    This seems to be an error with:

    commit a698d52
    Author: Batuhan Taskaya <isidentical@gmail.com>
    Date: Thu Jan 21 00:38:47 2021 +0300

    bpo-40176: Improve error messages for unclosed string literals (GH-19346)
    
    
    
    Automerge-Triggered-By: GH:isidentical
    

    Batuhan, can you take a look?

    @pablogsal
    Copy link
    Member

    I think this should fix the issue, but someone should validate this:

    diff --git a/Parser/tokenizer.c b/Parser/tokenizer.c
    index 6002f3e05a..1c28737183 100644
    --- a/Parser/tokenizer.c
    +++ b/Parser/tokenizer.c
    @@ -1084,17 +1084,16 @@ syntaxerror(struct tok_state *tok, const char *format, ...)
    goto error;
    }

    -    errtext = PyUnicode_DecodeUTF8(tok->line_start, tok->cur - tok->line_start,
    +    errtext = PyUnicode_DecodeUTF8(tok->buf, tok->inp - tok->buf,
                                        "replace");
         if (!errtext) {
             goto error;
         }
         int offset = (int)PyUnicode_GET_LENGTH(errtext);
    -    Py_ssize_t line_len = strcspn(tok->line_start, "\n");
    -    if (line_len != tok->cur - tok->line_start) {
    +    Py_ssize_t line_len = strcspn(tok->buf, "\n");
    +    if (line_len != tok->buf - tok->inp) {
             Py_DECREF(errtext);
    -        errtext = PyUnicode_DecodeUTF8(tok->line_start, line_len,
    -                                       "replace");
    +        errtext = PyUnicode_DecodeUTF8(tok->buf, line_len, "replace");
         }
         if (!errtext) {
             goto error;

    @pablogsal
    Copy link
    Member

    This affects 3.10 as well

    @pablogsal pablogsal added 3.10 only security fixes release-blocker labels Jun 11, 2021
    @pablogsal
    Copy link
    Member

    Ok, found the problem, we are not resetting the multi-line-start pointer when we are reallocating the tokenizer buffers.

    @miss-islington
    Copy link
    Contributor

    New changeset a342cc5 by Pablo Galindo in branch 'main':
    bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676)
    a342cc5

    @pablogsal
    Copy link
    Member

    alessandro mantovani, one question, how did you generate the crash scripts?

    @pablogsal
    Copy link
    Member

    New changeset d03f342 by Miss Islington (bot) in branch '3.10':
    bpo-44396: Update multi-line-start location when reallocating tokenizer buffers (GH-26676) (GH-26695)
    d03f342

    @elManto
    Copy link
    Mannequin Author

    elManto mannequin commented Jun 13, 2021

    Fuzzing experimental techniques, but then I observed the same behavior was happening with vanilla afl++. As a starting queue I used the *.py files that I found in the repo under ‘test’ or so

    Best

    Alessandro Mantovani

    Inviato da iPhone

    Il giorno 12.06.2021, alle ore 19:57, Pablo Galindo Salgado <report@bugs.python.org> ha scritto:

    
    Pablo Galindo Salgado <pablogsal@gmail.com> added the comment:

    alessandro mantovani, one question, how did you generate the crash scripts?

    ----------


    Python tracker <report@bugs.python.org>
    <https://bugs.python.org/issue44396\>


    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 only security fixes 3.11 only security fixes release-blocker topic-C-API type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants