Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bytearray.find Buffer Over-read #68650

Closed
JohnLeitch mannequin opened this issue Jun 17, 2015 · 7 comments
Closed

bytearray.find Buffer Over-read #68650

JohnLeitch mannequin opened this issue Jun 17, 2015 · 7 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue

Comments

@JohnLeitch
Copy link
Mannequin

JohnLeitch mannequin commented Jun 17, 2015

BPO 24462
Nosy @serhiy-storchaka
Superseder
  • bpo-24467: bytearray pop and remove Buffer Over-read
  • Files
  • bytearray.find_Buffer_Over-read.py
  • issue24462-2.7.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2015-06-29.18:25:33.709>
    created_at = <Date 2015-06-17.14:26:39.907>
    labels = ['type-security', 'interpreter-core']
    title = 'bytearray.find Buffer Over-read'
    updated_at = <Date 2015-07-05.21:17:37.372>
    user = 'https://bugs.python.org/JohnLeitch'

    bugs.python.org fields:

    activity = <Date 2015-07-05.21:17:37.372>
    actor = 'Arfrever'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2015-06-29.18:25:33.709>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2015-06-17.14:26:39.907>
    creator = 'JohnLeitch'
    dependencies = []
    files = ['39719', '39772']
    hgrepos = []
    issue_num = 24462
    keywords = ['patch']
    message_count = 7.0
    messages = ['245436', '245456', '245457', '245569', '245622', '245635', '245955']
    nosy_count = 4.0
    nosy_names = ['Arfrever', 'serhiy.storchaka', 'JohnLeitch', 'dev_zzo']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = 'resolved'
    status = 'closed'
    superseder = '24467'
    type = 'security'
    url = 'https://bugs.python.org/issue24462'
    versions = ['Python 2.7']

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Jun 17, 2015

    The bytearray.find method suffers from a buffer over-read that can be triggered by passing a string equal in length to the buffer. The result is a read off the end of the buffer, which could potentially be exploited to disclose the contents of adjacent memory.

    Repro:
    var_kcjtxvgr = bytearray([0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44,0x41,0x42,0x43,0x44])
    var_kcjtxvgr.find("\x41" * 0x58)

    Exception:
    0:000> r
    eax=00000002 ebx=00000058 ecx=071adf41 edx=00000000 esi=071f2264 edi=00000057
    eip=1e081cf9 esp=0027fc2c ebp=071ae000 iopl=0 nv up ei pl nz na pe nc
    cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
    python27!stringlib_find+0x169:
    1e081cf9 0fbe0c2a movsx ecx,byte ptr [edx+ebp] ds:002b:071ae000=??
    0:000> dV
    str = 0x071adfa8 "ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD"
    str_len = 0n2
    sub = 0x071f2264 "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
    sub_len = 0n88
    offset = 0n0
    0:000> db ebp-0x10
    071adff0 41 42 43 44 41 42 43 44-41 42 43 44 41 42 43 44 ABCDABCDABCDABCD
    071ae000 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae010 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae020 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae030 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae040 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae050 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    071ae060 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ????????????????
    0:000> !analyze -v -nodb


    •                                                                         *
      
    •                    Exception Analysis                                   *
      
    •                                                                         *
      

    FAULTING_IP:
    python27!stringlib_find+169 [c:\build27\cpython\objects\stringlib\find.h @ 22]
    1e081cf9 0fbe0c2a movsx ecx,byte ptr [edx+ebp]

    EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff)
    ExceptionAddress: 1e081cf9 (python27!stringlib_find+0x00000169)
    ExceptionCode: c0000005 (Access violation)
    ExceptionFlags: 0000000
    NumberParameters: 2
    Parameter[0]: 0000000
    Parameter[1]: 071ae000
    Attempt to read from address 071ae000

    CONTEXT: 0000000 -- (.cxr 0x0;r)
    eax=00000002 ebx=00000058 ecx=071adf41 edx=00000000 esi=071f2264 edi=00000057
    eip=1e081cf9 esp=0027fc2c ebp=071ae000 iopl=0 nv up ei pl nz na pe nc
    cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010206
    python27!stringlib_find+0x169:
    1e081cf9 0fbe0c2a movsx ecx,byte ptr [edx+ebp] ds:002b:071ae000=??

    FAULTING_THREAD: 00001e90

    DEFAULT_BUCKET_ID: INVALID_POINTER_READ

    PROCESS_NAME: pythonw.exe

    ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

    EXCEPTION_PARAMETER1: 0000000

    EXCEPTION_PARAMETER2: 071ae000

    READ_ADDRESS: 071ae000

    FOLLOWUP_IP:
    python27!stringlib_find+169 [c:\build27\cpython\objects\stringlib\find.h @ 22]
    1e081cf9 0fbe0c2a movsx ecx,byte ptr [edx+ebp]

    NTGLOBALFLAG: 2000000

    APPLICATION_VERIFIER_FLAGS: 0

    APP: pythonw.exe

    ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre

    PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ

    BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ

    LAST_CONTROL_TRANSFER: from 1e081ee5 to 1e081cf9

    STACK_TEXT:
    0027fc48 1e081ee5 071adfa8 071f2264 00000058 python27!stringlib_find+0x169
    0027fc5 1e083ac1 071adfa8 071f2264 00000058 python27!stringlib_find_slice+0x35
    0027fcb4 1e083b20 00000001 1e083b10 1e0aafd7 python27!bytearray_find_internal+0x81
    0027fcc0 1e0aafd7 070880c8 071d7a10 07086170 python27!bytearray_find+0x10
    0027fcd8 1e0edd10 07086170 071d7a10 0000000 python27!PyCFunction_Call+0x47
    0027fd04 1e0f017a 0027fd5c 06cc7c80 06cc7c80 python27!call_function+0x2b0
    0027fd74 1e0f1150 07060d60 0000000 06cc7c80 python27!PyEval_EvalFrameEx+0x239a
    0027fda8 1e0f11b2 06cc7c80 07060d60 06ccba50 python27!PyEval_EvalCodeEx+0x690
    0027fdd4 1e11707a 06cc7c80 06ccba50 06ccba50 python27!PyEval_EvalCode+0x22
    0027fdec 1e1181c5 0710ee20 06ccba50 06ccba50 python27!run_mod+0x2a
    0027fe0c 1e118760 623a7408 06c87fa4 00000101 python27!PyRun_FileExFlags+0x75
    0027fe4c 1e1190d9 623a7408 06c87fa4 00000001 python27!PyRun_SimpleFileExFlags+0x190
    0027fe68 1e038d35 623a7408 06c87fa4 00000001 python27!PyRun_AnyFileExFlags+0x59
    0027fee4 1d001017 00000002 06c87f80 1d0011b6 python27!Py_Main+0x965
    0027fef0 1d0011b6 1d000000 0000000 03bdffa0 pythonw!WinMain+0x17
    0027ff80 76477c04 7ffde000 76477be0 c4cc721a pythonw!__tmainCRTStartup+0x140
    0027ff94 7799ad1f 7ffde000 c53e80b9 0000000 KERNEL32!BaseThreadInitThunk+0x24
    0027ffdc 7799acea ffffffff 77980232 0000000 ntdll!__RtlUserThreadStart+0x2f
    0027ffec 0000000 1d001395 7ffde000 0000000 ntdll!_RtlUserThreadStart+0x1b

    STACK_COMMAND: .cxr 0x0 ; kb

    FAULTING_SOURCE_LINE: c:\build27\cpython\objects\stringlib\find.h

    FAULTING_SOURCE_FILE: c:\build27\cpython\objects\stringlib\find.h

    FAULTING_SOURCE_LINE_NUMBER: 22

    FAULTING_SOURCE_CODE:
    18: return -1;
    19: if (sub_len == 0)
    20: return offset;
    21:

    22: pos = fastsearch(str, str_len, sub, sub_len, -1, FAST_SEARCH);
    23:
    24: if (pos >= 0)
    25: pos += offset;
    26:
    27: return pos;

    SYMBOL_STACK_INDEX: 0

    SYMBOL_NAME: python27!stringlib_find+169

    FOLLOWUP_NAME: MachineOwner

    MODULE_NAME: python27

    IMAGE_NAME: python27.dll

    DEBUG_FLR_IMAGE_TIMESTAMP: 5488ac17

    FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_python27.dll!stringlib_find

    BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_python27!stringlib_find+169

    ANALYSIS_SOURCE: UM

    FAILURE_ID_HASH_STRING: um:invalid_pointer_read_c0000005_python27.dll!stringlib_find

    FAILURE_ID_HASH: {e37593ba-d07f-07cf-b7e5-32630cfd6e24}

    Followup: MachineOwner
    ---------

    @JohnLeitch JohnLeitch mannequin added the type-security A security issue label Jun 17, 2015
    @devzzo
    Copy link
    Mannequin

    devzzo mannequin commented Jun 18, 2015

    Quick analysis tells this can be attributed to the following code (in 2.7):

    https://hg.python.org/cpython/file/a8e24d776e99/Objects/stringlib/fastsearch.h#l110
    https://hg.python.org/cpython/file/a8e24d776e99/Objects/stringlib/fastsearch.h#l116

    Suppose i = 0, then s[i+m] causes OOB access when m=n. Note only one iteration is possible in case of m=n due to loop condition of i <= (w = n-m = 0). Theoretically, one can try disclosing one adjacent byte, but more likely results are nothing (or potentially invalid match result) or a potential crash in an unlucky case of s[m] hitting an unmapped page.

    The same code lives in 3.2 (and likely any prior 3.x release), and 3.3 seems to be affected as well. 3.4 code has a modified version, but has the same problem (ss = s + m - 1; if (!STRINGLIB_BLOOM(mask, ss[i+1])) ...).

    @devzzo
    Copy link
    Mannequin

    devzzo mannequin commented Jun 18, 2015

    From the author's page at http://effbot.org/zone/stringlib.htm

    "Note that the above Python code may access s[n], which would result in an IndexError exception. For the CPython implementation, this is not really a problem, since CPython adds trailing NULL entries to both 8-bit and Unicode strings."

    Apparently, this flaw was known to the author, but was not documented in C code.

    A possible quick-and-dirty solution is to treat m=n as a special case and resort to memcmp() or somesuch as there is no actual need to perform multiple match tries. This should fix things for bytearray and str in case str's implementation changes from appending a trailing NUL.

    @JohnLeitch
    Copy link
    Mannequin Author

    JohnLeitch mannequin commented Jun 20, 2015

    Given my understanding of the issue, the memcmp approach seems like a viable fix.

    @devzzo
    Copy link
    Mannequin

    devzzo mannequin commented Jun 22, 2015

    I am preparing a patch for this issue, then.

    @devzzo
    Copy link
    Mannequin

    devzzo mannequin commented Jun 22, 2015

    Attached please find a patch against the 2.7 branch. CPython built with the patch passes the tests from the test suite. Unfortunately, as there is not much control over memory allocation, there is no 100% reliable test case that would allow for reproducing the reported issue.

    Some notes on the patch.

    I have studied possible ways of fixing the issue narrowing them to two options; the approaches considered were:
    a. Patch bytearray methods so they use stringlib's functions with respect to the corner case of out-of-bounds access on m = n.
    b. Patch fastsearch() avoiding the out-of-bounds access on m = n completely.

    Of these two, approach a is less "invasive" as changes, in theory, would be contained in bytearray() code only and should not affect any other code. Approach b fixes all possible cases, but affects other code not related to bytearray.

    Upon closer studying of both bytearray and stringlib code, I discovered that it might be impossible to patch bytearray code only as stringlib contains a few methods that make use of the affected fastsearch() function, see e.g. stringlib_partition() as used in bytearray_partition(). If the approach of fixing bytearray specific code only would be chosen, I have to incorporate at least some of code following the fastsearch() call in stringlib_partition(). Similar considerations apply to other bytearray methods that make use of stringlib; the amount of code duplication varies. The end result is, I chose to patch fastsearch() instead.

    Performance wise, the change incurs a small penalty due to one extra branch when m != n and brings considerable gain in (potentially rare) case when m = n.

    I would appreciate if someone could test and review the patch.

    NB. I stand corrected for the comment in msg245457 -- there is a note I missed in the C code. My sincere apologies to the author.

    @serhiy-storchaka serhiy-storchaka self-assigned this Jun 28, 2015
    @serhiy-storchaka
    Copy link
    Member

    The patch for bpo-24467 also fixed this issue in different way.

    In any case thank you Dmitry for your patches.

    @serhiy-storchaka serhiy-storchaka added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Jun 29, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    interpreter-core (Objects, Python, Grammar, and Parser dirs) type-security A security issue
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant