New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bytearray.find Buffer Over-read #68650
Comments
The bytearray.find method suffers from a buffer over-read that can be triggered by passing a string equal in length to the buffer. The result is a read off the end of the buffer, which could potentially be exploited to disclose the contents of adjacent memory. Repro: Exception:
FAULTING_IP: EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff) CONTEXT: 0000000 -- (.cxr 0x0;r) FAULTING_THREAD: 00001e90 DEFAULT_BUCKET_ID: INVALID_POINTER_READ PROCESS_NAME: pythonw.exe ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s. EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s. EXCEPTION_PARAMETER1: 0000000 EXCEPTION_PARAMETER2: 071ae000 READ_ADDRESS: 071ae000 FOLLOWUP_IP: NTGLOBALFLAG: 2000000 APPLICATION_VERIFIER_FLAGS: 0 APP: pythonw.exe ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ LAST_CONTROL_TRANSFER: from 1e081ee5 to 1e081cf9 STACK_TEXT: STACK_COMMAND: .cxr 0x0 ; kb FAULTING_SOURCE_LINE: c:\build27\cpython\objects\stringlib\find.h FAULTING_SOURCE_FILE: c:\build27\cpython\objects\stringlib\find.h FAULTING_SOURCE_LINE_NUMBER: 22 FAULTING_SOURCE_CODE:
SYMBOL_STACK_INDEX: 0 SYMBOL_NAME: python27!stringlib_find+169 FOLLOWUP_NAME: MachineOwner MODULE_NAME: python27 IMAGE_NAME: python27.dll DEBUG_FLR_IMAGE_TIMESTAMP: 5488ac17 FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_python27.dll!stringlib_find BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_python27!stringlib_find+169 ANALYSIS_SOURCE: UM FAILURE_ID_HASH_STRING: um:invalid_pointer_read_c0000005_python27.dll!stringlib_find FAILURE_ID_HASH: {e37593ba-d07f-07cf-b7e5-32630cfd6e24} Followup: MachineOwner |
Quick analysis tells this can be attributed to the following code (in 2.7): https://hg.python.org/cpython/file/a8e24d776e99/Objects/stringlib/fastsearch.h#l110 Suppose i = 0, then s[i+m] causes OOB access when m=n. Note only one iteration is possible in case of m=n due to loop condition of i <= (w = n-m = 0). Theoretically, one can try disclosing one adjacent byte, but more likely results are nothing (or potentially invalid match result) or a potential crash in an unlucky case of s[m] hitting an unmapped page. The same code lives in 3.2 (and likely any prior 3.x release), and 3.3 seems to be affected as well. 3.4 code has a modified version, but has the same problem (ss = s + m - 1; if (!STRINGLIB_BLOOM(mask, ss[i+1])) ...). |
From the author's page at http://effbot.org/zone/stringlib.htm "Note that the above Python code may access s[n], which would result in an IndexError exception. For the CPython implementation, this is not really a problem, since CPython adds trailing NULL entries to both 8-bit and Unicode strings." Apparently, this flaw was known to the author, but was not documented in C code. A possible quick-and-dirty solution is to treat m=n as a special case and resort to memcmp() or somesuch as there is no actual need to perform multiple match tries. This should fix things for bytearray and str in case str's implementation changes from appending a trailing NUL. |
Given my understanding of the issue, the memcmp approach seems like a viable fix. |
I am preparing a patch for this issue, then. |
Attached please find a patch against the 2.7 branch. CPython built with the patch passes the tests from the test suite. Unfortunately, as there is not much control over memory allocation, there is no 100% reliable test case that would allow for reproducing the reported issue. Some notes on the patch. I have studied possible ways of fixing the issue narrowing them to two options; the approaches considered were: Of these two, approach a is less "invasive" as changes, in theory, would be contained in bytearray() code only and should not affect any other code. Approach b fixes all possible cases, but affects other code not related to bytearray. Upon closer studying of both bytearray and stringlib code, I discovered that it might be impossible to patch bytearray code only as stringlib contains a few methods that make use of the affected fastsearch() function, see e.g. stringlib_partition() as used in bytearray_partition(). If the approach of fixing bytearray specific code only would be chosen, I have to incorporate at least some of code following the fastsearch() call in stringlib_partition(). Similar considerations apply to other bytearray methods that make use of stringlib; the amount of code duplication varies. The end result is, I chose to patch fastsearch() instead. Performance wise, the change incurs a small penalty due to one extra branch when m != n and brings considerable gain in (potentially rare) case when m = n. I would appreciate if someone could test and review the patch. NB. I stand corrected for the comment in msg245457 -- there is a note I missed in the C code. My sincere apologies to the author. |
The patch for bpo-24467 also fixed this issue in different way. In any case thank you Dmitry for your patches. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: