New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
int and float constructing from non NUL-terminated buffer #68990
Comments
Python suffers from a buffer over-read in PyFloat_FromString() that is caused by the incorrect assumption that buffers returned by PyObject_GetBuffer() are null-terminated. This could potentially result in the disclosure of adjacent memory. PyObject *
PyFloat_FromString(PyObject *v)
{
const char *s, *last, *end;
double x;
PyObject *s_buffer = NULL;
Py_ssize_t len;
Py_buffer view = {NULL, NULL};
PyObject *result = NULL;
if (PyUnicode_Check(v)) {
s_buffer = _PyUnicode_TransformDecimalAndSpaceToASCII(v);
if (s_buffer == NULL)
return NULL;
s = PyUnicode_AsUTF8AndSize(s_buffer, &len);
if (s == NULL) {
Py_DECREF(s_buffer);
return NULL;
}
}
else if (PyObject_GetBuffer(v, &view, PyBUF_SIMPLE) == 0) {
s = (const char *)view.buf; <<<< The unterminated buffer is retrieved here.
len = view.len;
}
else {
PyErr_Format(PyExc_TypeError,
"float() argument must be a string or a number, not '%.200s'",
Py_TYPE(v)->tp_name);
return NULL;
}
last = s + len;
/* strip space */
while (s < last && Py_ISSPACE(*s))
s++;
while (s < last - 1 && Py_ISSPACE(last[-1]))
last--;
/* We don't care about overflow or underflow. If the platform
* supports them, infinities and signed zeroes (on underflow) are
* fine. */
x = PyOS_string_to_double(s, (char **)&end, NULL); <<<< The buffer is then passed
along here.
if (end != last) {
PyErr_Format(PyExc_ValueError,
"could not convert string to float: "
"%R", v);
result = NULL;
}
else if (x == -1.0 && PyErr_Occurred())
result = NULL;
else
result = PyFloat_FromDouble(x);
PyBuffer_Release(&view);
Py_XDECREF(s_buffer);
return result;
}
double
PyOS_string_to_double(const char *s,
char **endptr,
PyObject *overflow_exception)
{
double x, result=-1.0;
char *fail_pos; errno = 0;
PyFPE_START_PROTECT("PyOS_string_to_double", return -1.0)
x = _PyOS_ascii_strtod(s, &fail_pos);
PyFPE_END_PROTECT(x) if (errno == ENOMEM) {
PyErr_NoMemory();
fail_pos = (char *)s;
}
else if (!endptr && (fail_pos == s || *fail_pos != '\0'))
PyErr_Format(PyExc_ValueError, <<<< If any of these error paths are taken, the
unterminated buffer is passed along without
its length, ultimately resulting in a call
to unicode_fromformat_write_cstr().
"could not convert string to float: "
"%.200s", s);
else if (fail_pos == s)
PyErr_Format(PyExc_ValueError,
"could not convert string to float: "
"%.200s", s);
else if (errno == ERANGE && fabs(x) >= 1.0 && overflow_exception)
PyErr_Format(overflow_exception,
"value too large to convert to float: "
"%.200s", s);
else
result = x;
if (endptr != NULL)
*endptr = fail_pos;
return result;
}
static int
unicode_fromformat_write_cstr(_PyUnicodeWriter *writer, const char *str,
Py_ssize_t width, Py_ssize_t precision)
{
/* UTF-8 */
Py_ssize_t length;
PyObject *unicode;
int res; length = strlen(str); <<<< str points to the unterminated buffer, which means
strlen() my read off the end, depending on the contents
of adjacent memory.
if (precision != -1)
length = Py_MIN(length, precision);
unicode = PyUnicode_DecodeUTF8Stateful(str, length, "replace", NULL); <<<< The new,
incorrect length is
passed along.
if (unicode == NULL)
return -1; res = unicode_fromformat_write_str(writer, unicode, width, -1);
Py_DECREF(unicode);
return res;
} A script that reproduces the issue is as follows: import array
float(array.array("B",b"A"*0x10)) And it produces the following exception: 0:000> gu
FAULTING_IP: EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff) CONTEXT: 0000000 -- (.cxr 0x0;r) FAULTING_THREAD: 00000bec DEFAULT_BUCKET_ID: INVALID_POINTER_READ PROCESS_NAME: python.exe ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s. EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s. EXCEPTION_PARAMETER1: 0000000 EXCEPTION_PARAMETER2: 090a3000 READ_ADDRESS: 090a3000 FOLLOWUP_IP: NTGLOBALFLAG: 2000000 APPLICATION_VERIFIER_FLAGS: 0 FAULTING_LOCAL_VARIABLE_NAME: length APP: python.exe ANALYSIS_VERSION: 6.3.9600.17029 (debuggers(dbg).140219-1702) x86fre PRIMARY_PROBLEM_CLASS: INVALID_POINTER_READ BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ LAST_CONTROL_TRANSFER: from 651ac6e9 to 651ac280 STACK_TEXT: STACK_COMMAND: .cxr 0x0 ; kb FAULTING_SOURCE_LINE: c:\build\cpython\objects\unicodeobject.c FAULTING_SOURCE_FILE: c:\build\cpython\objects\unicodeobject.c FAULTING_SOURCE_LINE_NUMBER: 2368 FAULTING_SOURCE_CODE:
SYMBOL_STACK_INDEX: 0 SYMBOL_NAME: python35!unicode_fromformat_write_cstr+10 FOLLOWUP_NAME: MachineOwner MODULE_NAME: python35 IMAGE_NAME: python35.dll DEBUG_FLR_IMAGE_TIMESTAMP: 5598ccc2 FAILURE_BUCKET_ID: INVALID_POINTER_READ_c0000005_python35.dll!unicode_fromformat_write_cstr BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_python35!unicode_fromformat_write_cstr+10 ANALYSIS_SOURCE: UM FAILURE_ID_HASH_STRING: um:invalid_pointer_read_c0000005_python35.dll!unicode_fromformat_write_cstr FAILURE_ID_HASH: {1d85cbc9-3259-9a7e-3da2-8540573292b2} Followup: MachineOwner To fix the issue, it is recommended that PyFloat_FromString() check the type of argument v after a successful PyObject_GetBuffer() call to determine if the buffer is null-terminated or otherwise. A proposed patch is attached. |
Attaching repro |
I prefer to merge bpo-24802 and bpo-24803 and discuss them at one place. Here is merged and revised patch. The patch is changed. There is a very rare corner case: when the type of argument is a subclass of bytes or bytearray with overloaded tp_as_buffer->bf_getbuffer, returned view can be not NUL terminated. To avoid ambiguity and for unifying with int constructor, I have added special cases for bytes and bytearray (this is also restores pre-issue22896 behavior for int(bytes, base)). Tests are moved and added additional tests for memoryview slices. >>> int(memoryview(b'123')[1:3])
23
>>> int(memoryview(b'123\x00')[1:3])
23
>>> int(memoryview(b'123 ')[1:3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: b'23'
>>> int(memoryview(b'123A')[1:3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: b'23'
>>> int(memoryview(b'1234')[1:3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: b'23'
>>> float(memoryview(b'12.3')[1:4])
2.3
>>> float(memoryview(b'12.3\x00')[1:4])
2.3
>>> float(memoryview(b'12.3 ')[1:4])
2.3
>>> float(memoryview(b'12.3A')[1:4])
2.3
>>> float(memoryview(b'12.34')[1:4])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: <memory at 0xb6fee02c> There is similar dangerously looking code for complex. But it is never executed, because complex accepts only one non-numeric type: str. The patch removes this misleading dead code. |
The patch looks good to me. However I don’t understand what you meant about restoring int(bytes, base) behaviour. None of the patches here, nor in bpo-22896, touch long_new() in /Objects/longobject.c. |
Going through the commits for bpo-22896, I noticed compile(), eval() and exec() also suffer from a similar flaw. They check strlen(buffer) but the buffer may not be null-terminated: >>> eval(memoryview(b"1234")[1:3])
TypeError: source code string cannot contain null bytes |
This patch builds on Serhiy’s, and also fixes compile(), eval() and exec(). Also, I took the liberty of converting your assertRaises() calls to assertRaisesRegex() with context managers. I find this is a good general practice, to avoid hiding an accidental exception. |
Great! Besides few nitpicks the patch LGTM.
long_new() uses PyNumber_Long(). I was wrong, the contrary case is affected, when the base is not specified. |
New changeset a75336ac40e0 by Martin Panter in branch '3.4': New changeset 95b9c07b27f7 by Martin Panter in branch '3.5': New changeset 4df1eaecb506 by Martin Panter in branch 'default': |
I have committed changes to Python 3. The compile() test now exists as test_compile.TestSpecifics.test_null_terminated. Serhiy: You set these bugs for Python 2 as well. Is there a way to produce a non-null-terminated buffer in Python 2 to help verify this? Both John’s original array() tests and the memoryview() tests don’t seem to be supported by Python 2’s idea of buffer objects: >>> float(array.array("B",b"A"*0x10))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number
>>> float(memoryview(b"1234")[1:-1])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number I have a half-ported patch for Python 2, but it would be nice to know (a) if it is actually needed, and (b) what to do about unit tests. |
Yes, looks as this issue is not related to 2.7. |
One post-commit comment: I would use ValueError instead of Exceptions in tests for 3.5+. This will help to avoid regression and can catch incompatibilities in other implementations. |
New changeset a13d9656f954 by Martin Panter in branch '3.5': New changeset 96cdd2532034 by Martin Panter in branch 'default': |
Thanks for picking that up Serhiy |
New changeset 3ef7d1af5195 by Serhiy Storchaka in branch '2.7': |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: