Do not assume signed integer overflow behavior #45962
Comments
The resolution to http://bugs.python.org/issue1608 looks like it'll add
We should fix all dependencies on integer overflow behavior, starting by |
My gcc 4.1 doesn't have the -Wstrict-overflow option.
gcc version 4.1.3 2007092 (prerelease) (Ubuntu 4.1.2-16ubuntu2) |
Should we use the -ansi (C90 aka C89 standard) option, too?
Python core _bsddb _codecs_iso2022 _ctypes |
Socket and SSL are using bluetooth.h, which defines some functions as
The Python core can be compiled with -ansi, but the extension modules |
Using -ansi is out of scope for this issue, and should not be mixed with
I think there is disagreement on whether Python should stop relying on |
Whatever you change regarding the compiler options for Python, please
Otherwise, you're likely going to break lots and lots of extensions. Thanks. |
I compiled Python using gcc 4.3.0 with -Wstrict-overflow, and that's
Objects/doubledigits.c: In function ‘_PyFloat_Digits’:
I am not sure yet how to interpret it, though. It says that the overflow
I will try to investigate this further when I have a bit more time |
Heh, if that's the only warning gcc -Wstrict-overflow gives then I've
Anyone know if the commercial analysis tools we've run the code base |
Alexandre, which Python version did you compile with -Wstrict-overflow?
I will contact Coverity to ask if they check for this kind of thing.
MvL: I don't want 2s complement throughout the language, I just want the
FWIW, I've heard that some commercial compilers (e.g. XLC) assume that |
Marc-Andre: what do you mean by breaking lots and lots of extensions? |
-Wstrict-overflow=3 with gcc 4.3 trunk here shows:
Modules/cPickle.c: In function 'Unpickler_noload':
Modules/cPickle.c:194: warning: assuming signed overflow does not occur
But also note that -fno-strict-aliasing is also just another workaround |
Sure - however, that is fixed in Python 3 (and unrelated to this issue) |
Hm. I don't get any warnings related to the overflow issue, neither with |
Make sure you use gcc 4.3 trunk and at least -O2 is enabled. I tested |
FWIW gcc hacker Ian Lance Taylor has a nice article about signed
Regards, |
Please be specific. I read it, and I don't think it's better to use |
Ian says -fno-strict-overflow still allows some optimizations, and his
Regards, |
I think the -Wstrict-overflow option may not be enough for the audit we
The overflow issue in expandtabs() still exists (in 2.5 as well as in
if (*p == '\n' || *p == '\r') {
i += j;
old_j = j = 0;
if (i < 0) {
PyErr_SetString(PyExc_OverflowError,
"new string is too long");
return NULL;
}
}
Here i and j are signed ints (Py_ssize_t) initially known to be >= 0; i
If we ever hope to clean up the code to the point where -fwrapv is no |
FWIW I reported this to GCC bugzilla as a missing diagnostic @ |
Problem was that -Wall at the end was resetting -Wstrict-overflow, so
Parser/acceler.c: In function 'fixstate':
Parser/node.c: In function 'PyNode_AddChild':
Parser/firstsets.c: In function 'calcfirstset':
Parser/pgen.c: In function 'compile_item':
Parser/tokenizer.c: In function 'new_string':
Objects/abstract.c: In function 'PyObject_CallMethodObjArgs':
Objects/intobject.c: In function 'PyInt_FromUnicode':
Objects/listobject.c: In function 'merge_at':
Objects/longobject.c: In function 'PyLong_FromUnicode':
Objects/stringobject.c: In function 'string_expandtabs':
Objects/unicodeobject.c: In function 'unicode_expandtabs':
Python/ast.c: In function 'ast_for_genexp':
Python/bltinmodule.c: In function 'builtin_map':
Parser/tokenizer.c:1163: warning: assuming signed overflow does not
Python/getargs.c:994: warning: assuming signed overflow does not occur
Python/import.c: In function 'PyImport_ExtendInittab':
Python/modsupport.c: In function 'va_build_value':
Python/sysmodule.c: In function 'sys_getframe':
Modules/gcmodule.c: In function 'collect':
./Modules/_sre.c: In function 'sre_match':
/packages/python-2.5/Modules/stropmodule.c: In function 'strop_replace':
/packages/python-2.5/Modules/_heapqmodule.c: In function 'heappop':
/packages/python-2.5/Modules/_hotshot.c: In function 'pack_line_times':
/packages/python-2.5/Modules/binascii.c: In function 'binascii_a2b_base64':
/packages/python-2.5/Modules/parsermodule.c: In function
/packages/python-2.5/Modules/cPickle.c: In function 'Unpickler_noload':
/packages/python-2.5/Modules/audioop.c: In function 'audioop_ratecv':
/packages/python-2.5/Modules/imageop.c: In function 'imageop_dither2grey2':
/packages/python-2.5/Modules/_csv.c: In function 'join_append_data':
/packages/python-2.5/Modules/expat/xmlparse.c: In function 'getAttributeId':
/packages/python-2.5/Modules/dlmodule.c: In function 'dl_call': |
Thanks! Good catch about -Wall. I think I am now able to reproduce these |
I still don't get additional error messages on the trunk. I've altered
gcc -pthread -c -fno-strict-aliasing -g -Wall -Wstrict-prototypes
Either all problems are already solved or I'm doing something wrong here.
$ LC_ALL=C gcc-4.2 -v
Using built-in specs.
Target: i486-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr
--enable-targets=all --enable-checking=release --build=i486-linux-gnu
--host=i486-linux-gnu --target=i486-linux-gnu
Thread model: posix
gcc version 4.2.1 (Ubuntu 4.2.1-5ubuntu4) |
-Wstrict-overflow=5 is not valid; AFAIK it's 1-3, 3 for most verbose. Also
About the -Wall thing, it seems to be a gcc bug, but for now workaround
Regards, |
Btw I think we need an unsigned version of Py_ssize_t to fix this |
I don't think we can make Py_ssize_t unsigned. On several occasions |
No I mean we need a new unsigned variant. Else we will have to cast it |
The proper thing to do here is to add -Werror=strict-overflow to the
CFLAGS (before -Wall -- we should fix the position of -Wall!);
This should be done in favor of -fwrapv, but only if strict-overflow is
Can someone come up with a patch? |
An unsigned variant of Py_ssize_t would just be size_t -- that's a much |
Xiang: I don’t think we need to make the tests do anything special. Just make sure they exercise the code that handles overflows. I have been running the test suite without any -j0 option, and I can look over the output and see the error messages. Or if we get to a stage where all the errors are eliminated, you could run with UBSAN_OPTIONS=halt_on_error=1. E.g. in this patch, I add two simple tests to cover parts of the code that weren’t covered before (and if I hadn’t fixed the overflows, the tests would trigger extra UBSAN errors).

ctypes_v2.patch is an update of array-size.patch. I improved the test case, and added a new fix for overflows like the following:

>>> class S(ctypes.Structure):
... _fields_ = (("field", ctypes.c_longlong, 64),)
...
>>> s = S()
>>> s.field = 3
Modules/_ctypes/cfield.c:900:9: runtime error: signed integer overflow: -9223372036854775808 - 1 cannot be represented in type 'long long int' |
unicode.patch avoids an overflow in PyUnicode_Join():

>>> size = int(sys.maxsize**0.5) + 1
>>> "".join(("A" * size,) * size)
Objects/unicodeobject.c:9927:12: runtime error: signed integer overflow: 46341 + 2147441940 cannot be represented in type 'int'
OverflowError: join() result is too long for a Python string |
It turns out you just want to see the output. That is easy. Patch v3 adding the test. |
New changeset db93af6080e7 by Martin Panter in branch 'default': |
Martin, I upload a patch to fix another possible overflow in listextend. |
overflow_fix_in_listextend.patch: I doubt Python supports the kinds of platform where this overflow would be possible. It may require pointers smaller than 32 bits, or char objects larger than 8 bits. Perhaps we could just add a comment explaining we assume the overflow cannot happen.

It seems list objects will hold one pointer for each element, but the overflow involves the number of elements. Python defines PY_SSIZE_T_MAX as PY_SIZE_MAX // 2. For the overflow to happen we would need

m + n > PY_SSIZE_T_MAX

Assuming a “flat” address space that can allocate up to PY_SIZE_MAX bytes _in total_, the total number of elements cannot exceed

m + n == PY_SIZE_MAX // sizeof (PyObject *)

So in this scenario, the overflow cannot happen unless sizeof (PyObject *) == 1.

Considering things like the 16-bit segmented Intel “large” memory model (which I doubt Python is compatible with), each list could _independently_ be up to PY_SIZE_MAX bytes. Therefore the total number of elements may reach

m + n == PY_SIZE_MAX // sizeof (PyObject *) * 2

So even in this case, sizeof (PyObject *) == 4 (large model) is fine, but anything less (e.g. 16-bit char, or 1-byte segment + 2-byte offset) might overflow. |
Hmm, I don't tend to infer platform characteristics. IMHO it's a simple problem: summing up two lists' lengths may overflow in the logic. With your argument, does it mean it is somewhat meaningless for a list to have a Py_ssize_t length, since it can never reach it? Checks against PY_SSIZE_T_MAX already exist (for example, in ins1). |
The check in ins1() was originally added in revision b9002da46f69. I presume it references the Python-dev thread “can this overflow (list insertion)?” <20000812145155.A7629@ActiveState.com>, <https://marc.info/?l=python-dev&m=107666472818169>. At that time, ob_size was an int, so overflow checking was definitely needed. Later, revision 7fdc639bc5b4 changed ob_size to Py_ssize_t, and then revision 606818c33e50 updated the overflow check from INT_MAX to PY_SSIZE_T_MAX.

BTW I made a small mistake in my previous message. The worst case would be extending a list with itself. But I think the conclusion is still the same. |
So these checks are superfluous? Do we need to remove them? Hmm, I still doubt such checks should consider platform characteristics first. In theory a list can be PY_SSIZE_T_MAX elements long. Do we have to put the PY_SIZE_MAX // sizeof(PyObject *) limit on it? |
I don't believe Python would really ever work on a platform with non-8-bit bytes; I believe there are far too many assumptions *everywhere*. You can program in C on such a platform, yes, but I'm not so sure about Python. And on 8-bit-byte platforms, there is no large model with 16-bit pointers anywhere; there just are not enough bits to have multiple 64k byte-addressable segments addressed with 16-bit pointers. It might be that some obscure platform in the past had 128k of memory, with large pointers, 2 allocatable 64k segments, a >16-bit char pointer, and 16-bit object pointers pointing to even bytes, but I doubt you'd really be porting *Python 3* to such a platform; basically we're talking about something like a Commodore 128 here. |
Looking over r60793, the overflow check at Modules/cjkcodecs/multibytecodec.c:836 looks vulnerable to being optimized away, because it can only detect the overflow if the line above has already overflowed. Perhaps change PY_SSIZE_T_MAX to MAXDECPENDING. I wonder if any of the GCC optimization and warning modes can detect this case? Also, Python/ast.c:3988 checks using PY_SIZE_MAX, but then passes the value to PyBytes_FromStringAndSize(), which expects Py_ssize_t and in the best case would raise SystemError. |
I agree. For multibytecodec, how about switching the positions of the two checks? If npendings + ctx->pendingsize overflows, the result can be anything: larger than, smaller than, or equal to MAXDECPENDING. For ast, although a SystemError may be raised, the message does not make the reason obvious. |
Xiang: regarding your overflow_fix_in_listextend.patch, what do you think about adding a comment or debugging assertion instead, something like: /* It should not be possible to allocate a list large enough to cause an overflow on any relevant platform */ |
It's good, Martin. Just commit it. |
New changeset dd2c7d497878 by Martin Panter in branch '3.5': New changeset eb6eafafdb44 by Martin Panter in branch 'default': |
Martin, would you mind creating a PR for your ctypes_v2.patch? |
Sorry I haven’t made a PR for ctypes_v2.patch, but I don’t mind if someone else takes over. I understand the HAVE_LONG_LONG check may no longer be necessary for newer Python versions. |
Previously, there was an explicit check that raised PyErr_NoMemory() on overflow. Now you rely on PyMem_Malloc() to detect the overflow. I'm not sure that's a good idea. |
On Tue, Sep 11, 2018 at 8:26 PM, STINNER Victor <report@bugs.python.org> wrote:
+1. It will probably work as expected on Solaris and other OSes that Jeff |
Now overflow is not possible, or am I missing something? |
I asked if there is an issue. In fact, all Python memory allocators start by checking if the size is larger than PY_SSIZE_T_MAX. Example:

void *
PyMem_RawMalloc(size_t size)
{
/*
* Limit ourselves to PY_SSIZE_T_MAX bytes to prevent security holes.
* Most python internals blindly use a signed Py_ssize_t to track
* things without checking for overflows or negatives.
* As size_t is unsigned, checking for size < 0 is not required.
*/
if (size > (size_t)PY_SSIZE_T_MAX)
return NULL;
return _PyMem_Raw.malloc(_PyMem_Raw.ctx, size);
} |
Benjamin: what do you think of adding an explicit check after the "new_size <<= 1;" loop?

if (new_size > (size_t)PY_SSIZE_T_MAX) {
PyErr_NoMemory();
return -1;
}

Technically, PyMem_Malloc() already implements the check, so I'm not sure it's really needed :-) |
Thank you very much to the task force who worked on these issues, which can be seen as boring and useless but are very important nowadays, with C compilers more and more aggressive in optimizing everything (I'm looking at you, clang!). This bug has been open for 11 years, and dozens and dozens of undefined behaviours have been fixed in the meantime. This bug is a giant beast with many patches and many pull requests. I dislike such bugs; they are very hard to follow. I suggest opening new bugs for undefined behaviour in specific functions, rather than a very vague "let's open a single bug to track everything". It's now time to close the issue. |