Author josh.r
Recipients garybernhardt, gdr@garethrees.org, josh.r, mark.dickinson, ncoghlan, pitrou, skrah, vstinner
Date 2014-04-10.03:38:46
Message-id <1397101129.67.0.540930114212.issue20539@psf.upfronthosting.co.za>
In-reply-to
Content
A few examples (some are patently ridiculous, since the range of values anyone would use ends long before you'd overflow a 32-bit integer, let alone a 64-bit value on my build of Python, but bear with me):

>>> datetime.datetime(2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> datetime.datetime(-2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> time.mktime(time.struct_time([2**64]*9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> sqlite3.enable_callback_tracebacks(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

(That last one should really be changed from format code 'i' to 'p', or to 'B', since it's just a boolean: overflow doesn't matter, only truthy/falsy behavior.)

It also happens if you pass re functions/methods a too-large flags value:
>>> re.sub(r'(abc)', r'\1', 'abcd', re.IGNORECASE << 64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shadowranger/src/cpython/Lib/re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
OverflowError: Python int too large to convert to C ssize_t

Skipping the tracebacks, a few more examples of functions where at least one argument can raise OverflowError for implementation-specific reasons, rather than because of a logical overflow of some kind (which is what math.factorial's is):

os.getpriority, os.setpriority, os.waitid, os.tcsetpgrp, a central utility function (get_data) in zipimport (I believe the value it's parsing is derived from a zip header, so it may not be possible to feed it too-large values; I haven't checked), quite a number of socket functions (often for things that should really be ValueErrors, e.g. port numbers out of range), and other assorted things. I found all of these with a simple:

find cpython/ -type f -name '*.c' -exec grep -nP 'PyArg_Parse.*?"\w*?[bhilL]"' {} + > exampleoverflow.txt

There aren't any other good examples in math, largely because the other functions there deal with floats (or have arbitrary precision integer fallback paths, in the case of the log suite of functions).

That find only scratches the surface; many PyArg_Parse* calls are split across lines (so my simple regex won't catch them), and old Python code has a habit of not using PyArg_Parse* even when it makes sense (presumably because they wanted to customize error messages, or didn't like the way the provided formatting codes handled edge cases).
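To catch the calls that are split across lines, something like the following could work. This is only a rough sketch, not what I actually ran: the helper name find_overflow_prone_calls is made up, it only looks at the first string literal after the opening parenthesis, and it reuses the same [bhilL] set of format codes as the grep above.

```python
import re

# Same format codes as the [bhilL] class in the grep above.
OVERFLOW_CODES = set("bhilL")

# A PyArg_Parse* name, its opening parenthesis, then everything up to the
# first string literal. The negated class [^;"] also matches newlines, so the
# literal may sit on a later line than the call itself.
CALL_RE = re.compile(r'PyArg_Parse\w*\s*\([^;"]*"([^"]*)"')


def find_overflow_prone_calls(source):
    """Return (line_number, format_string) pairs for suspicious calls.

    Hypothetical helper: 'source' is the text of one C file.
    """
    hits = []
    for match in CALL_RE.finditer(source):
        fmt = match.group(1)
        # Drop the ":name" or ";message" suffix, which holds no format codes.
        codes = re.split(r"[:;]", fmt, maxsplit=1)[0]
        if OVERFLOW_CODES & set(codes):
            line_number = source.count("\n", 0, match.start()) + 1
            hits.append((line_number, fmt))
    return hits
```

It will still miss format strings built from adjacent literals or macros, and it can be fooled by calls that appear in comments, so treat the output as a list of places to look rather than a list of bugs.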

In reality, any place PyLong_As* is called (where * is not one of the masking functions) on an argument that came from the user, without explicitly checking for and replacing OverflowError, will potentially trigger this issue. A cursory search of locations where these functions are called reveals OverflowErrors in the r parameter to itertools.permutations, and shows that decimal is riddled with cases where the code returns if PyLong_As* has an error (including OverflowError) without changing the exception type, then does a second round of range checking that sets ValueError if the conversion didn't overflow. Examples include the Context object's prec and clamp properties, but there are a dozen or more functions doing this, and I don't know if all of them are publicly accessible.
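To make the decimal-style pattern concrete, here is a pure-Python model of it. Everything in it is a stand-in for illustration: as_ssize_t only imitates the range behavior of PyLong_AsSsize_t (using sys.maxsize as the Py_ssize_t limit), MAX_PREC is an arbitrary bound, and the set_prec_* names are hypothetical, not the real decimal internals.

```python
import sys

MAX_PREC = 10**6  # arbitrary bound, for illustration only


def as_ssize_t(value):
    """Pure-Python stand-in for the range behavior of PyLong_AsSsize_t."""
    if not -sys.maxsize - 1 <= value <= sys.maxsize:
        raise OverflowError("Python int too large to convert to C ssize_t")
    return value


def set_prec_unchanged(value):
    """Model of the current pattern: the conversion error escapes as-is."""
    n = as_ssize_t(value)           # OverflowError propagates unchanged
    if not 1 <= n <= MAX_PREC:      # second round of range checking
        raise ValueError("valid range for prec is [1, MAX_PREC]")
    return n


def set_prec_replaced(value):
    """Model of the alternative: replace OverflowError with ValueError."""
    try:
        n = as_ssize_t(value)
    except OverflowError:
        raise ValueError("valid range for prec is [1, MAX_PREC]") from None
    if not 1 <= n <= MAX_PREC:
        raise ValueError("valid range for prec is [1, MAX_PREC]")
    return n
```

With the first version, an out-of-range value gives ValueError or OverflowError depending only on whether it happens to fit in a C type; with the second, every out-of-range value gives ValueError.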

Fewer of the calls will be publicly visible, so there's more to look through, but you can run the same kind of search to find tons of places with potentially similar behavior:

find cpython/ -type f -name '*.c' -exec grep -nP 'Py(Long|Number)_As(?!.*(?:Mask|NULL|PyExc_(?!Overflow)))' {} + > exampleoverflowdirectcall.txt

I suspect that for every case where the Python standard library behaves this way (raising OverflowError in ways that disregard the official docs' description of when it should be used), there are a dozen where a third-party module behaves this way, since third-party modules are more likely to use the standardized argument parsing and numeric parsing APIs without rejiggering the default exceptions, assuming that the common APIs raise the "correct" errors.
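Until that changes, callers who want consistent behavior from such a module have to paper over it themselves. A minimal caller-side sketch, assuming you are happy to treat every OverflowError from the wrapped function as a bad argument (the decorator name overflow_as_value_error is made up for illustration):

```python
import functools


def overflow_as_value_error(func):
    """Re-raise OverflowError from func as ValueError (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OverflowError as exc:
            # Keep the original message and chain the original exception.
            raise ValueError(str(exc)) from exc
    return wrapper
```

This is too blunt for functions like math.factorial, where the OverflowError reports a genuine logical overflow rather than an argument that didn't fit in a C type.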