Issue 20539: math.factorial may throw OverflowError

➜

This issue tracker has been migrated to GitHub, and is currently read-only.
For more information, see the GitHub FAQs in the Python's Developer Guide.

This issue has been migrated to GitHub: https://github.com/python/cpython/issues/64738

classification

Title:	math.factorial may throw OverflowError
Type:	behavior	Stage:
Components:	Extension Modules	Versions:	Python 3.5

process

Status:	closed	Resolution:	fixed
Dependencies:		Superseder:
Assigned To:	mark.dickinson	Nosy List:	garybernhardt, gdr@garethrees.org, josh.r, mark.dickinson, ncoghlan, pitrou, python-dev, skrah, vstinner
Priority:	low	Keywords:	patch

Created on 2014-02-07 10:54 by ncoghlan, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Files
File name	Uploaded	Description	Edit
huge_factorial_input.patch	mark.dickinson, 2014-02-07 17:08		review
huge_factorial_input_v2.patch	mark.dickinson, 2014-02-07 19:10		review
huge_factorial_input_v3.patch	mark.dickinson, 2014-04-09 19:59		review

Messages (29)
msg210448 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-02-07 10:54
I believe this is mostly a curiousity (since actually calculating a factorial this big would take an interminable amount of time), but math.factorial can be provoked into throwing OverflowError by a large enough input: >>> math.factorial(10**19) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long >>> math.factorial(1e19) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long
msg210449 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 11:00
What behaviour would you suggest instead? Apart from the time, the result would have over 10**20 digits, so would need exabytes of memory to represent. >>> from math import lgamma, log >>> lgamma(1e19) / log(10) 1.8565705518096748e+20
msg210450 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-02-07 11:03
> What behaviour would you suggest instead? Some others Python functions like str * int raises a MemoryError on overflow. But I don't know if OverflowError or MemoryError is better.
msg210451 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 11:13
> Some others Python functions like str * int raises a MemoryError on overflow. I have to say that that's always seemed wrong to me: IMO MemoryError should only be raised when we've really run out of memory, or perhaps where the amount of memory we're about to need exceeds all possible limits (e.g., more than 2**63 bytes on a 64-bit system).
msg210453 - (view)	Author: Stefan Krah (skrah) *	Date: 2014-02-07 11:40
OverflowError seems like a good choice if only values in the range of a C long are accepted. ValueError would perhaps be more intuitive, since the user normally only cares about the accepted range of input values rather than the internal details.
msg210454 - (view)	Author: Nick Coghlan (ncoghlan) *	Date: 2014-02-07 11:52
I figured it was some internal storage overflowing, and I don't have an issue with the error type. The bit that is confusing is the error message - what int doesn't fit into a C long? (especially in the second case of float input where no user level int is involved at all). It's mostly a theoretical problem though - as far as I am aware, the user that hit it was deliberately testing how far the limits of the "infinite precision" integers promised in the documentation actually extend.
msg210455 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 11:52
> The bit that is confusing is the error message - what int doesn't fit into a C long? Agreed: it would be nice to intercept this and give a better message.
msg210477 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-02-07 14:41
I also think OverflowError is better than MemoryError here.
msg210480 - (view)	Author: Gareth Rees (gdr@garethrees.org) *	Date: 2014-02-07 15:24
It's not a case of internal storage overflowing. The error is from Modules/mathmodule.c:1426 and it's the input 1019 that's too large to convert to a C long. You get the same kind of error in other places where PyLong_AsLong or PyLong_AsInt is called on a user-supplied value, for example: >>> import pickle >>> pickle.dumps(1019, 10**19) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long
msg210496 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 17:08
Thinking about it, `ValueError` seems like the right exception type: nothing's actually overflowing, because we haven't even tried to do any computation. This is just a limitation of what `math.factorial` is prepared to accept as input (and perhaps that limitation should be documented). Here's a patch. Bikeshedding about the exception type and the exception message is welcome.
msg210497 - (view)	Author: STINNER Victor (vstinner) *	Date: 2014-02-07 17:19
OverflowError makes sense because math.factorial(1019) will overflow in CPython on common platforms, even if it didn't overflowed yet. On a supercomputer with a different Python implementation, you may be able to compute it. IMO An OverflowError is specific to a platform and Python implementation, whereas ValueError is "portable": any Python implementation must raise such error. I can imagine that a Python implementation may return a pseudo-int type which is supposed to be the result of math.factorial(), so you can compute for example math.factorial(1019) % 2 (hint: result is 0).
msg210510 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 18:16
Okay; OverflowError seems to have the majority vote. I'll change it to that. Any strong opinions on the message?
msg210511 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-02-07 18:21
"Outrageously" sounds like the argument is immoral, but do we have an objective test for that? :-) (nitpicking a bit: negative values should probably raise a proper ValueError, no?)
msg210512 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 18:23
> nitpicking a bit: negative values should probably raise a proper ValueError, no? I think they do, with this patch. Or maybe I'm misunderstanding? With the current form of the patch: >>> math.factorial(1020) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: factorial() argument outrageously large >>> math.factorial(-1020) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: factorial() not defined for negative values
msg210513 - (view)	Author: Antoine Pitrou (pitrou) *	Date: 2014-02-07 18:25
Ah, well, my bad. I just wanted to have a good argument :-)
msg210528 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-02-07 19:10
Updated patch.
msg210555 - (view)	Author: Stefan Krah (skrah) *	Date: 2014-02-07 20:55
I slightly favor the ValueError patch because there is only a single exception to catch. PyLong_AsUnsignedLong() also raises OverflowError for both positive values that are too large and for negative values. Perhaps the error message could contain the actual range of allowed values, especially since LONG_MAX differs across platforms. The drawback is that "outrageously" would have to go then. :)
msg210571 - (view)	Author: Gary Bernhardt (garybernhardt)	Date: 2014-02-07 22:28
Although I thoroughly enjoyed "outrageously", I agree with Stefan about including the range of values. As a user who doesn't know the implementation, "outrageously" will just leave me asking why, but indicating the range will tell me exactly why the exception was raised.
msg215842 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-04-09 19:59
Updated patch.
msg215847 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-04-09 21:26
Mark, you said: > OverflowError seems to have the majority vote. I'll change it to that. But your most recent patch seems to have gone back to ValueError for both cases. Is there a reason for that? FWIW, I favor OverflowError as the "too large" type, though I prefer the less hyperbolic phrasing of the error message in the new patch.
msg215852 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-04-09 22:11
Josh: well spotted; thanks for picking up on this. I was wondering whether anyone would pick up on that, and I should have mentioned the change explicitly. I do think ValueError is the correct exception here. My interpretation of OverflowError is that it represents an overflow occurring during the computation, and that's not what's happening here: we're flat-out rejecting an input value because it's out of the range of values we accept. Stefan's message influenced me a bit in that direction, too.
msg215857 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-04-09 23:56
I just think it's a little odd to make math.factorial uniquely sensitive to the documented definition of OverflowError. Basically every one of the non-masking PyLong_As<CType> interfaces uses OverflowError for too large values. In fact, all the functions which are converting to unsigned C types and aren't doing so for "program logic reasons" (e.g. the shift functions don't accept a negative shift due to semantic obscurity, and long_to_bytes doesn't accept a negative length for the output) use OverflowError when a negative value is passed as well. The PyArg_ParseTuple* functions raise OverflowError as well, either implicitly (when they call a PyLong_As<CType> function), or explicitly for those functions that impose more restrictive ranges than the PyLong function they wrap. Long story short: The majority of C extension functions that convert to C level types are raising OverflowError to mean "out of range for implementation's chosen C type" (whether too large, too small, or negative when only unsigned values excepted) as a side-effect of using Python's documented APIs. Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.
msg215861 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-04-10 01:06
> Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions. Can you point to some examples of those similar functions? I can't think of any obvious examples offhand.
msg215862 - (view)	Author: Josh Rosenberg (josh.r) *	Date: 2014-04-10 03:38
A few examples (some are patently ridiculous, since the range of values anyone would use ends long before you'd overflow a 32 bit integer, let alone a 64 bit value on my build of Python, but bear with me: >>> datetime.datetime(264, 1, 2) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long >>> datetime.datetime(-264, 1, 2) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long >>> time.mktime(time.struct_time([2*64]9)) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long >>> sqlite3.enable_callback_tracebacks(2*64) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C long (That last one should really be changed to type code 'p' over 'i', or to 'B' since it's just a boolean, so overflow doesn't matter, just truthy/falsy behavior) It also happens if you pass re functions/methods a too large flags value: >>> re.sub(r'(abc)', r'\1', 'abcd', re.IGNORECASE << 64) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/home/shadowranger/src/cpython/Lib/re.py", line 175, in sub return _compile(pattern, flags).sub(repl, string, count) OverflowError: Python int too large to convert to C ssize_t Skipping the tracebacks, a few more examples of functions where at least one argument can raise OverflowError for implementation specific reasons, rather than a logical overflow of some kind (which is what math.factorial's is): os.getpriority, os.setpriority, os.waitid, os.tcsetpgrp, a central utility function (get_data) in zipimport (I believe the value it's parsing is derived from a zip header, so it may not be possible to feed it too-large values; haven't checked), quite a number of socket functions (often for stuff that should really be ValueErrors, e.g. port numbers out of range), and more random things. I found all of these with a simple: find cpython/ -type f -name '.c' -exec grep -nP 'PyArg_Parse.?"\w?[bhilL]"' {} + > exampleoverflow.txt There aren't any other good examples in math, largely because the other functions there deal with floats (or have arbitrary precision integer fallback paths, in the case of the log suite of functions). That find only scratches the surface; many PyArg_Parse* calls are split across lines (so my simple regex won't catch them), and old Python code has a habit of not using PyArg_Parse* even when it makes sense (presumably because they wanted to customize error messages, or didn't like the way the provided formatting codes handled edge cases). In reality, any place PyLong_As* is called (when * is not one of the masking functions) on an argument that came from the user without explicitly checking for an replacing OverflowError will potentially trigger this issue. A cursory search of locations where this function is called reveals OverflowErrors in the r parameter to to itertools.permutations, and that decimal is riddled with cases where they return if PyLong_As* has an error (including OverflowError) without changing the exception type, then a second round of range checking will set ValueError if it didn't Overflow. Examples include Context object's prec and clamp properties, but there are a dozen or more functions doing this, and I don't know if all of them are publically accessible. Fewer of the calls will be publically visible, so there's more to look through, but you can run the same search to find tons of places with potentially similar behavior: find cpython/ -type f -name '.c' -exec grep -nP 'Py(Long\|Number)_As(?!.(?:Mask\|NULL\|PyExc_(?!Overflow)))' {} + > exampleoverflowdirectcall.txt I suspect that for every case where Python standard libs behave this way (raising OverflowErrors in ways that disregard the official docs description of when it should be used), there are a dozen where a third party module behaves this way, since third party modules are more likely to use the standardized argument parsing and numeric parsing APIs without rejiggering the default exceptions, assuming that the common APIs raise the "correct" errors.
msg215869 - (view)	Author: Stefan Krah (skrah) *	Date: 2014-04-10 10:58
OverflowError during initialization dates back to the early days of Python: http://hg.python.org/cpython/rev/0437738279a8 Integer arithmetic actually did raise OverflowError back then: >>> 2222222222222 * 22222222222222 Traceback (most recent call last): File "<stdin>", line 1, in ? OverflowError: integer multiplication >>> This of course no longer happens, so the whole idea of OverflowError seems slightly outdated.
msg215873 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-04-10 12:32
Looks like this harmless-looking issue has opened up a can of worms. > the whole idea of OverflowError seems slightly outdated. Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception). >>> from math import exp >>> exp(5000.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: math range error I'd treat conversion to float as a special case of this; that is, the following is another case where OverflowError is (IMO) the right thing to use: >>> float(101000) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: long int too large to convert to float A less clear-cut case: >>> sqrt(10500) Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: long int too large to convert to float This one still seems okay to me, even though the sqrt result is within the floating-point range. You can read this as two operations: a conversion from the integer input to a float, followed by a floating-point square root operation, and it's the intermediate result that overflows. And a case that's clearly wrong: >>> 53 >> 10*100 # Why not simply return 0? Traceback (most recent call last): File "<stdin>", line 1, in <module> OverflowError: Python int too large to convert to C ssize_t Another historical oddity is that OverflowError inherits from ArithmeticError (along with FloatingPointError and ZeroDivisionError), which supports its use for arithmetic and mathematical operations that overflow their target type. >>> OverflowError.__mro__ (<class 'OverflowError'>, <class 'ArithmeticError'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>) In the interests of moving this particular* issue forward, I'll revert to OverflowError for this patch. I still personally think it's the wrong exception, but: (1) It's not more wrong than before: we were already raising OverflowError for large positive values. (2) It achieves the main aim of replacing an obscure error message with a more understandable one (which is what Nick wanted in the first place: msg210454). (3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion. We can still make the change from OverflowError to ValueError at some later point once it's clear that that's the right thing to do. But converting from OverflowError to ValueError now, and then deciding later that that was wrong and converting back to OverflowError would be a horrible thing to do.
msg215877 - (view)	Author: Stefan Krah (skrah) *	Date: 2014-04-10 13:08
Mark Dickinson <report@bugs.python.org> wrote: > > the whole idea of OverflowError seems slightly outdated. > > Well, not entirely. It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception). Indeed, I was focusing on integer arithmetic and assignments. For float operations and conversions OverflowError is quite natural. > (3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up. Getting that clear vision may require a python-dev discussion. Using OverflowError for now sounds like a good plan.
msg215879 - (view)	Author: Roundup Robot (python-dev)	Date: 2014-04-10 13:29
New changeset 273e17260d25 by Mark Dickinson in branch 'default': Issue #20539: Improve math.factorial error messages and types for large inputs. http://hg.python.org/cpython/rev/273e17260d25
msg215881 - (view)	Author: Mark Dickinson (mark.dickinson) *	Date: 2014-04-10 13:35
Patch applied to the default branch (but with OverflowError instead of ValueError for large positive inputs). I don't see a lot of benefit in applying the fix to the maintenance branches. Closing this issue.

History
Date	User	Action	Args
2022-04-11 14:57:58	admin	set	github: 64738
2014-04-10 13:35:30	mark.dickinson	set	status: open -> closed resolution: fixed messages: + msg215881 versions: + Python 3.5, - Python 3.4
2014-04-10 13:29:52	python-dev	set	nosy: + python-dev messages: + msg215879
2014-04-10 13:08:56	skrah	set	messages: + msg215877
2014-04-10 12:32:05	mark.dickinson	set	messages: + msg215873
2014-04-10 10:58:39	skrah	set	messages: + msg215869
2014-04-10 03:38:49	josh.r	set	messages: + msg215862
2014-04-10 01:06:35	mark.dickinson	set	messages: + msg215861
2014-04-09 23:56:18	josh.r	set	messages: + msg215857
2014-04-09 22:11:06	mark.dickinson	set	messages: + msg215852
2014-04-09 21:26:13	josh.r	set	nosy: + josh.r messages: + msg215847
2014-04-09 20:08:35	mark.dickinson	set	assignee: mark.dickinson
2014-04-09 20:00:00	mark.dickinson	set	files: + huge_factorial_input_v3.patch messages: + msg215842
2014-02-07 22:28:16	garybernhardt	set	nosy: + garybernhardt messages: + msg210571
2014-02-07 20:55:08	skrah	set	messages: + msg210555
2014-02-07 19:10:08	mark.dickinson	set	files: + huge_factorial_input_v2.patch messages: + msg210528
2014-02-07 18:25:05	pitrou	set	messages: + msg210513
2014-02-07 18:23:30	mark.dickinson	set	messages: + msg210512
2014-02-07 18:21:45	pitrou	set	messages: + msg210511
2014-02-07 18:16:41	mark.dickinson	set	messages: + msg210510
2014-02-07 17:19:39	vstinner	set	messages: + msg210497
2014-02-07 17:16:39	mark.dickinson	set	type: behavior components: + Extension Modules versions: + Python 3.4
2014-02-07 17:08:27	mark.dickinson	set	files: + huge_factorial_input.patch keywords: + patch messages: + msg210496
2014-02-07 15:24:31	gdr@garethrees.org	set	nosy: + gdr@garethrees.org messages: + msg210480
2014-02-07 14:41:28	pitrou	set	nosy: + pitrou messages: + msg210477
2014-02-07 11:52:43	mark.dickinson	set	messages: + msg210455
2014-02-07 11:52:01	ncoghlan	set	priority: normal -> low messages: + msg210454
2014-02-07 11:40:26	skrah	set	nosy: + skrah messages: + msg210453
2014-02-07 11:13:46	mark.dickinson	set	messages: + msg210451
2014-02-07 11:03:44	vstinner	set	priority: low -> normal nosy: + vstinner messages: + msg210450
2014-02-07 11:00:11	mark.dickinson	set	nosy: + mark.dickinson messages: + msg210449
2014-02-07 10:56:03	ncoghlan	set	priority: normal -> low
2014-02-07 10:54:29	ncoghlan	create