classification
Title: math.factorial may throw OverflowError
Type: behavior Stage:
Components: Extension Modules Versions: Python 3.5
process
Status: closed Resolution: fixed
Dependencies: Superseder:
Assigned To: mark.dickinson Nosy List: Gareth.Rees, garybernhardt, haypo, josh.r, mark.dickinson, ncoghlan, pitrou, python-dev, skrah
Priority: low Keywords: patch

Created on 2014-02-07 10:54 by ncoghlan, last changed 2014-04-10 13:35 by mark.dickinson. This issue is now closed.

Files
File name Uploaded Description Edit
huge_factorial_input.patch mark.dickinson, 2014-02-07 17:08 review
huge_factorial_input_v2.patch mark.dickinson, 2014-02-07 19:10 review
huge_factorial_input_v3.patch mark.dickinson, 2014-04-09 19:59 review
Messages (29)
msg210448 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-07 10:54
I believe this is mostly a curiousity (since actually calculating a factorial this big would take an interminable amount of time), but math.factorial can be provoked into throwing OverflowError by a large enough input:

>>> math.factorial(10**19)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> math.factorial(1e19)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
msg210449 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 11:00
What behaviour would you suggest instead?

Apart from the time, the result would have over 10**20 digits, so would need exabytes of memory to represent.

>>> from math import lgamma, log
>>> lgamma(1e19) / log(10)
1.8565705518096748e+20
msg210450 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-07 11:03
> What behaviour would you suggest instead?

Some others Python functions like str * int raises a MemoryError on overflow.

But I don't know if OverflowError or MemoryError is better.
msg210451 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 11:13
> Some others Python functions like str * int raises a MemoryError on overflow.

I have to say that that's always seemed wrong to me: IMO MemoryError should only be raised when we've really run out of memory, or perhaps where the amount of memory we're about to need exceeds all possible limits (e.g., more than 2**63 bytes on a 64-bit system).
msg210453 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-02-07 11:40
OverflowError seems like a good choice if only values in the range
of a C long are accepted.  ValueError would perhaps be more intuitive,
since the user normally only cares about the accepted range of
input values rather than the internal details.
msg210454 - (view) Author: Nick Coghlan (ncoghlan) * (Python committer) Date: 2014-02-07 11:52
I figured it was some internal storage overflowing, and I don't have an issue with the error type. The bit that is confusing is the error *message* - what int doesn't fit into a C long? (especially in the second case of float input where no user level int is involved at all).

It's mostly a theoretical problem though - as far as I am aware, the user that hit it was deliberately testing how far the limits of the "infinite precision" integers promised in the documentation actually extend.
msg210455 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 11:52
> The bit that is confusing is the error *message* - what int doesn't fit into a C long?

Agreed: it would be nice to intercept this and give a better message.
msg210477 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-07 14:41
I also think OverflowError is better than MemoryError here.
msg210480 - (view) Author: Gareth Rees (Gareth.Rees) * Date: 2014-02-07 15:24
It's not a case of internal storage overflowing. The error is from
Modules/mathmodule.c:1426 and it's the input 10**19 that's too large
to convert to a C long. You get the same kind of error in other
places where PyLong_AsLong or PyLong_AsInt is called on a
user-supplied value, for example:

    >>> import pickle
    >>> pickle.dumps(10**19, 10**19)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: Python int too large to convert to C long
msg210496 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 17:08
Thinking about it, `ValueError` seems like the right exception type: nothing's actually overflowing, because we haven't even tried to do any computation.  This is just a limitation of what `math.factorial` is prepared to accept as input (and perhaps that limitation should be documented).

Here's a patch.  Bikeshedding about the exception type and the exception message is welcome.
msg210497 - (view) Author: STINNER Victor (haypo) * (Python committer) Date: 2014-02-07 17:19
OverflowError makes sense because math.factorial(10**19) will overflow in CPython on common platforms, even if it didn't overflowed yet.

On a supercomputer with a different Python implementation, you may be able to compute it.

IMO An OverflowError is specific to a platform and Python implementation, whereas ValueError is "portable": any Python implementation must raise such error.

I can imagine that a Python implementation may return a pseudo-int type which is supposed to be the result of math.factorial(), so you can compute for example math.factorial(10**19) % 2 (hint: result is 0).
msg210510 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 18:16
Okay; OverflowError seems to have the majority vote. I'll change it to that.

Any strong opinions on the message?
msg210511 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-07 18:21
"Outrageously" sounds like the argument is immoral, but do we have an objective test for that? :-)

(nitpicking a bit: negative values should probably raise a proper ValueError, no?)
msg210512 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 18:23
> nitpicking a bit: negative values should probably raise a proper ValueError, no?

I think they do, with this patch.  Or maybe I'm misunderstanding?

With the current form of the patch:

>>> math.factorial(10**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: factorial() argument outrageously large
>>> math.factorial(-10**20)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: factorial() not defined for negative values
msg210513 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2014-02-07 18:25
Ah, well, my bad. I just wanted to have a good argument :-)
msg210528 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-02-07 19:10
Updated patch.
msg210555 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-02-07 20:55
I slightly favor the ValueError patch because there is only a single exception
to catch. PyLong_AsUnsignedLong() also raises OverflowError for both positive
values that are too large and for negative values.

Perhaps the error message could contain the actual range of allowed values,
especially since LONG_MAX differs across platforms.

The drawback is that "outrageously" would have to go then. :)
msg210571 - (view) Author: Gary Bernhardt (garybernhardt) Date: 2014-02-07 22:28
Although I thoroughly enjoyed "outrageously", I agree with Stefan about including the range of values. As a user who doesn't know the implementation, "outrageously" will just leave me asking why, but indicating the range will tell me exactly why the exception was raised.
msg215842 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-04-09 19:59
Updated patch.
msg215847 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-09 21:26
Mark, you said:

> OverflowError seems to have the majority vote. I'll change it to that.

But your most recent patch seems to have gone back to ValueError for both cases. Is there a reason for that? FWIW, I favor OverflowError as the "too large" type, though I prefer the less hyperbolic phrasing of the error message in the new patch.
msg215852 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-04-09 22:11
Josh: well spotted; thanks for picking up on this.  I was wondering whether anyone would pick up on that, and I should have mentioned the change explicitly.

I do think ValueError is the correct exception here.  My interpretation of OverflowError is that it represents an overflow occurring during the computation, and that's not what's happening here: we're flat-out rejecting an input value because it's out of the range of values we accept.  Stefan's message influenced me a bit in that direction, too.
msg215857 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-09 23:56
I just think it's a little odd to make math.factorial uniquely sensitive to the documented definition of OverflowError. Basically every one of the non-masking PyLong_As<CType> interfaces uses OverflowError for too large values.

In fact, all the functions which are converting to unsigned C types and aren't doing so for "program logic reasons" (e.g. the shift functions don't accept a negative shift due to semantic obscurity, and long_to_bytes doesn't accept a negative length for the output) use OverflowError when a negative value is passed as well.

The PyArg_ParseTuple* functions raise OverflowError as well, either implicitly (when they call a PyLong_As<CType> function), or explicitly for those functions that impose more restrictive ranges than the PyLong function they wrap.

Long story short: The majority of C extension functions that convert to C level types are raising OverflowError to mean "out of range for implementation's chosen C type" (whether too large, too small, or negative when only unsigned values excepted) as a side-effect of using Python's documented APIs. Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.
msg215861 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-04-10 01:06
> Making math.factorial the exception would be violating the reasonable expectation one might get from using similar functions.

Can you point to some examples of those similar functions?  I can't think of any obvious examples offhand.
msg215862 - (view) Author: Josh Rosenberg (josh.r) * Date: 2014-04-10 03:38
A few examples (some are patently ridiculous, since the range of values anyone would use ends long before you'd overflow a 32 bit integer, let alone a 64 bit value on my build of Python, but bear with me:

>>> datetime.datetime(2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
>>> datetime.datetime(-2**64, 1, 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> time.mktime(time.struct_time([2**64]*9))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

>>> sqlite3.enable_callback_tracebacks(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long

(That last one should really be changed to type code 'p' over 'i', or to 'B' since it's just a boolean, so overflow doesn't matter, just truthy/falsy behavior)

It also happens if you pass re functions/methods a too large flags value:
>>> re.sub(r'(abc)', r'\1', 'abcd', re.IGNORECASE << 64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shadowranger/src/cpython/Lib/re.py", line 175, in sub
    return _compile(pattern, flags).sub(repl, string, count)
OverflowError: Python int too large to convert to C ssize_t

Skipping the tracebacks, a few more examples of functions where at least one argument can raise OverflowError for implementation specific reasons, rather than a logical overflow of some kind (which is what math.factorial's is):

os.getpriority, os.setpriority, os.waitid, os.tcsetpgrp, a central utility function (get_data) in zipimport (I believe the value it's parsing is derived from a zip header, so it may not be possible to feed it too-large values; haven't checked), quite a number of socket functions (often for stuff that should really be ValueErrors, e.g. port numbers out of range), and more random things. I found all of these with a simple:

find cpython/ -type f -name '*.c' -exec grep -nP 'PyArg_Parse.*?"\w*?[bhilL]"' {} + > exampleoverflow.txt

There aren't any other good examples in math, largely because the other functions there deal with floats (or have arbitrary precision integer fallback paths, in the case of the log suite of functions).

That find only scratches the surface; many PyArg_Parse* calls are split across lines (so my simple regex won't catch them), and old Python code has a habit of not using PyArg_Parse* even when it makes sense (presumably because they wanted to customize error messages, or didn't like the way the provided formatting codes handled edge cases).

In reality, any place PyLong_As* is called (when * is not one of the masking functions) on an argument that came from the user without explicitly checking for an replacing OverflowError will potentially trigger this issue. A cursory search of locations where this function is called reveals OverflowErrors in the r parameter to to itertools.permutations, and that decimal is riddled with cases where they return if PyLong_As* has an error (including OverflowError) without changing the exception type, then a second round of range checking will set ValueError if it didn't Overflow. Examples include Context object's prec and clamp properties, but there are a dozen or more functions doing this, and I don't know if all of them are publically accessible.

Fewer of the calls will be publically visible, so there's more to look through, but you can run the same search to find tons of places with potentially similar behavior:

find cpython/ -type f -name '*.c' -exec grep -nP 'Py(Long|Number)_As(?!.*(?:Mask|NULL|PyExc_(?!Overflow)))' {} + > exampleoverflowdirectcall.txt

I suspect that for every case where Python standard libs behave this way (raising OverflowErrors in ways that disregard the official docs description of when it should be used), there are a dozen where a third party module behaves this way, since third party modules are more likely to use the standardized argument parsing and numeric parsing APIs without rejiggering the default exceptions, assuming that the common APIs raise the "correct" errors.
msg215869 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-10 10:58
OverflowError during initialization dates back to the early days of Python:

  http://hg.python.org/cpython/rev/0437738279a8

Integer arithmetic actually did raise OverflowError back then:

>>> 2222222222222 * 22222222222222
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer multiplication
>>>

This of course no longer happens, so the whole idea of OverflowError seems
slightly outdated.
msg215873 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-04-10 12:32
Looks like this harmless-looking issue has opened up a can of worms.

> the whole idea of OverflowError seems slightly outdated.

Well, not entirely.  It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

>>> from math import exp
>>> exp(5000.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: math range error

I'd treat conversion to float as a special case of this; that is, the following is another case where OverflowError is (IMO) the right thing to use:

>>> float(10**1000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float

A less clear-cut case:

>>> sqrt(10**500)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float

This one still seems okay to me, even though the sqrt result *is* within the floating-point range.  You can read this as two operations: a conversion from the integer input to a float, followed by a floating-point square root operation, and it's the intermediate result that overflows.

And a case that's clearly wrong:

>>> 53 >> 10**100  # Why not simply return 0?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C ssize_t

Another historical oddity is that OverflowError inherits from ArithmeticError (along with FloatingPointError and ZeroDivisionError), which supports its use for arithmetic and mathematical operations that overflow their target type.

>>> OverflowError.__mro__
(<class 'OverflowError'>, <class 'ArithmeticError'>, <class 'Exception'>, <class 'BaseException'>, <class 'object'>)


In the interests of moving this *particular* issue forward, I'll revert to OverflowError for this patch.  I still personally think it's the wrong exception, but:

(1) It's not more wrong than before: we were already raising OverflowError for large positive values.
(2) It achieves the main aim of replacing an obscure error message with a more understandable one (which is what Nick wanted in the first place: msg210454).
(3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up.  Getting that clear vision may require a python-dev discussion.

We can still make the change from OverflowError to ValueError at some later point once it's clear that that's the right thing to do.  But converting from OverflowError to ValueError now, and then deciding later that that was wrong and converting back to OverflowError would be a horrible thing to do.
msg215877 - (view) Author: Stefan Krah (skrah) * (Python committer) Date: 2014-04-10 13:08
Mark Dickinson <report@bugs.python.org> wrote:
> > the whole idea of OverflowError seems slightly outdated.
> 
> Well, not entirely.  It's still got a place for overflow of mathematical operations, and I think it's clearly the correct exception in that case (indeed, it's about the only case I can think of where OverflowError is clearly the correct exception).

Indeed, I was focusing on integer arithmetic and assignments. For float
operations and conversions OverflowError is quite natural.

> (3) Given the lack of clarity about which exception types are appropriate where, I think we shouldn't be changing exception types unless/until there's a clear vision of where we want to end up.  Getting that clear vision may require a python-dev discussion.

Using OverflowError for now sounds like a good plan.
msg215879 - (view) Author: Roundup Robot (python-dev) Date: 2014-04-10 13:29
New changeset 273e17260d25 by Mark Dickinson in branch 'default':
Issue #20539: Improve math.factorial error messages and types for large inputs.
http://hg.python.org/cpython/rev/273e17260d25
msg215881 - (view) Author: Mark Dickinson (mark.dickinson) * (Python committer) Date: 2014-04-10 13:35
Patch applied to the default branch (but with OverflowError instead of ValueError for large positive inputs).  I don't see a lot of benefit in applying the fix to the maintenance branches.  Closing this issue.
History
Date User Action Args
2014-04-10 13:35:30mark.dickinsonsetstatus: open -> closed
resolution: fixed
messages: + msg215881

versions: + Python 3.5, - Python 3.4
2014-04-10 13:29:52python-devsetnosy: + python-dev
messages: + msg215879
2014-04-10 13:08:56skrahsetmessages: + msg215877
2014-04-10 12:32:05mark.dickinsonsetmessages: + msg215873
2014-04-10 10:58:39skrahsetmessages: + msg215869
2014-04-10 03:38:49josh.rsetmessages: + msg215862
2014-04-10 01:06:35mark.dickinsonsetmessages: + msg215861
2014-04-09 23:56:18josh.rsetmessages: + msg215857
2014-04-09 22:11:06mark.dickinsonsetmessages: + msg215852
2014-04-09 21:26:13josh.rsetnosy: + josh.r
messages: + msg215847
2014-04-09 20:08:35mark.dickinsonsetassignee: mark.dickinson
2014-04-09 20:00:00mark.dickinsonsetfiles: + huge_factorial_input_v3.patch

messages: + msg215842
2014-02-07 22:28:16garybernhardtsetnosy: + garybernhardt
messages: + msg210571
2014-02-07 20:55:08skrahsetmessages: + msg210555
2014-02-07 19:10:08mark.dickinsonsetfiles: + huge_factorial_input_v2.patch

messages: + msg210528
2014-02-07 18:25:05pitrousetmessages: + msg210513
2014-02-07 18:23:30mark.dickinsonsetmessages: + msg210512
2014-02-07 18:21:45pitrousetmessages: + msg210511
2014-02-07 18:16:41mark.dickinsonsetmessages: + msg210510
2014-02-07 17:19:39hayposetmessages: + msg210497
2014-02-07 17:16:39mark.dickinsonsettype: behavior
components: + Extension Modules
versions: + Python 3.4
2014-02-07 17:08:27mark.dickinsonsetfiles: + huge_factorial_input.patch
keywords: + patch
messages: + msg210496
2014-02-07 15:24:31Gareth.Reessetnosy: + Gareth.Rees
messages: + msg210480
2014-02-07 14:41:28pitrousetnosy: + pitrou
messages: + msg210477
2014-02-07 11:52:43mark.dickinsonsetmessages: + msg210455
2014-02-07 11:52:01ncoghlansetpriority: normal -> low

messages: + msg210454
2014-02-07 11:40:26skrahsetnosy: + skrah
messages: + msg210453
2014-02-07 11:13:46mark.dickinsonsetmessages: + msg210451
2014-02-07 11:03:44hayposetpriority: low -> normal
nosy: + haypo
messages: + msg210450

2014-02-07 11:00:11mark.dickinsonsetnosy: + mark.dickinson
messages: + msg210449
2014-02-07 10:56:03ncoghlansetpriority: normal -> low
2014-02-07 10:54:29ncoghlancreate